Commit Graph

32350 Commits

Author SHA1 Message Date
Hans Wennborg f1f36517b7 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
David Blaikie 534ff2caca Move test to X86-specific due to some IR invalid on other targets
llvm-svn: 249489
2015-10-07 00:17:31 +00:00
David Blaikie c9ad9191a7 DebugInfo: Include the decl_line/decl_file in subprogram definitions if they differ from those in the declaration
This is handy for some AutoFDO stuff, and seems like a minor improvement
to correctness (otherwise a debug info consumer might think the decl
line/file of the def was the same as that of the declaration - though
what a consumer might use that for, I'm not sure - maybe "list <func>"
would've misbehaved with the old behavior?) and at a minor cost (in my
experiment, with fission, without type units, without compression, 0.01%
growth in debug info in the executable/objects, 0.02% growth in the .dwo
files).

llvm-svn: 249487
2015-10-07 00:04:16 +00:00
David Majnemer 7735a6d07a [WinEH] Create a separate MBB for funclet prologues
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination.  This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.

Instead, use the CatchPad's location for the funclet prologue.  This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.

Differential Revision: http://reviews.llvm.org/D13424

llvm-svn: 249483
2015-10-06 23:31:59 +00:00
Lang Hames 44780acd91 [Orc] Teach the CompileOnDemand layer to clone aliases.
This allows modules containing aliases to be lazily jit'd. Previously these
failed with missing symbol errors because the aliases weren't cloned from the
original module.

llvm-svn: 249481
2015-10-06 22:55:05 +00:00
Kevin Enderby a59824a174 Fix two bugs in llvm-objdump’s printing of Objective-C meta data
from malformed Mach-O files that caused crashes.

We recently got about 700 malformed Mach-O files which we have
been using the improve the robustness of tools that deal with reading
data from object files.  These resulted in about 20 small bug fixes to
the darwin based tools.

The goal here is to also improve the robustness of llvm-objdump and
this is the first two fixes.  In talking with Tim Northover the approach
we thought might be best is to:

1) Only include tests for the malformed Mach-O files that cause crashes
(not all 700+ tests).
2) The test should only contain the command line option that caused the
crash and not all the others that don’t matter.
3) There should be only one line for the FileCheck that is past the point
of the crash if possible and if possible indicates the malformation.

Again the goal is to fix crashes and not so much care about how the
printing of malformed data comes out.

Tim also suggested if we really wanted to add test cases for all 700+
malformed Mach-O files putting them in the regression tests might be
an option.  But many of these do not cause crashes.

llvm-svn: 249479
2015-10-06 22:27:08 +00:00
Sanjoy Das 5c8bead46d [IndVars] Don't break dominance in `eliminateIdentitySCEV`
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).

This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
don't think this is a problem because `ScalarEvolution` requires it
anyway.

Fixes PR25051.

Depends on D13459.

Reviewers: atrick, hfinkel

Subscribers: joker.eph, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13460

llvm-svn: 249471
2015-10-06 21:44:49 +00:00
Tom Stellard 0fbf899c0f AMDGPU/SI: Remove calling convention assertion from LowerFormalArguments()
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13367

llvm-svn: 249468
2015-10-06 21:16:34 +00:00
Philip Reames 675418ebc0 Extend known bits to understand @llvm.bswap
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.

Differential Revision: http://reviews.llvm.org/D13250

llvm-svn: 249453
2015-10-06 20:20:45 +00:00
Philip Reames 600a91580f Fix pr25040 - Handle vectors of i1s in recently added implication code
As mentioned in the bug, I'd missed the presence of a getScalarType in the caller of the new implies method. As a result, when we ended up with a implication over two vectors, we'd trip an assert and crash.

Differential Revision: http://reviews.llvm.org/D13441

llvm-svn: 249442
2015-10-06 19:00:02 +00:00
Chad Rosier cb14dd0265 [ARM] Simplify tests and make checks more rigid. NFC.
llvm-svn: 249432
2015-10-06 17:54:12 +00:00
Mehdi Amini cf2513b352 This patch builds on top of D13378 to handle constant condition.
With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark):

for (a=0; a<n; a++)
    for (b=0; b<n; b++)
        for (c=0; c<n; c++)
            for (d=0; d<n; d++)
                for (e=0; e<n; e++)
                    for (f=0; f<n; f++)
                        x++;
From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c

Reviewers: sanjoyd

Differential Revision: http://reviews.llvm.org/D13390

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 249431
2015-10-06 17:19:20 +00:00
Tom Stellard 88e0b25181 AMDGPU/SI: Add 64-bit versions of v_nop and v_clrexcp
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13436

llvm-svn: 249424
2015-10-06 15:57:53 +00:00
Krzysztof Parzyszek fb33824efd [Hexagon] Add an early if-conversion pass
llvm-svn: 249423
2015-10-06 15:49:14 +00:00
Daniel Sanders 1b3341724c [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet

Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
  0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
    0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...

There was problem with selecting sqrt instruction in LLVM backend.

To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.

Patch by Zlatko Buljan

Reviewers: zoran.jovanovic, hvarga, dsanders

Subscribers: llvm-commits, petarj

Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249416
2015-10-06 15:17:25 +00:00
Daniel Sanders add9057fa7 Revert r249123 - [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
The author was not credited and most of the commit message is missing. Will re-commit with this fixed.

llvm-svn: 249415
2015-10-06 15:13:16 +00:00
Filipe Cabecinhas b70fd8719e Make sure the CastInst is valid before trying to create it
Bug found with afl-fuzz.

llvm-svn: 249396
2015-10-06 12:37:54 +00:00
Andrea Di Biagio 40f59e4466 [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Daniel Sanders bb65d730bf [mips][disassembler] Changed CHECK-EB directives to CHECK so div/divu are tested.
llvm-svn: 249386
2015-10-06 10:08:14 +00:00
Daniel Sanders d245267be0 [mips][disassembler] Merged disassembler tests into the corresponding ISA/ASE subdirectories.
llvm-svn: 249384
2015-10-06 10:02:35 +00:00
Daniel Sanders 31bfdb5a82 [mips][disassembler] Moved DSP tests into proper place and corrected formatting.
llvm-svn: 249383
2015-10-06 09:28:48 +00:00
Craig Topper 2c4068f409 [TwoAddressInstructionPass] When looking for a 3 addr conversion after commuting, make sure regB has been updated to take into account the commute.
llvm-svn: 249378
2015-10-06 05:39:59 +00:00
Alexei Starovoitov 4e01a38da0 [bpf] Avoid extra pointer arithmetic for stack access
For the program like below
struct key_t {
  int pid;
  char name[16];
};
extern void test1(char *);
int test() {
  struct key_t key = {};
  test1(key.name);
  return 0;
}
For key.name, the llc/bpf may generate the below code:
  R1 = R10  // R10 is the frame pointer
  R1 += -24 // framepointer adjustment
  R1 |= 4   // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.

This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
  R1 = R10
  R1 += -20

Patch by Yonghong Song <yhs@plumgrid.com>

llvm-svn: 249371
2015-10-06 04:00:53 +00:00
Craig Topper 79dd1bf094 [X86] Teach constant hoisting that ANDs with 64-bit immediates in the range 0x80000000-0xffffffff can be handled cheaply and don't need to be hoisted.
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.

llvm-svn: 249370
2015-10-06 02:50:24 +00:00
Dan Gohman e51c058ecc [WebAssembly] Switch to a more traditional assembly syntax
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.

llvm-svn: 249364
2015-10-06 00:27:55 +00:00
Adrian Prantl d2793a030b dsymutil: Don't prune forward declarations inside of an imported TAG_module
if there exists not definition for the type.
For this to work, we need to clone the imported modules before building
the decl context chains of the DIEs in the non-skeleton CUs.

llvm-svn: 249362
2015-10-05 23:11:20 +00:00
Arnold Schwaighofer 0591c5d719 MergeFunctions: Clear GlobalNumbers ValueMap
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.

rdar://22971893

llvm-svn: 249327
2015-10-05 17:26:36 +00:00
Scott Douglass 953f908173 [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM.
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.

A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.

Fixes PR9199 and PR23768.

[This is based on Peter Collingbourne's r238473 which was reverted.]

Differential Revision: http://reviews.llvm.org/D13239

Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
2015-10-05 14:49:54 +00:00
Zoran Jovanovic 5a8dffc618 [mips][microMIPS] Implement JALRC16, JRCADDIUSP and JRC16 instructions
Differential Revision: http://reviews.llvm.org/D11219

llvm-svn: 249317
2015-10-05 14:00:09 +00:00
Alexandros Lamprineas 1bab191f25 [MC layer][AArch64] llvm-mc accepts 4-bit immediate values for
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.

Differential Revision: http://reviews.llvm.org/D13011

llvm-svn: 249313
2015-10-05 13:42:31 +00:00
Daniel Sanders d5a89418c5 [mips] Changed the way symbols are handled in dla and la instructions to allow simple expressions.
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.

Patch by Scott Egerton.

Reviewers: vkalintiris, dsanders

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12760

llvm-svn: 249311
2015-10-05 13:19:29 +00:00
Alexandros Lamprineas 057f0a68cc Added missing test for [ARM] AttributeParser. Check that build attribute
Tag_Advanced_SIMD_arch is set correctly when targeting v8.1-a NEON.

Differential Revision: http://reviews.llvm.org/D13281

llvm-svn: 249304
2015-10-05 12:13:29 +00:00
Rafael Espindola e3a20f57d9 Fix pr24486.
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.

With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.

In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.

This patch is a joint work by Maxim Ostapenko andy myself.

llvm-svn: 249303
2015-10-05 12:07:05 +00:00
Teresa Johnson 403a787e03 Support for function summary index bitcode sections and files.
Summary:
The bitcode format is described in this document:
  https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
For more info on ThinLTO see:
  https://sites.google.com/site/llvmthinlto

The first customer is ThinLTO, however the data structures are designed
and named more generally based on prior feedback. There are a few
comments regarding how certain interfaces are used by ThinLTO, and the
options added here to gold currently have ThinLTO-specific names as the
behavior they provoke is currently ThinLTO-specific.

This patch includes support for generating per-module function indexes,
the combined index file via the gold plugin, and several tests
(more are included with the associated clang patch D11908).

Reviewers: dexonsmith, davidxl, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13107

llvm-svn: 249270
2015-10-04 14:33:43 +00:00
Simon Pilgrim bb01c6fda2 [X86][SSE4A] Added shuffle decode tests for 'special case' SSE4A EXTRQI/INSERTQI ops.
llvm-svn: 249263
2015-10-04 10:12:53 +00:00
Joerg Sonnenberger 726e624c0c [SPARCv9] Add support for the rdpr/wrpr instructions.
llvm-svn: 249262
2015-10-04 09:11:22 +00:00
Igor Breger 78741a1b1e AVX512: Implemented encoding and intrinsics for VPERMILPS/PD instructions.
Added tests for intrinsics and encoding.

Differential Revision: http://reviews.llvm.org/D12690

llvm-svn: 249261
2015-10-04 07:20:41 +00:00
David Majnemer 161935520d [WinEH] Permit branch folding in the face of funclets
Track which basic blocks belong to which funclets.  Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.

llvm-svn: 249257
2015-10-04 02:22:52 +00:00
Simon Pilgrim dde63374c5 [DAGCombiner] Generalize FADD constant combines to work with vectors
Updated the FADD combines to work with vectors as well as scalars.

Differential Revision: http://reviews.llvm.org/D13416

llvm-svn: 249251
2015-10-03 22:06:06 +00:00
Sanjay Patel 004ea240ad add test cases that demonstrate bad behavior
These are based on PR25016 and likely caused by a bug in
MachineCombiner's definition of improvesCriticalPathLen().

llvm-svn: 249249
2015-10-03 20:52:55 +00:00
Davide Italiano 4961936d1a [llvm-size] Attempt to fix a test failure on Windows.
llvm-svn: 249247
2015-10-03 20:20:28 +00:00
Davide Italiano f0acbbfd96 [llvm-size] Fix time to check if time of use bug.
This was the last tool relying on this pattern.

llvm-svn: 249244
2015-10-03 19:44:06 +00:00
Simon Pilgrim 93ea954e6d [X86][SSE] Add FADD combine tests.
llvm-svn: 249240
2015-10-03 18:17:43 +00:00
Dan Gohman dc51b96b7f [WebAssembly] Implement the remaining conversion operations.
This is a temporary assembly syntax that will likely evolve along with
broader upcoming syntax changes.

llvm-svn: 249225
2015-10-03 02:10:28 +00:00
Dan Gohman 6a050f30de [WebAssembly] Rename setlocal to set_local to match the spec.
llvm-svn: 249218
2015-10-03 00:01:53 +00:00
Dan Gohman eb440092c9 [WebAssembly] Update this test for the new loop scheme.
llvm-svn: 249217
2015-10-02 23:54:03 +00:00
Sanjoy Das 55015d210f [SCEV] Recognize simple br-phi patterns
Summary:
Teach SCEV to match patterns like

```
  br %cond, label %left, label %right
 left:
  br label %merge
 right:
  br label %merge
 merge:
  V = phi [ %x, %left ], [ %y, %right ]
```

as "select %cond, %x, %y".  Before this SCEV would match PHI nodes
exclusively to add recurrences.

This addresses PR25005.

Reviewers: joker.eph, joker-eph, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13378

llvm-svn: 249211
2015-10-02 23:09:44 +00:00
Piotr Padlewski dc9b2cfc50 inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196
2015-10-02 22:12:22 +00:00
Dan Gohman e3e4a5ff52 [WebAssembly] Fix CFG stackification of nested loops.
llvm-svn: 249187
2015-10-02 21:11:36 +00:00
Dan Gohman 9cc692b06e [WebAssembly] Support calls marked as "tail", fastcc, and coldcc.
llvm-svn: 249184
2015-10-02 20:54:23 +00:00
Richard Trieu e0129e474d Call the correct overload.
Call the correct overload so a string literal does not get converted to a bool.
Also fix the test case to match the names given.

llvm-svn: 249183
2015-10-02 20:52:14 +00:00
Dan Gohman baba8c648b [WebAssembly] Add a resize_memory intrinsic.
llvm-svn: 249178
2015-10-02 20:10:26 +00:00
Michael Zolotukhin d57f4b9011 [Tests] Add one more case to LoopUnroll/pr18861.ll for better coverage.
llvm-svn: 249174
2015-10-02 19:21:52 +00:00
Michael Zolotukhin 8df4bddd16 [Tests] Give meaningful names to blocks in LoopUnroll/pr18861.ll, add a description of what's going on.
llvm-svn: 249173
2015-10-02 19:21:49 +00:00
Michael Zolotukhin 47eef7a3c9 [Tests] Slightly reduce test LoopUnroll/pr18861.ll.
llvm-svn: 249172
2015-10-02 19:21:43 +00:00
Dan Gohman 72f1692a2c [WebAssembly] Add a memory_size intrinsic.
llvm-svn: 249171
2015-10-02 19:21:15 +00:00
Sanjoy Das 7d910f2b11 [SCEV] Try to prove predicates by splitting them
Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:

 - B >= 0
 - A s< B
 - A >= 0

In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.

Reviewers: reames, atrick, nlewycky, hfinkel

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13042

llvm-svn: 249168
2015-10-02 18:50:30 +00:00
Roman Divacky 4b5507a037 Actually switch the arch when we see .arch. PR21695
llvm-svn: 249165
2015-10-02 18:25:25 +00:00
Tim Northover 8d67b8e053 ARM: diagnose invalid local fixups on Thumb1
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.

llvm-svn: 249164
2015-10-02 18:07:18 +00:00
Tim Northover 956b008db6 ARM: correctly align constant pool value on Thumb1 targets.
Since we're using tLDRpci to access it, the constant pool's address must be 0
(mod 4).

llvm-svn: 249163
2015-10-02 18:07:13 +00:00
Andrea Di Biagio 77f62652c1 Reapply r249121 : "[FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types."
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Originally reviewed here: http://reviews.llvm.org/D13347

llvm-svn: 249147
2015-10-02 16:08:05 +00:00
Andrea Di Biagio 45874e67a1 Revert: [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
r249121 caused a Clang test failure (avx2-buitins.c).
Revert r249121 while I keep investigating on the reason why that test failed.

llvm-svn: 249124
2015-10-02 13:06:19 +00:00
Zoran Jovanovic 9ffdfa5986 [mips][microMIPS] Fix an issue with selecting sqrt instruction in LLVM backend
Differential Revision: http://reviews.llvm.org/D13235

llvm-svn: 249123
2015-10-02 13:06:02 +00:00
Andrea Di Biagio cb33456122 [FastISel][x86] Teach how to select SSE2/AVX bitcasts between 128/256-bit vector types.
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.

Example:

  %1 = bitcast <4 x i31> %V to <2 x i64>

Before (-fast-isel -fast-isel-abort=1):

  FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>

Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.

Differential Revision: http://reviews.llvm.org/D13347

llvm-svn: 249121
2015-10-02 12:45:37 +00:00
Adrian Prantl 42562c38f5 dsymutil: Also ignore the ByteSize when building the DeclContext cache for
clang modules.

Forward decls of ObjC interfaces don't have a bytesize.

llvm-svn: 249110
2015-10-02 00:27:08 +00:00
Bruno Cardoso Lopes b491a2d641 [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Colin LeMahieu 665c9be489 [Hexagon] XFAILing test while diagnosing backend error.
llvm-svn: 249088
2015-10-01 22:14:05 +00:00
Joerg Sonnenberger c8d50d6347 Fix relocation used for GOT references in non-PIC mode. Fix relocations
for "set" pseudo op in PIC mode.

Differential Revision: http://reviews.llvm.org/D13173

llvm-svn: 249086
2015-10-01 22:08:20 +00:00
Davide Italiano f070688ecf [PATCH] D13360: [llvm-objdump] Teach -d about AArch64 mapping symbols
AArch64 uses $d* and $x* to interleave between text and data.
llvm-objdump didn't know about this so it ended up printing garbage.
This patch is a first step towards a solution of the problem.

Differential Revision:	 http://reviews.llvm.org/D13360

llvm-svn: 249083
2015-10-01 21:57:09 +00:00
Reid Kleckner fc64fae6e3 [WinEH] Emit __C_specific_handler tables for the new IR
We emit denormalized tables, where every range of invokes in the same
state gets a complete list of EH action entries. This is significantly
simpler than trying to infer the correct nested scoping structure from
the MI. Fortunately, for SEH, the nesting structure is really just a
size optimization.

With this, some basic __try / __except examples work.

llvm-svn: 249078
2015-10-01 21:38:24 +00:00
Colin LeMahieu f92c175bdd [Hexagon] XFAILing test while diagnosing backend error.
llvm-svn: 249075
2015-10-01 21:19:03 +00:00
Tom Stellard e9f8b24985 AMDGPU/SI: Remove assert from AMDGPUOpenCLImageTypeLowering pass
Summary:
Instead of asserting when the kernel metadata is different than we expect,
we should just skip lowering that function.  This fixes assertion
failures with OpenCL argument metadata from older LLVM releases.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D13356

llvm-svn: 249073
2015-10-01 21:16:05 +00:00
David Majnemer 4600c06434 [WinEH] Stop BranchFolding from merging across funclets
BranchFolding would merge two funclets together, this is not OK.
Disable this and strengthen the assertion in FuncletLayout.

llvm-svn: 249069
2015-10-01 21:04:13 +00:00
David Majnemer f828a0ccc7 [WinEH] Make FuncletLayout more robust against catchret
Catchret transfers control from a catch funclet to an earlier funclet.
However, it is not completely clear which funclet the catchret target is
part of.  Make this clear by stapling the catchret target's funclet
membership onto the CATCHRET SDAG node.

llvm-svn: 249052
2015-10-01 18:44:59 +00:00
Jonas Paulsson 12629324a4 [SystemZ] Add some generic (floating point support) load instructions.
Add generic instructions for load complement, load negative and load positive
for fp32 and fp64, and let isel prefer them. They do not clobber CC, and so
give scheduler more freedom. SystemZElimCompare pass will convert them when it
can to the CC-setting variants.

Regression tests updated to expect the new opcodes in places where the old ones
where used. New test case SystemZ/fp-cmp-05.ll checks that
SystemZCompareElim.cpp can handle the new opcodes.

README.txt updated (bullet removed).

Note that fp128 is not yet handled, because it is relatively rare, and is a
bit trickier, because of the fact that l.dfr would operate on the sign bit of
one of the subregisters of a fp128, but we would not want to copy the other
sub-reg in case src and dst regs are not the same.

Reviewed by Ulrich Weigand.

llvm-svn: 249046
2015-10-01 18:12:28 +00:00
Rafael Espindola e883514736 Fix printing of 64 bit values and make test more strict.
llvm-svn: 249043
2015-10-01 17:57:31 +00:00
Tom Stellard e0e582c9aa AMDGPU: Add MEM_RAT STORE_TYPED.
v2: Add test (Matt).
    Fix capitalization of isEOP (Matt).
    Move pattern to class parameter (Matt).
    Make the instruction available to Cayman (Matt).
    Change name from MEM_RAT WRITE_TYPED to MEM_RAT STORE_TYPED.

Patch by: Zoltan Gilian

llvm-svn: 249042
2015-10-01 17:51:34 +00:00
NAKAMURA Takumi 1ed20db720 Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"
It broke; LLVM :: CodeGen__Generic__2009-11-16-BadKillsCrash.ll

llvm-svn: 249032
2015-10-01 17:00:56 +00:00
Arnaud A. de Grandmaison 849f3bf8c9 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Ulrich Weigand cf1670a095 [SystemZ] Add assembly instructions for obtaining clock values as well as CPU features
Provide assembler support for STCK, STCKF, STCKE, and STFLE.

Author: joncmu
Differential Revision: http://reviews.llvm.org/D13299

llvm-svn: 249015
2015-10-01 14:43:48 +00:00
Zoran Jovanovic 2960f3a346 [mips][microMIPS] Implement CACHEE, WRPGPR and WSBH instructions
Differential Revision: http://reviews.llvm.org/D10337

llvm-svn: 249004
2015-10-01 12:49:27 +00:00
Scott Douglass 290183d734 [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer
Differential Revision: http://reviews.llvm.org/D13240

llvm-svn: 249002
2015-10-01 11:56:19 +00:00
Jingyue Wu df1a1b113b [NaryReassociate] SeenExprs records WeakVH
Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

llvm-svn: 248983
2015-10-01 03:51:44 +00:00
Dehao Chen 7c41dd6498 Update sample profile propagation algorithm.
http://reviews.llvm.org/D13218

llvm-svn: 248968
2015-10-01 00:26:56 +00:00
Ahmed Bougacha 23a0d1a1d6 [X86] Don't custom-lower vNi32 uint_to_fp when unsafe-fp-math.
The custom code produces incorrect results if later reassociated.

Since r221657, on x86, vNi32 uitofp is lowered using an optimized
sequence:

  movdqa LCPI0_0(%rip), %xmm1 ## xmm1 = [65535, ...]
  pand %xmm0, %xmm1
  por LCPI0_1(%rip), %xmm1 ## [0x4b000000, ...]
  psrld $16, %xmm0
  por LCPI0_2(%rip), %xmm0 ## [0x53000000, ...]
  addps LCPI0_3(%rip), %xmm0 ## [float -5.497642e+11, ...]
  addps %xmm1, %xmm0

Since r240361, the machine combiner opportunistically reassociates
2-instruction sequences (with -ffast-math). In the new code sequence,
the ADDPS' are eligible. In isolation, for simple examples (without
reassociable users), this makes no performance difference (the goal
being to enable reassociation of longer chains).

In the trivial example (just one uitofp), the reassociation doesn't
happen, because (I think) it would require the emission of a separate
movaps for a constantpool load (instead of folding it into addps).

However, when we have multiple uitofp sequences, and the constantpool
loads are CSE'd earlier, the machine combiner can do the reassociation.

When the ADDPS' are reassociated, the resulting sequence isn't correct
anymore, as we'd be adding large (2**39) constants with comparatively
smaller values (~2**23). Given that two of the three inputs are powers
of 2 larger than 2**16, and that ulp(2**39) == 2**(39-24) == 2**15,
the reassociated chain will produce 0 for any input in [0, 2**14[.
In my testing, it also produces wrong results for 99.5% of [0, 2**32[.

Avoid this by disabling the new lowering when -ffast-math. It does
mean that we'll get slower code than without it, but at least we
won't get egregiously incorrect code.

One might argue that, considering -ffast-math is all but meaningless,
uitofp producing wrong results isn't a compiler bug. But it really is.

Fixes PR24512.

...though this is really more of a workaround.
Ideally, we'd have some sort of Machine FMF, but that's a problem
that's not worth tackling until we do more with machine IR.

llvm-svn: 248965
2015-10-01 00:11:07 +00:00
Reid Kleckner 6dec87a8a0 [WinEH] Emit int3 after noreturn calls on Win64
The Win64 unwinder disassembles forwards from each PC to try to
determine if this PC is in an epilogue. If so, it skips calling the EH
personality function for that frame. Typically, this means you cannot
catch an exception in the same frame that you threw it, because 'throw'
calls a noreturn runtime function.

Previously we avoided this problem with the TrapUnreachable
TargetOption, but that's a much bigger hammer than we need. All we need
is a 1 byte non-epilogue instruction right after the call.  Instead,
what we got was an unconditional branch to a shared block containing the
ud2, potentially 7 bytes instead of 1. So, this reverts r206684, which
added TrapUnreachable, and replaces it with something better.

The new code pattern matches for invoke/call followed by unreachable and
inserts an int3 into the DAG. To be 100% watertight, we would need to
insert SEH_Epilogue instructions into all basic blocks ending in a call
with no terminators or successors, but in practice this is unlikely to
come up.

llvm-svn: 248959
2015-09-30 23:09:23 +00:00
Sanjay Patel a114a10bbe [x86] enable machine combiner reassociations for 256-bit vector logical integer insts
llvm-svn: 248955
2015-09-30 22:25:55 +00:00
Chad Rosier 4c5a4646bf [AArch64] Remove an unnecessary run line and other cleanup. NFC.
Unscaled load/store combining has been enabled since the initial ARM64 port.  No
need for a redundance run.  Also, add CHECK-LABEL directives.

llvm-svn: 248945
2015-09-30 21:10:02 +00:00
Michael Zolotukhin fc783e91e0 [SLP] Don't vectorize loads of non-packed types (like i1, i2).
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

llvm-svn: 248943
2015-09-30 21:05:43 +00:00
Evgeniy Stepanov 422a61306e Move dw_op_minus test to DebugInfo/X86.
The test requires X86 target support, and checks the actual debug
info contents, including register numbers which would be different on
other platforms.

llvm-svn: 248938
2015-09-30 20:23:24 +00:00
Evgeniy Stepanov f608111d1b Fix debug info with SafeStack.
llvm-svn: 248933
2015-09-30 19:55:43 +00:00
Chad Rosier 11c825f7db [AArch64] Remove an unnecessary restriction on pre-index instructions.
Previously, the index was constrained to the size of the memory operation for
no apparent reason.  This change removes that constraint so that we can form
pre-index instructions with any valid offset.

llvm-svn: 248931
2015-09-30 19:44:40 +00:00
Hal Finkel 4c45775880 [PowerPC] Disable shrink wrapping
Shrink wrapping is causing a self-hosting failure on PPC64/Linux. Disable for
now until the problem can be fixed.

llvm-svn: 248924
2015-09-30 17:29:03 +00:00
Erik Eckstein 91c49810f2 SLPVectorizer: add a test to check if the minimum region size works.
This is an addition to rL248917.

llvm-svn: 248923
2015-09-30 17:28:19 +00:00
Artyom Skrobov 72ca6b8f3f [ARM] Support for ARMv6-Z / ARMv6-ZK missing
As Richard Barton observed at http://reviews.llvm.org/D12937#inline-107121
TargetParser in LLVM has insufficient support for ARMv6Z and ARMv6ZK.

In particular, there were no tests for TrustZone being supported in these
architectures.

The patch clears a FIXME: left by Saleem Abdulrasool in r201471, and fixes
his test case which hadn't really been testing what it was claiming to test.

Differential Revision: http://reviews.llvm.org/D13236

llvm-svn: 248921
2015-09-30 17:25:52 +00:00
Erik Eckstein 848c1aa452 SLPVectorizer: limit the scheduling region size per basic block.
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).

llvm-svn: 248917
2015-09-30 17:00:44 +00:00
Andrea Di Biagio 0594e2a1e9 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Dehao Chen aae9e1f2bd Add unittest for new samle profile format.
http://reviews.llvm.org/D13145

llvm-svn: 248870
2015-09-30 01:05:37 +00:00
Dehao Chen 6722688eaa http://reviews.llvm.org/D13145
Support hierarachical sample profile format.

llvm-svn: 248865
2015-09-30 00:42:46 +00:00
Evgeniy Stepanov d3f544f271 [safestack] Fix a stupid mix-up in the direct-tls code path.
llvm-svn: 248863
2015-09-30 00:01:47 +00:00
Reid Kleckner a13dfd539b [WinEH] Setup RBP correctly in Win64 funclet prologues
Previously local variable captures just didn't work in 64-bit. Now we
can access local variables more or less correctly.

llvm-svn: 248857
2015-09-29 23:32:01 +00:00
David Majnemer 91b0ab9172 [WinEH] Ensure that funclets obey the x64 ABI
The x64 ABI requires that epilogues do not contain code other than stack
adjustments and some limited control flow.  However, we'd insert code to
initialize the return address after stack adjustments.  Instead, insert
EAX/RAX with the current value before we create the stack adjustments in
the epilogue.

llvm-svn: 248839
2015-09-29 22:33:36 +00:00
Maksim Panchenko cce239c45d HHVM calling conventions.
HHVM calling convention, hhvmcc, is used by HHVM JIT for
functions in translated cache. We currently support LLVM back end to
generate code for X86-64 and may support other architectures in the
future.

In HHVM calling convention any GP register could be used to pass and
return values, with the exception of R12 which is reserved for
thread-local area and is callee-saved. Other than R12, we always
pass RBX and RBP as args, which are our virtual machine's stack pointer
and frame pointer respectively.

When we enter translation cache via hhvmcc function, we expect
the stack to be aligned at 16 bytes, i.e. skewed by 8 bytes as opposed
to standard ABI alignment. This affects stack object alignment and stack
adjustments for function calls.

One extra calling convention, hhvm_ccc, is used to call C++ helpers from
HHVM's translation cache. It is almost identical to standard C calling
convention with an exception of first argument which is passed in RBP
(before we use RDI, RSI, etc.)

Differential Revision: http://reviews.llvm.org/D12681

llvm-svn: 248832
2015-09-29 22:09:16 +00:00
Chad Rosier 1769d8505f Fix test from r248825.
llvm-svn: 248827
2015-09-29 20:50:15 +00:00
Chad Rosier 4315012769 [AArch64] Add support for pre- and post-index LDPSWs.
llvm-svn: 248825
2015-09-29 20:39:55 +00:00
David Majnemer a80c151286 [WinEH] Teach AsmPrinter about funclets
Summary:
Funclets have been turned into functions by the time they hit the object
file.  Make sure that they have decent names for the symbol table and
CFI directives explaining how to reason about their prologues.

Differential Revision: http://reviews.llvm.org/D13261

llvm-svn: 248824
2015-09-29 20:12:33 +00:00
Zachary Turner 4dddcc64d3 [llvm-pdbdump] Add include-only filters.
PDB files have a lot of noise in them, with hundreds (or thousands)
of symbols from system libraries and compiler generated types.  If
you're only looking for a specific type, this can be problematic.

This CL allows you to display *only* types, variables, or compilands
matching a particular pattern.  These filters can even be combined
with exclude filters.  Include-only filters are given priority, so
that first the set of items to display is limited only to those that
match the include filters, and then the set of exclude filters is
applied to those.  If there are no include filters specified, then
it means "display everything".

llvm-svn: 248822
2015-09-29 19:49:06 +00:00
Chad Rosier dabe2534ed [AArch64] Add integer pre- and post-index halfword/byte loads and stores.
llvm-svn: 248817
2015-09-29 18:26:15 +00:00
Dehao Chen 028e122ca9 Revert r248810 which breaks tests.
llvm-svn: 248814
2015-09-29 18:18:49 +00:00
Dehao Chen 410a25aa7a http://reviews.llvm.org/D13231
Change lookup functions to const functions.

llvm-svn: 248810
2015-09-29 17:59:58 +00:00
James Molloy 897048bee3 [ValueTracking] Teach isKnownNonZero about monotonically increasing PHIs
If a PHI starts at a non-negative constant, monotonically increases
(only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.

llvm-svn: 248796
2015-09-29 14:08:45 +00:00
Jeroen Ketema 740f9d79ca Arguments spilled on the stack before a function call may have
alignment requirements, for example in the case of vectors.
These requirements are exploited by the code generator by using
move instructions that have similar alignment requirements, e.g.,
movaps on x86.

Although the code generator properly aligns the arguments with
respect to the displacement of the stack pointer it computes,
the displacement itself may cause misalignment. For example if
we have

%3 = load <16 x float>, <16 x float>* %1, align 64
call void @bar(<16 x float> %3, i32 0)

the x86 back-end emits:

movaps  32(%ecx), %xmm2
movaps  (%ecx), %xmm0
movaps  16(%ecx), %xmm1
movaps  48(%ecx), %xmm3
subl    $20, %esp       <-- if %esp was 16-byte aligned before this instruction, it no longer will be afterwards 
movaps  %xmm3, (%esp)   <-- movaps requires 16-byte alignment, while %esp is not aligned as such.
movl    $0, 16(%esp)
calll   __bar

To solve this, we need to make sure that the computed value with which
the stack pointer is changed is a multiple af the maximal alignment seen
during its computation. With this change we get proper alignment:

subl    $32, %esp
movaps  %xmm3, (%esp)

Differential Revision: http://reviews.llvm.org/D12337

llvm-svn: 248786
2015-09-29 10:12:57 +00:00
Simon Pilgrim 43f5e0848e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Dan Gohman 868e1c08d9 [WebAssembly] Rename test files to match platform naming conventions.
llvm-svn: 248783
2015-09-29 08:13:58 +00:00
Chen Li 9f27fc0599 [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.

Reviewers: broune, silvas, dnovillo, reames

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D11605

llvm-svn: 248777
2015-09-29 05:03:32 +00:00
Evgeniy Stepanov d8b86f7cdc Move dbg.declare intrinsics when merging and replacing allocas.
Place new and update dbg.declare calls immediately after the
corresponding alloca.

Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).

llvm-svn: 248769
2015-09-29 00:30:19 +00:00
Reid Kleckner c71d6275ca [WinEH] Fix ip2state table emission with funclets
Previously we were hijacking the old LandingPadInfo data structures to
communicate our state numbers. Now we don't need that anymore.

llvm-svn: 248763
2015-09-28 23:56:30 +00:00
Sanjoy Das 4f1c45952c [SCEV] Don't crash on pointer comparisons
`ScalarEvolution::isImpliedCondOperandsViaNoOverflow` tries to cast the
operand type of the comparison it is given to an `IntegerType`.  This is
incorrect because it could actually be simplifying a comparison between
two pointers.  Switch it to using `getTypeSizeInBits` instead, which
does the right thing for both pointers and integers.

Fixed PR24956.

llvm-svn: 248743
2015-09-28 21:14:32 +00:00
Matt Arsenault 73aa8f687a AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

llvm-svn: 248741
2015-09-28 20:54:52 +00:00
Matt Arsenault e5d042cd56 AMDGPU: Fix moving SMRD loads with literal offsets on CI
llvm-svn: 248740
2015-09-28 20:54:46 +00:00
Matt Arsenault b378f075a2 AMDGPU: Add testcases
Make sure we are testing moving users
of the moved and split SMRD loads.

llvm-svn: 248738
2015-09-28 20:54:38 +00:00
Matt Arsenault f3c91f573f AMDGPU: Cleanup test
Run instnamer on it, and rename check prefix.

This is in preparation for adding new testcases to cover
bugs on other subtargets.

llvm-svn: 248737
2015-09-28 20:54:32 +00:00
Sean Silva ace7818ce6 [GlobalOpt] Sort members of llvm.used deterministically
Patch by Jake VanAdrighem!

Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.

This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.

Reviewers: silvas

Subscribers: silvas, llvm-commits

Differential Revision: http://reviews.llvm.org/D12851

llvm-svn: 248728
2015-09-28 19:02:11 +00:00
Artur Pilipenko b4d009042b Introduce !align metadata for load instruction
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D12853

llvm-svn: 248721
2015-09-28 17:41:08 +00:00
Philip Reames 13f023c09d [InstSimplify] Fold simple known implications to true
This was split off of http://reviews.llvm.org/D13040 to make it easier to test the correctness of the implication logic. For the moment, this only handles a single easy case which shows up when eliminating and combining range checks. In the (near) future, I plan to extend this for other cases which show up in range checks, but I wanted to make those changes incrementally once the framework was in place.

At the moment, the implication logic will be used by three places. One in InstSimplify (this review) and two in SimplifyCFG (http://reviews.llvm.org/D13040 & http://reviews.llvm.org/D13070). Can anyone think of other locations this style of reasoning would make sense?

Differential Revision: http://reviews.llvm.org/D13074

llvm-svn: 248719
2015-09-28 17:14:24 +00:00
Weiming Zhao 310770a90f [LoopReroll] Ignore debug intrinsics
Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.

Differential Revision: http://reviews.llvm.org/D13150

llvm-svn: 248718
2015-09-28 17:03:23 +00:00
Dan Gohman 05a17aa82a [WebAssembly] Support for direct call and call_indirect.
llvm-svn: 248716
2015-09-28 16:22:39 +00:00
Zoran Jovanovic cdb64566cc [mips] Handling of immediates bigger than 16 bits
Differential Revision: http://reviews.llvm.org/D10539

llvm-svn: 248706
2015-09-28 11:11:34 +00:00
Hal Finkel bd582581b8 [DAGCombine] Fix getStoreMergeAndAliasCandidates's AA-enabled chain walking
When AA is being used, non-aliasing stores are canonicalized to use the same
chain, and DAGCombiner::getStoreMergeAndAliasCandidates can take advantage of
this by looking only as users of a store's chain operand. However, user
iteration is not result-number specific, we need to check that the use is as a
chain operand, and not via some other operand. It is certainly possible to have
another potentially-aliasing store, which shares the first's base pointer, and
uses the first's chain's node via some other operand.

Failure to catch this situation caused, at least in the included test case, an
assert later because the relative sequence-number ordering caused later
replacement to create a cycle in the DAG.

llvm-svn: 248698
2015-09-28 08:02:14 +00:00
Sanjoy Das f1090b6061 [SCEV] identical instructions don't compute equal values
Before this change `HasSameValue` would return true for distinct
`alloca` instructions if they happened to be allocating the same
type (`alloca` instructions are not specified as reading memory).  This
change adds an explicit whitelist of instruction types for which
"identical" instructions compute the same value.

Fixes PR24952.

llvm-svn: 248690
2015-09-27 21:09:48 +00:00
Sanjay Patel 9533407566 [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, 
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Joseph Tremoulet 09af67aba5 [EH] Create removeUnwindEdge utility
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.


Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

llvm-svn: 248677
2015-09-27 01:47:46 +00:00
Simon Pilgrim 91717ee233 [InstCombine] Removed unnecessary meta attributes.
llvm-svn: 248672
2015-09-26 17:49:04 +00:00
Chen Li 7452d95656 [Bug 24848] Use range metadata to constant fold comparisons between two values
Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

llvm-svn: 248650
2015-09-26 03:26:47 +00:00
Matt Arsenault 86095b8dec AMDGPU: Fix sched model for VOP2b instructions
Trying to use the version with the explicit output operand
would complain because of the missing WriteSALU. I'm not sure
why it doesn't complain about this with the implicit VCC def.

llvm-svn: 248646
2015-09-26 02:25:45 +00:00
Dan Gohman d0bf981296 [WebAssembly] Rename several functions and types according to the new spec.
llvm-svn: 248644
2015-09-26 01:09:44 +00:00
Ahmed Bougacha e81610fabb [ARM] Don't generate clrex for pre-v7 targets.
Since r248294, we emit clrex, but it doesn't exist on v6.

llvm-svn: 248640
2015-09-26 00:14:02 +00:00
Sanjoy Das b174f9a316 [SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts'
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`.  This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.

Depends on D12948
Depends on D12949

The original checkin, r248608 had to be backed out due to an issue with
a ObjCXX unit test.  That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12950

llvm-svn: 248638
2015-09-25 23:53:50 +00:00
Sanjoy Das 96709c4854 [SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible'
Summary:

This change teaches SCEV's `isImpliedCond` two new identities:

  A u< B u< -C          =>  (A + C) u< (B + C)
  A s< B s< INT_MIN - C =>  (A + C) s< (B + C)

While these are useful on their own, they're really intended to support
D12950.

The original checkin, r248606 had to be backed out due to an issue with
a ObjCXX unit test.  That issue is now fixed, so re-landing.

Reviewers: atrick, reames, majnemer, nlewycky, hfinkel

Subscribers: aadg, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12948

llvm-svn: 248637
2015-09-25 23:53:45 +00:00
Sanjay Patel e1b09caaaf [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Cong Hou 15ea016346 Use fixed-point representation for BranchProbability.
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:

Pros:

1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.

Cons:

1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.

One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.

Differential revision: http://reviews.llvm.org/D12603

llvm-svn: 248633
2015-09-25 23:09:59 +00:00
Matthias Braun a3b701f828 SelectionDAGDumper: Print simple operands inline.
Print simple operands inline instead of their pointer/value number.
Simple operands are SDNodes without predecessors like Constant(FP), Register,
UNDEF. This unifies the behaviour with dumpr() which was already doing this.

Previously:
  t0: ch = EntryToken
    t1: i64 = Register %vreg0
  t2: i64,ch = CopyFromReg t0, t1
    t3: i64 = Constant<1>
  t4: i64 = add t2, t3
    t5: i64 = Constant<2>
  t6: i64 = add t2, t5
  t10: i64 = undef
  t11: i8,ch = load t0, t2, t10<LD1[%tmp81]>
  t12: i8,ch = load t0, t4, t10<LD1[%tmp10]>
  t13: i8,ch = load t0, t6, t10<LD1[%tmp12]>

Now:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64 = add t2, Constant:i64<1>
  t6: i64 = add t2, Constant:i64<2>
  t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64
  t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64
  t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64

Differential Revision: http://reviews.llvm.org/D12567

llvm-svn: 248628
2015-09-25 22:27:02 +00:00
Sanjay Patel bbbf9a1a34 merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711)
This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ).

The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner 
to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling
the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned
accesses up in performSTORECombine() because they are slow.

This patch attempts to fix the problem in AArch's allowsMisalignedMemoryAccesses() while preserving
existing (perhaps questionable) lowering behavior.

The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned
stores.

Differential Revision: http://reviews.llvm.org/D12635
 

llvm-svn: 248622
2015-09-25 21:49:48 +00:00
Matthias Braun e86bbd8979 PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint
The algorithm would not modify the live-in list of blocks below the save
block point which is correct unless it happens to be a restore point at
the same time.
Also fixes the benign issue of live-in registers being added twice in
some cases.

The testcase is based on a test submitted by Kit Barton.

Differential Revision: http://reviews.llvm.org/D13176

llvm-svn: 248620
2015-09-25 21:41:40 +00:00
Tom Stellard e135ffd554 AMDGPU/SI: Use .hsatext section instead of .text for HSA
Reviewers: arsenm, grosbach, rafael

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12424

llvm-svn: 248619
2015-09-25 21:41:28 +00:00
Sanjoy Das 4a39b97671 Revert two SCEV changes that caused test failures in clang.
r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible"
r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts."
llvm-svn: 248614
2015-09-25 21:16:50 +00:00
Matt Arsenault 10aa807856 PeepholeOptimizer: Remove redundant copies
If a virtual register is copied and another copy was already
seen, replace with the previous copy. This only handles the
simplest cases for now.

This pattern shows up from various operand restrictions
AMDGPU has which require inserting copies depending
on the register class of the operands.

llvm-svn: 248611
2015-09-25 20:22:12 +00:00
Sanjoy Das d706fa8a0c [SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts.
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`.  This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.

Depends on D12948
Depends on D12949

Reviewers: atrick, reames, majnemer, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12950

llvm-svn: 248608
2015-09-25 19:59:57 +00:00