Commit Graph

34154 Commits

Author SHA1 Message Date
Sanjay Patel 5264cc772c [SimplifyCFG] limit recursion depth when speculating instructions (PR26308)
This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=26308

With the switch to using the TTI cost model in:
http://reviews.llvm.org/rL228826
...it became possible to hit a zero-cost cycle of instructions (gep -> phi -> gep...), 
so we need a cap for the recursion in DominatesMergePoint().

A recursion depth parameter was already added for a different reason in:
http://reviews.llvm.org/rL255660
...so we can just set a limit for it.

I pulled "10" out of the air and made it an independent parameter that we can play with.
It might be higher than it needs to be given the currently low default value of 
PHINodeFoldingThreshold (2). That's the starting cost value that we enter the recursion
with, and most instructions have cost set to TCC_Basic (1), so I don't think we're going
to speculate more than 2 instructions with the current parameters.

As noted in the review and the TODO comment, we can do better than just limiting recursion
depth.

Differential Revision: http://reviews.llvm.org/D16637

llvm-svn: 258971
2016-01-27 19:22:45 +00:00
John McCall 3fe604f89f Add support for objc_unsafeClaimAutoreleasedReturnValue to the
ObjC ARC Optimizer.

The main implication of this is:

1. Ensuring that we treat it conservatively in terms of optimization.
2. We put the ASM marker on it so that the runtime can recognize
objc_unsafeClaimAutoreleasedReturnValue from releaseRV.

<rdar://problem/21567064>

Patch by Michael Gottesman!

llvm-svn: 258970
2016-01-27 19:05:08 +00:00
Oliver Stannard 1e67a9f196 Refactor backend diagnostics for unsupported features
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.

There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.

The implementation of DiagnosticInfoUnsupported::print must be in
lib/Codegen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.

Differential Revision: http://reviews.llvm.org/D16590

llvm-svn: 258951
2016-01-27 17:30:33 +00:00
Simon Pilgrim a73a0cfaf4 [X86][SSE] Test insertps instrinsic calls with masks that can't combine to something simpler
For these basic tests of the intrinsic, make sure the mask can't simplify to movss, blend-with-zero or something else

llvm-svn: 258941
2016-01-27 16:51:57 +00:00
Igor Laevsky ff291b5833 [DebugInfo] Support zero-length CIE in the _eh_frame parser
MCJIT emits zero-length CIE at the end of the _eh_frame section. This change
ensures that parser inside DebugInfo will not crash and correctly record such cases.
We are now recording DW_EH_PE_omit as a default value for FDE and LSDA encodings.
Also Offset != EndAugmentationOffset assertion check will only happen if augmentation 
string had 'z' letter in it.

Differential Revision: http://reviews.llvm.org/D16588

llvm-svn: 258931
2016-01-27 14:05:35 +00:00
Matthew Simpson b95861d35e Reapply commit r258404 with fix
This patch is the second attempt to reapply commit r258404. There was bug in
the initial patch and subsequent fix (mentioned below).

The initial patch caused an assertion because we were computing smaller type
sizes for instructions that cannot be demoted. The fix first determines the
instructions that will be demoted, and then applies the smaller type size to
only those instructions.

This should fix PR26239 and PR26307.

llvm-svn: 258929
2016-01-27 13:43:27 +00:00
Benjamin Kramer d477e9e378 Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed."
and "Add a missing test case for r258847."

This reverts commit r258847, r258848. Causes miscompilations and backend
errors.

llvm-svn: 258927
2016-01-27 12:44:12 +00:00
Sjoerd Meijer 0140f45964 Add missing build attribute regression tests for Cortex-A8
Differential Revision: http://reviews.llvm.org/D16576

llvm-svn: 258923
2016-01-27 11:34:51 +00:00
Marek Olsak e86f252209 AMDGPU/SI: Stoney has only 16 LDS banks
Summary:
This is a candidate for stable, along with all patches that add the "stoney"
processor.

Reviewers: tstellarAMD

Subscribers: arsenm

Differential Revision: http://reviews.llvm.org/D16485

llvm-svn: 258922
2016-01-27 11:19:45 +00:00
Igor Breger b1bd47ca1a AVX512: Fix vpmovzxbw predicate for AVX1/2 instructions.
Differential Revision: http://reviews.llvm.org/D16595

llvm-svn: 258915
2016-01-27 08:57:46 +00:00
Igor Breger d6c187b038 AVX512: Add store mask patterns.
Differential Revision: http://reviews.llvm.org/D16596

llvm-svn: 258914
2016-01-27 08:43:25 +00:00
Chen Li 5cde8389cf [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This is a revised version of D13974, and the following quoted summary are from D13974

"This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch."

D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code.

This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. The could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches.     

Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16570

llvm-svn: 258912
2016-01-27 07:40:41 +00:00
David Majnemer fccf5c6e01 Revert "Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)""
This reverts commit r258903 which reverted r255660.  r258903 was an
accidental commit and should not have been committed.

llvm-svn: 258905
2016-01-27 02:59:41 +00:00
David Majnemer c761afd1d1 [SimplifyCFG] Don't mistake icmp of and for a tree of comparisons
SimplifyCFG tries to turn complex branch conditions into a switch.
Some of it's logic attempts to reason about bitwise arithmetic produced
by InstCombine.  InstCombine can turn things like (X == 2) || (X == 3)
into (X & 1) == 2 and so SimplifyCFG tries to detect when this occurs so
that it can produce a switch instruction.

However, the legality checking was not sufficient to determine whether
or not this had occured.  Correctly check this case by requiring that
the right-hand side of the comparison be a power of two.

This fixes PR26323.

llvm-svn: 258904
2016-01-27 02:43:28 +00:00
David Majnemer 47de2140f7 Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)"
This reverts commit r255660.

llvm-svn: 258903
2016-01-27 02:43:22 +00:00
Matt Arsenault b22828f2fb AMDGPU: Fix default device handling
When no device name is specified, default to kaveri
for HSA since SI is not supported and it woud fail.

Default to "tahiti" instead of "SI" since these are
effectively the same, and tahiti is an actual device.

Move default device handling to the TargetMachine
rather than the AMDGPUSubtarget. The module ISA version
is computed from the device name provided with the target
machine, so the attributes printed by the AsmPrinter were
inconsistent with those computed in the subtarget.

Also remove DevName field from subtarget since it's redundant
with getCPU() in the superclass.

llvm-svn: 258901
2016-01-27 02:17:49 +00:00
Dan Gohman ece881d518 [WebAssembly] Add a test for the mem-intrinsic code in WebAssemblyPeephole.cpp
llvm-svn: 258895
2016-01-27 01:37:52 +00:00
Kevin Enderby 87c85b7e23 Fix identify_magic() to check that a file that starts with MH_MAGIC is
at least as big as the mach header to be identified as a Mach-O file and
make sure smaller files are not identified as a Mach-O files but as
unknown files. Also fix identify_magic() so it looks at all 4 bytes of
the filetype field when determining the type of the Mach-O file.
Then fix the macho-invalid-header test case to check that it is an
unknown file and make sure it does not get the error for
object_error::parse_failed.  And also update the unit tests.

llvm-svn: 258883
2016-01-26 23:43:37 +00:00
Reid Kleckner 1c93b4cd7b [llvm-tblgen] Stop emitting the intrinsic name matching code
The AMDGPU backend was the last user of the old StringMatcher
recognition code. Move it over to the new lookupLLVMIntrinsicName
funciton, which is now improved to handle all of the interesting edge
cases exposed by AMDGPU intrinsic names.

llvm-svn: 258875
2016-01-26 23:01:21 +00:00
Derek Schuff 90d9e8d370 [WebAssembly] Omit no-op adds for non-mem uses of FrameIndex
Differential Revision: http://reviews.llvm.org/D16554

llvm-svn: 258872
2016-01-26 22:47:43 +00:00
Simon Pilgrim 12e71db856 [X86][SSE] Added 8i8 to 8i64 sext/zext tests
llvm-svn: 258868
2016-01-26 22:19:22 +00:00
Simon Pilgrim 00adc1e105 [X86] Add support for zeroed shuffle elements to getShuffleScalarElt
Enable handling of SM_SentinelZero shuffle elements to getShuffleScalarElt. Improves VZEXT_LOAD matches in EltsFromConsecutiveLoads.

llvm-svn: 258865
2016-01-26 21:39:25 +00:00
Chris Bieneman e49730d4ba Remove autoconf support
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html

"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi

Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark

Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D16471

llvm-svn: 258861
2016-01-26 21:29:08 +00:00
JF Bastien 1a6c7608b1 WebAssembly: don't optimize memcpy/memmove/memcpy to frame index
r258781 optimized memcpy/memmove/memcpy so the intrinsic call can return its first argument, but missed the frame index case. Teach it to ignore that case so C code doesn't assert out in these cases.

llvm-svn: 258851
2016-01-26 20:22:42 +00:00
Cong Hou 26d04ef9d9 Add a missing test case for r258847.
llvm-svn: 258848
2016-01-26 20:09:38 +00:00
Cong Hou 551a57f797 Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:

jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 258847
2016-01-26 20:08:01 +00:00
Hemant Kulkarni ab4a46fa1c [llvm-readobj] Add -elf-section-groups option
Adds a way to inspect SHT_GROUP sections in ELF objects.
Displays signature, member sections of these sections.

Differential revision: http://reviews.llvm.org/D16555

llvm-svn: 258845
2016-01-26 19:46:39 +00:00
Aditya Nandakumar 3d0c46d489 Reassociate: Reprocess RedoInsts after each inst
Previously the RedoInsts was processed at the end of the block.
However it was possible that it left behind some instructions that
were not canonicalized.
This should guarantee that any previous instruction in the basic
block is canonicalized before we process a new instruction.

llvm-svn: 258830
2016-01-26 18:42:36 +00:00
Sanjay Patel 10e82d1ddb [x86, AVX] tighten checks
llvm-svn: 258828
2016-01-26 18:22:50 +00:00
Kevin Enderby 40fdbf87d2 Update the comments for the macho-invalid-zero-ncmds test and fix
llvm-objdump when printing the Mach Header to print the unknown
cputype and cpusubtype fields as decimal instead of not printing
them at all.  And change the test to check for that.

llvm-svn: 258826
2016-01-26 18:20:49 +00:00
Sanjay Patel 980b280f50 [LibCallSimplifier] fold memset(malloc(x), 0, x) --> calloc(1, x)
This is a step towards solving PR25892:
https://llvm.org/bugs/show_bug.cgi?id=25892

It won't handle the reported case. As noted by the 'TODO' comments in the patch, 
we need to relax the hasOneUse() constraint and also match patterns that include
memset_chk() and the llvm.memset() intrinsic in addition to memset().

Differential Revision: http://reviews.llvm.org/D16337

llvm-svn: 258816
2016-01-26 16:17:24 +00:00
Matthew Simpson 61d5a18469 Revert "Reapply commit r258404 with fix"
This commit exposes a crash in computeKnownBits on the Chromium buildbots.
Reverting to investigate.

Reference: https://llvm.org/bugs/show_bug.cgi?id=26307
llvm-svn: 258812
2016-01-26 15:45:49 +00:00
Igor Laevsky 03a670c0ec Re-submit r256008 "Improve DWARFDebugFrame::parse to also handle __eh_frame."
Originally this change was causing failures on windows buildbots.
But those problems were fixed in r258806.

llvm-svn: 258811
2016-01-26 15:09:42 +00:00
Simon Pilgrim 46696ef93c [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).

It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.

After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.

Differential Revision: http://reviews.llvm.org/D16217

llvm-svn: 258798
2016-01-26 09:30:08 +00:00
Matt Arsenault c5f6152911 AMDGPU: Make v32i8/v64i8 illegal types
Old intrinsics were forcing these, but they have now all
been removed. This fixes large i8 vector operations generally
being broken.

llvm-svn: 258788
2016-01-26 04:43:48 +00:00
Matt Arsenault 018179fc46 AMDGPU: Remove old sample intrinsics
I did my best to try to update all the uses in tests that
just happened to use the old ones to the newer intrinsics.

I'm not sure I got all of the immediate operand conversions
correct, since the value seems to have been ignored by the
old pattern but I don't think it really matters.

llvm-svn: 258787
2016-01-26 04:38:08 +00:00
Matt Arsenault 051d6f9fde AMDGPU: Add new amdgcn intrinsics for cube instructions
More cleanup to try to get all intrinsics using the correct
amdgcn prefix that are as close to the instruction as possible.

llvm-svn: 258786
2016-01-26 04:29:56 +00:00
Matt Arsenault 9a10cea7fb AMDGPU: Implement read_register and write_register intrinsics
Some of the special intrinsics now that now correspond to a instruction
also have special setting of some registers, e.g. llvm.SI.sendmsg sets
m0 as well as use s_sendmsg. Using these explicit register intrinsics
may be a better option.

Reading the exec mask and others may be useful for debugging. For this
I'm not sure this is entirely correct because we would want this to
be convergent, although it's possible this is already treated
sufficently conservatively.

llvm-svn: 258785
2016-01-26 04:29:24 +00:00
Matt Arsenault 0c3e2338fe AMDGPU: Restore AMDGPU prefixed rsq intrinsic for now
Also move into backend intrinsics to discourage use of the old name.

llvm-svn: 258783
2016-01-26 04:14:16 +00:00
Dan Gohman bdf08d5da6 [WebAssembly] Optimize memcpy/memmove/memcpy calls.
These calls return their first argument, but because LLVM uses an intrinsic
with a void return type, they can't use the returned attribute. Generalize
the store results pass to optimize these calls too.

llvm-svn: 258781
2016-01-26 04:01:11 +00:00
Dan Gohman bb3722430f [WebAssembly] Implement unaligned loads and stores.
Differential Revision: http://reviews.llvm.org/D16534

llvm-svn: 258779
2016-01-26 03:39:31 +00:00
Haicheng Wu f1c00a22be [LIR] Add support for structs and hand unrolled loops
This is a recommit of r258620 which causes PR26293.

The original message:

Now LIR can turn following codes into memset:

typedef struct foo {
  int a;
  int b;
} foo_t;

void bar(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; ++i) {
    f[i].a = 0;
    f[i].b = 0;
  }
}

void test(foo_t *f, unsigned n) {
  for (unsigned i = 0; i < n; i += 2) {
    f[i] = 0;
    f[i+1] = 0;
  }
}

llvm-svn: 258777
2016-01-26 02:27:47 +00:00
Dan Gohman 75452734e4 Followup to 258750; update more tests to use .p2align .
llvm-svn: 258755
2016-01-26 00:35:07 +00:00
Dan Gohman 14d8436c66 Followup to 258750; update all MC tests to use .p2align .
llvm-svn: 258754
2016-01-26 00:27:59 +00:00
Dan Gohman 968b090552 Followup to 258750; update this test to use .p2align .
llvm-svn: 258752
2016-01-26 00:17:24 +00:00
Dan Gohman 61d15ae4f5 [MC] Use .p2align instead of .align
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.

This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.

Differential Revision: http://reviews.llvm.org/D16549

llvm-svn: 258750
2016-01-26 00:03:25 +00:00
Evgeniy Stepanov fbc3da577c [cfi] Cross-DSO CFI diagnostic mode (LLVM part).
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
  present in the module (generated in the frontend).

llvm-svn: 258746
2016-01-25 23:35:03 +00:00
Matthias Braun 4e67e5c91a X86ISelLowering: Fix cmov(cmov) special lowering bug
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.

If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.

This fixes http://llvm.org/PR26256 (= rdar://24329747)

llvm-svn: 258729
2016-01-25 22:08:25 +00:00
Teresa Johnson 71d12d2a4e [ThinLTO] Find all needed metadata when linking metadata as postpass
For metadata postpass linking, after importing all functions, we need
to recursively walk through any nodes reached via imported functions to
locate needed subprogram metadata. Some might only be reached indirectly
via the variable list for an inlined function.

llvm-svn: 258728
2016-01-25 22:04:56 +00:00
Simon Pilgrim d1d118097d [X86][AVX] Add commutation support for VPERM2X128 instructions
Its main use is to allow memory folding of the 1st operand

Differential Revision: http://reviews.llvm.org/D16521

llvm-svn: 258726
2016-01-25 21:51:34 +00:00