Summary:
Set the pad MBB as a funclet entry for CoreCLR as well as MSVCCXX, and
update state numbering to put the catchpad block rather than its normal
successor into the unwind map.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13492
llvm-svn: 249569
The ARM RTABI defines the half- to single-precision float conversion functions
with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore
we need to emit the __aeabi version when compiling with an eabi or eabihf
triple, and the __gnu version with a gnueabi or gnueabihf triple.
llvm-svn: 249565
I knee-jerk tried to fix this in completely the wrong way - it's not an
CPU limitation, but an OS/object file type one, so moving it
into a CPU-specific classification didn't help at all.
llvm-svn: 249562
Without an additional check for NEON, the compiler crashes during
legalization of NEON ldN/stN.
Differential Revision: http://reviews.llvm.org/D13508
llvm-svn: 249550
Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions,
it must be checked the available value comes from a compatible instruction
because the pointer is not enough to discriminate whether the value is
correct.
Reviewers: ssijaric
Subscribers: mcrosier, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D13475
llvm-svn: 249523
When outgoing function arguments are passed using push instructions, and EH
is enabled, we may need to indicate to the stack unwinder that the stack
pointer was adjusted before the call.
This should fix the exception handling issues in PR24792.
Differential Revision: http://reviews.llvm.org/D13132
llvm-svn: 249522
I don't think this assert adds much value, and removing it and related
variables avoids an "unused variable" warning in release builds.
llvm-svn: 249511
Since the `const` version of `getOperandBundle` returns a value of the
same type as the non-`const` version, the non-`const` version is
redundant.
llvm-svn: 249509
Summary:
A series of cosmetic cleanup changes to RewriteStatepointsForGC:
- Rename variables to LLVM style
- Remove some redundant asserts
- Remove an unsued `Pass *` parameter
- Remove unnecessary variables
- Use C++11 idioms where applicable
- Pass CallSite by value, not reference
Reviewers: reames, swaroop.sridhar
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13370
llvm-svn: 249508
Because of the constant bus requirement, it is never legal to
use a literal constant for these instructions despite the encoding
allowing it. This was already doing the right thing, but note why.
llvm-svn: 249500
This stops using an unknown reg class operand.
Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.
With the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will take care
of normally later, which will allow removing the weird check
of build_vector users. Without this, when removed v_movrels_b32 would
still be emitted even though all of the values were only stored in
SGPRs.
llvm-svn: 249494
I'm not sure why this would be necessary, and no tests fail with
them removed. Looking at the uses is suspect as well because
the use reg classes will likely change when the users are moved
as a result of moving this instruction.
llvm-svn: 249493
This will allow us to optimize code such as:
int f(int *p) {
int x;
return p == &x;
}
as well as:
int *allocate(void);
int f() {
int x;
int *p = allocate();
return p == &x;
}
The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.
This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.
Differential Revision: http://reviews.llvm.org/D13358
llvm-svn: 249490
This is handy for some AutoFDO stuff, and seems like a minor improvement
to correctness (otherwise a debug info consumer might think the decl
line/file of the def was the same as that of the declaration - though
what a consumer might use that for, I'm not sure - maybe "list <func>"
would've misbehaved with the old behavior?) and at a minor cost (in my
experiment, with fission, without type units, without compression, 0.01%
growth in debug info in the executable/objects, 0.02% growth in the .dwo
files).
llvm-svn: 249487
Our current emission strategy is to emit the funclet prologue in the
CatchPad's normal destination. This is problematic because
intra-funclet control flow to the normal destination is not erroneous
and results in us reevaluating the prologue if said control flow is
taken.
Instead, use the CatchPad's location for the funclet prologue. This
correctly models our desire to have unwind edges evaluate the prologue
but edges to the normal destination result in typical control flow.
Differential Revision: http://reviews.llvm.org/D13424
llvm-svn: 249483
This allows modules containing aliases to be lazily jit'd. Previously these
failed with missing symbol errors because the aliases weren't cloned from the
original module.
llvm-svn: 249481
from malformed Mach-O files that caused crashes.
We recently got about 700 malformed Mach-O files which we have
been using the improve the robustness of tools that deal with reading
data from object files. These resulted in about 20 small bug fixes to
the darwin based tools.
The goal here is to also improve the robustness of llvm-objdump and
this is the first two fixes. In talking with Tim Northover the approach
we thought might be best is to:
1) Only include tests for the malformed Mach-O files that cause crashes
(not all 700+ tests).
2) The test should only contain the command line option that caused the
crash and not all the others that don’t matter.
3) There should be only one line for the FileCheck that is past the point
of the crash if possible and if possible indicates the malformation.
Again the goal is to fix crashes and not so much care about how the
printing of malformed data comes out.
Tim also suggested if we really wanted to add test cases for all 700+
malformed Mach-O files putting them in the regression tests might be
an option. But many of these do not cause crashes.
llvm-svn: 249479
No classes are specializing the symbol table traits, so no need to look
through a typedef for class API. Make a few more functions private
since only SymbolTableListTraits should be using them.
llvm-svn: 249476
Nothing inherits from `MachineBasicBlock`, so this should have no real
functionality change. Just makes the code easier to understand.
llvm-svn: 249473
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).
This commit changes `SimplifyIndVar` to require a `DominatorTree`. I
don't think this is a problem because `ScalarEvolution` requires it
anyway.
Fixes PR25051.
Depends on D13459.
Reviewers: atrick, hfinkel
Subscribers: joker.eph, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13460
llvm-svn: 249471
The only specializations of `getSymTab()` were identical to the default
defined in `SymbolTableListTraits::getSymTab()`. Remove the
specializations, and stop treating it like a configuration point. Just
to be sure no one else accesses this, make it private.
llvm-svn: 249469
Summary:
We currently ignore the calling convention, so there is no real reason to
assert on the calling convention of functions.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13367
llvm-svn: 249468
Factor out some common code used to get+set function prefix/prologue
data. This may come in handy if we ever decide to store personality
functions in the same way we store prefix/prologue data.
Differential Revision: http://reviews.llvm.org/D13120
Reviewed-by: bogner
llvm-svn: 249460
Summary:
Assign one state number per handler/funclet, tracking parent state,
handler type, and catch type token.
State numbers are arranged such that ancestors have lower state numbers
than their descendants.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13450
llvm-svn: 249457
Summary:
- Add CoreCLR to if/else ladders and switches as appropriate.
- Rename isMSVCEHPersonality to isFuncletEHPersonality to better
reflect what it captures.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13449
llvm-svn: 249455
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.
Differential Revision: http://reviews.llvm.org/D13250
llvm-svn: 249453
As mentioned in the bug, I'd missed the presence of a getScalarType in the caller of the new implies method. As a result, when we ended up with a implication over two vectors, we'd trip an assert and crash.
Differential Revision: http://reviews.llvm.org/D13441
llvm-svn: 249442
With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark):
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x++;
From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c
Reviewers: sanjoyd
Differential Revision: http://reviews.llvm.org/D13390
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 249431
Summary:
The assembly printing of these is still missing the encoding size
suffix, but this will be fixed in a later commit.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13436
llvm-svn: 249424
Summary:
This fixes 7 tests during fast LLVM test-suite run:
* MultiSource/Benchmarks/McCat/18-imp/imp
* MultiSource/Applications/oggenc/oggenc
* MultiSource/Benchmarks/MallocBench/gs/gs
* MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer
* MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame
* MultiSource/Benchmarks/Bullet/bullet
Error message was in the form of:
fatal error: error in backend: Cannot select: 0x95c3288: f32 = fsqrt 0x95c0190 [ORD=9] [ID=18]
0x95c0190: f32 = fadd 0x95bef30, 0x95c4d00 [ORD=8] [ID=17]
0x95bef30: f32 = fmul 0x95c4988, 0x95c4988 [ORD=5] [ID=16]
...
There was problem with selecting sqrt instruction in LLVM backend.
To fix the issue changes are made in TableGen definition for sqrt instruction in MipsInstrFPU.td and new test file sqrt.ll is added to LLVM regression tests.
Patch by Zlatko Buljan
Reviewers: zoran.jovanovic, hvarga, dsanders
Subscribers: llvm-commits, petarj
Differential Revision: http://reviews.llvm.org/D13235
llvm-svn: 249416
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.
Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.
This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.
Fixes PR24922.
Differential Revision: http://reviews.llvm.org/D13219
llvm-svn: 249390
For the program like below
struct key_t {
int pid;
char name[16];
};
extern void test1(char *);
int test() {
struct key_t key = {};
test1(key.name);
return 0;
}
For key.name, the llc/bpf may generate the below code:
R1 = R10 // R10 is the frame pointer
R1 += -24 // framepointer adjustment
R1 |= 4 // R1 is then used as the first parameter of test1
OR operation is not recognized by in-kernel verifier.
This patch introduces an intermediate FI_ri instruction and
generates the following code that can be properly verified:
R1 = R10
R1 += -20
Patch by Yonghong Song <yhs@plumgrid.com>
llvm-svn: 249371
Most importantly, this keeps constant hoisting from preventing instruction selections ability to turn an AND with 0xffffffff into a move into a 32-bit subregister.
llvm-svn: 249370
This new syntax is built around putting each instruction on its own line
in a "mnemonic op, op, op" like syntax. It also uses conventional data
section directives like ".byte" and so on rather than requiring everything
to be in hierarchical S-expression format. This is a more natural syntax
for a ".s" file format from the perspective of LLVM MC and related tools,
while remaining easy to translate into other forms as needed.
llvm-svn: 249364
if there exists not definition for the type.
For this to work, we need to clone the imported modules before building
the decl context chains of the DIEs in the non-skeleton CUs.
llvm-svn: 249362
Given the work we are doing on ThinLTO, we will never need to support
module groups and working sets in GCC's implementation of LIPO. These
are currently dead code, and will continue to be so.
llvm-svn: 249351
The CATCHRET operand did not match the MachineFunction's CFG. This
mismatch happened because FrameLowering created a new MachineBasicBlock
and updated the CFG but forgot to update the CATCHRET operand.
Let's make sure this doesn't happen again by strengthing the funclet
membership analysis: it can now reason about the membership of all basic
blocks, not just those inside of funclets.
llvm-svn: 249344
Summary:
We are currently only using these aliases for VOPC instructions,
but this helper will make it easier to use them everywhere.
These aliases allow for the automatic matching of instructions
with forced 32-bit encoding. Eventually, we should be able to remove
the custom C++ logic we have for this in the assembler.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13396
llvm-svn: 249330
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.
rdar://22971893
llvm-svn: 249327
We were previously codegen'ing memcpy as regular load/store operations and
hoping that the register allocator would allocate registers in ascending order
so that we could apply an LDM/STM combine after register allocation. According
to the commit that first introduced this code (r37179), we planned to teach the
register allocator to allocate the registers in ascending order. This never got
implemented, and up to now we've been stuck with very poor codegen.
A much simpler approach for achieving better codegen is to create MEMCPY pseudo
instructions, attach scratch virtual registers to them and then, post register
allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers.
The register allocator will have picked arbitrary registers which we sort when
expanding the MEMCPY. This approach also avoids the need to repeatedly calculate
offsets which ultimately ought to be eliminated pre-RA in order to decrease
register pressure.
Fixes PR9199 and PR23768.
[This is based on Peter Collingbourne's r238473 which was reverted.]
Differential Revision: http://reviews.llvm.org/D13239
Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
Author: Peter Collingbourne <peter@pcc.me.uk>
llvm-svn: 249322
"msr pan, #imm", while only 1-bit immediate values should be valid.
Changed encoding and decoding for msr pstate instructions.
Differential Revision: http://reviews.llvm.org/D13011
llvm-svn: 249313
Summary:
An instruction like "(d)la $5, symbol+8" previously would have crashed the
assembler as it contains an expression. This is now fixed.
A few tests cases have also been changed to reflect these changes, however
these should only be syntax changes. Some new test cases have also been
added.
Patch by Scott Egerton.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12760
llvm-svn: 249311
This extends the work done in r233995 so that now getFragment (in addition to
getSection) also works for variable symbols.
With that the existing logic to decide if a-b can be computed works even if
a or b are variables. Given that, the expression evaluation can avoid expanding
variables as aggressively and that in turn lets the relocation code see the
original variable.
In order for this to work with the asm streamer, there is now a dummy fragment
per section. It is used to assign a section to a symbol when no other fragment
exists.
This patch is a joint work by Maxim Ostapenko andy myself.
llvm-svn: 249303
Summary:
The bitcode format is described in this document:
https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view
For more info on ThinLTO see:
https://sites.google.com/site/llvmthinlto
The first customer is ThinLTO, however the data structures are designed
and named more generally based on prior feedback. There are a few
comments regarding how certain interfaces are used by ThinLTO, and the
options added here to gold currently have ThinLTO-specific names as the
behavior they provoke is currently ThinLTO-specific.
This patch includes support for generating per-module function indexes,
the combined index file via the gold plugin, and several tests
(more are included with the associated clang patch D11908).
Reviewers: dexonsmith, davidxl, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13107
llvm-svn: 249270
Track which basic blocks belong to which funclets. Permit branch
folding to fire but only if it can prove that doing so will not cause
code in one funclet to be reused in another.
llvm-svn: 249257
The custom lowering in LowerExtendedLoad is doing the equivalent shuffle, so make use of existing lowering code to reduce duplication.
llvm-svn: 249243
visitSIGN_EXTEND_INREG calls SelectionDAG::getNode to constant fold scalar constants but handles vector constants itself, despite getNode being capable of dealing with them.
This required a minor change to the getNode implementation to actually deal with cases where the scalars of a BUILD_VECTOR were wider integers than the vector type - which was the only extra ability of the visitSIGN_EXTEND_INREG implementation.
No codegen intended and all existing tests remain the same.
llvm-svn: 249236
This time by lifting the lambda's in `createNodeFromSelectLikePHI` to
the file scope. Looks like there are differences in capture rules
between clang and MSVC?
llvm-svn: 249222
The trailing backslashes in some ASCII art added in r248527 cause a
"error: multi-line comment [-Werror=comment]" when building with gcc
4.9.1 -Wall. Swallow (ASCII-)artistic integrity and use pipes instead.
llvm-svn: 249212
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.
http://reviews.llvm.org/D12992
llvm-svn: 249196
Summary:
This change teaches SCEV that to prove `A u< B` it is sufficient to
prove each of these facts individually:
- B >= 0
- A s< B
- A >= 0
In practice, SCEV sometimes finds it easier to prove these facts
individually than to prove `A u< B` as one atomic step.
Reviewers: reames, atrick, nlewycky, hfinkel
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13042
llvm-svn: 249168
We previously stopped producing Thumb2 relaxations when they weren't supported,
but only diagnosed the case where an actual relocation was produced. We should
also tell people if local symbols aren't going to work rather than silently
overflowing.
llvm-svn: 249164
It is common to have a default soft process limit, at least on some families of
Linux distributions, of 1024. This is normally more than enough, but if you
have many cores, and you're running tests that create many threads, this can
become a problem. My POWER7 development machine has 48 cores, and when running
the lld regression tests, which often want to create up to 48 threads, I run
into problems. lit, by default, will want to run 48 tests in parallel, and
48*48 < 1024, and so many tests fail like this:
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
or lit fails like this when launching a test:
OSError: [Errno 11] Resource temporarily unavailable
lit can easily detect this situation and attempt to repair it before launching
tests (by raising the soft process limit to something that will allow ncpus^2
threads to be created), and should do so to prevent spurious test failures.
This is the follow-up to this thread:
http://lists.llvm.org/pipermail/llvm-dev/2015-October/090942.html
llvm-svn: 249161
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Originally reviewed here: http://reviews.llvm.org/D13347
llvm-svn: 249147
This patch teaches FastIsel the following two things:
1) On SSE2, no instructions are needed for bitcasts between 128-bit vector types;
2) On AVX, no instructions are needed for bitcasts between 256-bit vector types.
Example:
%1 = bitcast <4 x i31> %V to <2 x i64>
Before (-fast-isel -fast-isel-abort=1):
FastIsel miss: %1 = bitcast <4 x i31> %V to <2 x i64>
Now we don't fall back to SelectionDAG and we correctly fold that computation
propagating the register associated to %V.
Differential Revision: http://reviews.llvm.org/D13347
llvm-svn: 249121