There's still lots of callers passing nullptr, of course - some because
they'll never be migrated (InstCombines for bitcasts - well they don't
make any sense when the pointer type is opaque anyway, for example) and
others that will need more engineering to pass Types around.
llvm-svn: 234126
This allows the compiler/assembly programmer to switch back to a
section. This in turn fixes the bootstrap failure on powerpc (tested
on gcc110) without changing the ppc codegen at all.
I will try to cleanup the various getELFSection overloads in a followup patch.
Just using a default argument now would lead to ambiguities.
llvm-svn: 234099
Previously the patterns didn't have high enough priority and we would only use the GR32 form if the only the upper 32 or 56 bits were zero.
Fixes PR23100.
llvm-svn: 234075
This add support for catching an exception such that an exception object
available to the catch handler will be initialized by the runtime.
llvm-svn: 234062
We don't need to represent UnwindHelp in IR. Instead, we can use the
knowledge that we are emitting the parent function to decide if we
should create the UnwindHelp stack object.
llvm-svn: 234061
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are. This lead to assertion failures.
This fixes PR23113.
llvm-svn: 234046
The plan here is to push the API changes out from the common components
(like Constant::getGetElementPtr and IRBuilder::CreateGEP related
functions) and just update callers to either pass the type if it's
obvious, or pass null.
Do this with LoadInst as well and anything else that comes up, then to
start porting specific uses to not pass null anymore - this may require
some refactoring in each case.
llvm-svn: 234042
As a follow-up to r234021, assert that a debug info intrinsic variable's
`MDLocalVariable::getInlinedAt()` always matches the
`MDLocation::getInlinedAt()` of its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), but I'll let these assertions bake for a while
first.
If you have an out-of-tree backend that just broke, you're probably
attaching the wrong `DebugLoc` to a `DBG_VALUE` instruction. The one
you want is the location that was attached to the corresponding
`@llvm.dbg.declare` or `@llvm.dbg.value` call that you started with.
llvm-svn: 234038
Most desktop environments let the users specify his preferred application per
file type. On mac/linux we can use open/xdg-open for that and should try this
first before starting a heuristic search for various programs.
Differential Revision: http://reviews.llvm.org/D6534
llvm-svn: 234031
Check that the `MDLocalVariable::getInlinedAt()` in a debug info
intrinsic's variable always matches the `MDLocation::getInlinedAt()` of
its `!dbg` attachment.
The goal here is to get rid of `MDLocalVariable::getInlinedAt()`
entirely (PR22778), since it's expensive and unnecessary, but I'll let
this verifier check bake for a while (a week maybe?) first. I've
updated the testcases that had the wrong value for `inlinedAt:`.
This checks that things are sane in the IR, but currently things go out
of whack in a few places in the backend. I'll follow shortly with
assertions in the backend (with code fixes).
If you have out-of-tree testcases that just started failing, here's how
I updated these ones:
1. The verifier check gives you the basic block, function, instruction,
and relevant metadata arguments (metadata numbering doesn't
necessarily match the source file, unfortunately).
2. Look at the `@llvm.dbg.*()` instruction, and compare the
`inlinedAt:` fields of the variable argument (second `metadata`
argument) and the `!dbg` attachment.
3. Figure out based on the variable `scope:` chain and the functions in
the file whether the variable has been inlined (and into what), so
you can determine which `inlinedAt:` is actually correct. In all of
the in-tree testcases, the `!MDLocation()` was correct and the
`!MDLocalVariable()` was wrong, but YMMV.
4. Duplicate the metadata that you're going to change, and add/drop the
`inlinedAt:` field from one of them. Be careful that the other
references to the same metadata node point at the correct one.
llvm-svn: 234021
When enabling PPC64LE, I disabled some optimizations of BUILD_VECTOR
nodes for little endian because wrong results were produced. I've
subsequently investigated and found this is due to a call to
BuildVectorSDNode::isConstantSplat that was always specifying
big-endian. With this changed to correctly identify the target
endianness, the optimizations work as expected.
I found another case of a call to the same method with big-endian
hardcoded, in PPC::isAllNegativeZeroVector(). I discovered this was
an orphaned method with no callers, so I've just removed it.
The existing test/CodeGen/PowerPC/vec_constants.ll checks these
optimizations, so for testing I've just added a variant for little
endian.
llvm-svn: 234011
This patch attempts to fold the shuffling of 'scalar source' inputs - BUILD_VECTOR and SCALAR_TO_VECTOR nodes - if the shuffle node is the only user. This folds away a lot of unnecessary shuffle nodes, and allows quite a bit of constant folding that was being missed.
Differential Revision: http://reviews.llvm.org/D8516
llvm-svn: 234004
Fixes PR19582.
Previously, when an asm assignment (.set or =) was created, we would look up
the section immediately in MCSymbol::setVariableValue. This caused symbols
to receive the wrong section if the RHS of the assignment had not been seen
yet. This had a knock-on effect in the object file emitters, causing them
to emit extra symbols, or to give symbols the wrong visibility or the wrong
section. For example, in the following asm:
.data
.Llocal:
.text
leaq .Llocal1(%rip), %rdi
.Llocal1 = .Llocal2
.Llocal2 = .Llocal
the first assignment would give .Llocal1 a null section, which would never get
fixed up by the second assignment. This would cause the ELF object file emitter
to consider .Llocal1 to be an undefined symbol and give it external linkage,
even though .Llocal1 should not have been emitted at all in the object file.
Or in the following asm:
alias_to_local = Ltmp0
Ltmp0:
the Mach-O object file emitter would give the alias_to_local symbol a n_type
of N_SECT and a n_sect of 0. This is invalid under the Mach-O specification,
which requires N_SECT symbols to receive a non-zero section number if the
symbol is defined in a section in the object file.
https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist
After this change we do not look up the section when the assignment is created,
but instead look it up on demand and store it in Section, which is treated
as a cache if the symbol is a variable symbol.
This change also fixes a bug in MCExpr::FindAssociatedSection. Previously,
if we saw a subtraction, we would return the first referenced section, even in
cases where we should have been returning the absolute pseudo-section. Now we
always return the absolute pseudo-section for expressions that subtract two
section-derived expressions. This isn't always correct (e.g. if one of the
sections ends up being laid out at an absolute address), but it's probably
the best we can do without more context.
This allows us to remove code in two places where we appear to have been
working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols
and in X86AsmPrinter::EmitStartOfAsmFile.
Re-applies r233595 (aka D8586), which was reverted in r233898.
Differential Revision: http://reviews.llvm.org/D8798
llvm-svn: 233995
Summary:
Same as the last patch, but for BasicBlock
(Requires same code movement)
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8801
llvm-svn: 233992
Register coalescing can change the target of a RegPair hint to a
physreg, we should not crash on this. This also slightly improved the
way ARMBaseRegisterInfo::updateRegAllocHint() works.
llvm-svn: 233987
This prevents us from running out of registers in the backend.
Introducing stack malloc calls prevents the backend from recognizing the
inline asm operands as stack objects. When the backend recognizes a
stack object, it doesn't need to materialize the address of the memory
in a physical register. Instead it generates a simple SP-based memory
operand. Introducing a stack malloc forces the backend to find a free
register for every memory operand. 32-bit x86 simply doesn't have enough
registers for this to succeed in most cases.
Reviewers: kcc, samsonov
Differential Revision: http://reviews.llvm.org/D8790
llvm-svn: 233979
Summary:
The old requirement on GEP candidates being in bounds is unnecessary.
For off-bound GEPs, we still have
&B[i * S] = B + (i * S) * e = B + (i * e) * S
Test Plan: slsr_offbound_gep in slsr-gep.ll
Reviewers: meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8809
llvm-svn: 233949
This makes it possible to use the same representation of llvm.eh.actions
in outlined handlers as we use in the parent function because i32's are
just constants that can be copied freely between functions.
I had to add a sentinel alloca to the list of child allocas so that we
don't try to sink the catch object into the handler. Normally, one would
use nullptr for this kind of thing, but TinyPtrVector doesn't support
null elements. More than that, it's elements have to have a suitable
alignment. Therefore, I settled on this for my sentinel:
AllocaInst *getCatchObjectSentinel() {
return static_cast<AllocaInst *>(nullptr) + 1;
}
llvm-svn: 233947
Without this patch, we split the 256-bit vector into halves and produced something like:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
vxorps %xmm1, %xmm1, %xmm1
vblendps $15, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
Now, we eliminate the xor and blend because those zeros are free with the vmovd:
movzwl (%rdi), %eax
vmovd %eax, %xmm0
This should be the final fix needed to resolve PR22685:
https://llvm.org/bugs/show_bug.cgi?id=22685
llvm-svn: 233941
Require the pointee type to be passed explicitly and assert that it is
correct. For now it's possible to pass nullptr here (and I've done so in
a few places in this patch) but eventually that will be disallowed once
all clients have been updated or removed. It'll be a long road to get
all the way there... but if you have the cahnce to update your callers
to pass the type explicitly without depending on a pointer's element
type, that would be a good thing to do soon and a necessary thing to do
eventually.
llvm-svn: 233938
For code like this:
define <8 x i32> @load_v8i32() {
ret <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
}
We produce this AVX code:
_load_v8i32: ## @load_v8i32
movl $7, %eax
vmovd %eax, %xmm0
vxorps %ymm1, %ymm1, %ymm1
vblendps $1, %ymm0, %ymm1, %ymm0 ## ymm0 = ymm0[0],ymm1[1,2,3,4,5,6,7]
retq
There are at least 2 bugs in play here:
We're generating a blend when a move scalar does the same job using 2 less instruction bytes (see FIXMEs).
We're not matching an existing pattern that would eliminate the xor and blend entirely. The zero bytes are free with vmovd.
The 2nd fix involves an adjustment of "AddedComplexity" [1] and mostly masks the 1st problem.
[1] AddedComplexity has close to no documentation in the source.
The best we have is this comment: "roughly corresponds to the number of nodes that are covered".
It appears that x86 has bastardized this definition by inflating its values for some other
undocumented reason. For example, we have a pattern with "AddedComplexity = 400" (!).
I searched my way to this page:
https://groups.google.com/forum/#!topic/llvm-dev/5UX-Og9M0xQ
Differential Revision: http://reviews.llvm.org/D8794
llvm-svn: 233931
Summary:
Avoid duplicate code in Mips16FrameLowering and MipsSEFrameLowering by
providing an implementation of the eliminateCallFramePseudoInstr()
function from their base class.
Depends on D8640.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8641
llvm-svn: 233909
Summary:
adjustStackPtr() is implemented from both MipsSEInstrInfo and
Mips16InstrInfo. It makes sense to expose this function from
MipsInstrInfo and avoid explicit casting in some cases.
Depends on D8638.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8640
llvm-svn: 233905
Corrected forgotten change to remove excess "generic-armv8.1-a" cpu
Subscribers: llvm-commits
Completion of http://reviews.llvm.org/rL233811
llvm-svn: 233903
I'm playing with supporting custom stack map formats with statepoints. While
doing so, I noticed that the existing implementation didn't indicate inherently
unsized frames. This change essentially just ports the functionality that already
exists for the default StackMaps section to custom stackmaps.
llvm-svn: 233891
use these to add support for C++ static ctors/dtors to the Orc-lazy JIT in LLI.
Replace the trivial_retval_1 regression test - the new 'hello' test is covering
strictly more code.
llvm-svn: 233885
addl has higher throughput and this was needlessly picking a suboptimal
encoding causing PR23098.
I wish there was a way of doing this without further duplicating tbl-
generated patterns, but so far I haven't found one.
llvm-svn: 233832
Summary:
This change teaches ScalarEvolution::isLoopBackedgeGuardedByCond to look
at edges within the loop body that dominate the latch. We don't do an
exhaustive search for all possible edges, but only a quick walk up the
dom tree.
This re-lands r233447. r233447 was reverted because it caused massive
compile-time regressions. This change has a fix for the same issue.
llvm-svn: 233829
Summary:
This is part 1 of fixes to address the problems described in
https://llvm.org/bugs/show_bug.cgi?id=22719.
The restriction to limit loop scales to 4,096 does not really prevent
overflows anymore, as the underlying algorithm has changed and does
not seem to suffer from this problem.
Additionally, artificially restricting loop scales to such a low number
skews frequency information, making loops of equal hotness appear to
have very different hotness properties.
The only loops that are artificially restricted to a scale of 4096 are
infinite loops (those loops with an exit mass of 0). This prevents
infinite loops from skewing the frequencies of other regions in the CFG.
At the end of propagation, frequencies are scaled to values that take no
more than 64 bits to represent. When the range of frequencies to be
represented fits within 61 bits, it pushes up the scaling factor to a
minimum of 8 to better distinguish small frequency values. Otherwise,
small frequency values are all saturated down at 1.
Tested on x86_64.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8718
llvm-svn: 233826
v8.1a is renamed to architecture, following current entity naming approach.
Excess generic cpu is removed. Intended use: "generic" cpu with "v8.1a" subtarget feature
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8767
llvm-svn: 233811
v8.1a is renamed to architecture, accordingly to approaches in ARM backend.
Excess generic cpu is removed. Intended use: "generic" cpu with "v8.1a" subtarget feature
Reviewers: jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8766
llvm-svn: 233810
We used to do this before refactorings around r225640.
Some clang users checked for _chk libcall availability using:
__has_builtin(__builtin___memcpy_chk)
When compiling with -fno-builtin, this is always true.
When passing -ffreestanding/-mkernel, which both imply -fno-builtin, we
end up with fortified libcalls, which isn't acceptable in a freestanding
environment which only provides their non-fortified counterparts.
Until we change clang and/or teach external users to check for availability
differently, disregard the "nobuiltin" attribute and TLI::has.
Workaround for PR23093.
llvm-svn: 233776
Under normal circumstances, use of CR bits is disabled when running at -O0, but
it is enabled by default otherwise, and if you have optnone functions, they'll
still generally be generated with crbits turned on (because nothing else turns
them off). FastISel can't handle most things dealing with i1 values when using
CR bits, and checks for that, but was not checking the return type on
functions; we can't fast-isel function calls with i1 return values either when
using CR bits for boolean values.
Fixes PR22664.
llvm-svn: 233775
This lets us catch exceptions in simple cases.
N.B. Things that do not work include (but are not limited to):
- Throwing from within a catch handler.
- Catching an object with a named catch parameter.
- 'CatchHigh' is fictitious, we aren't sure of its purpose.
- We aren't entirely efficient with regards to the number of EH states
that we generate.
- IP-to-State tables are sensitive to the order of emission.
llvm-svn: 233767
Even at -O0, we fall back to SDAG when we hit intrinsics, and if the intrinsic
is a memset/memcpy/etc. we might normally use vector types. At -O0, this is
probably not a good idea (because, if there is a bug in the lowering code,
there would be no good way to turn it off). At -O0, only use scalar preferred
types.
Related to PR22754.
llvm-svn: 233755
extended loads.
Implement the related target lowering hook so that the optimization has a better
estimation of the cost of an extension.
rdar://problem/19267165
llvm-svn: 233753
Uniqued nodes have more complete registration with
`ReplaceableMetadataImpl` so that they can update themselves when
operands change. Fix a bug where `MDNode::replaceWithUniqued()` wasn't
enabling these callbacks.
The two most obvious ways missing callbacks causes problems is that
auto-resolution fails and re-uniquing (on changed operands) just doesn't
happen. I've added tests for both -- in both cases, I confirmed that
the final check was failing before the fix.
rdar://problem/20365935
llvm-svn: 233751
The existing code in getMemsetValue only handled integer-preferred types when
the fill value was not a constant. Make this more robust in two ways:
1. If the preferred type is a floating-point value, do the mul-splat trick on
the corresponding integer type and then bitcast.
2. If the preferred type is a vector, do the mul-splat trick on one vector
element, and then build a vector out of them.
Fixes PR22754 (although, we should also turn off use of vector types at -O0).
llvm-svn: 233749
This patch fixes MCJIT::addGlobalMapping by changing the implementation of the
ExecutionEngineState class. The new implementation maintains a bidirectional
mapping between symbol names (std::strings) and addresses (uint64_ts), rather
than a mapping between Value*s and void*s.
This has fix has been made for backwards compatibility, however the strongly
preferred way to resolve unknown symbols is by writing a custom
RuntimeDyld::SymbolResolver (formerly RTDyldMemoryManager) and overriding the
findSymbol method. The addGlobalMapping method is a hangover from the legacy JIT
(which has was removed in 3.6), and may be deprecated in a future release as
part of a clean-up of the ExecutionEngine interface.
Patch by Murat Bolat. Thanks Murat!
llvm-svn: 233747
Specify an allocation order with a register class. This is used by register
allocators with a greedy heuristic. This is usefull as it is sometimes
beneficial to color more constrained classes first.
Differential Revision: http://reviews.llvm.org/D8626
llvm-svn: 233743
When allocating live intervals in linear order and all of them are local
to a single basic block you get an optimal coloring. This is also true
if you reverse the order, but it is not true if you sort live ranges
beginnings in reverse order, change to sort live range endings in
reverse order. Take the following live ranges for example:
|---| |--------|
|----------| |-------|
They get colored suboptimally with 3 registers if you sort the live range
starting points in reverse order (but optimally with live range begins in order,
or live range ends in reverse order).
Apparently the previous strategy was intentional because of allocation
time considerations. I am having a hard time replicating these effects,
while I see substantial improvements in allocation quality with this
change.
No testcase as none of the (in tree) targets use reverse order mode.
Differential Revision: http://reviews.llvm.org/D8625
llvm-svn: 233742
Change lowerCTPOP to:
- Gracefully handle a known-zero input value
- Simplify computation of significant bit size
Thanks to Jay Foad for the review!
llvm-svn: 233736
I suggested this change in D7898 (http://llvm.org/viewvc/llvm-project?view=revision&revision=231354)
It improves the v4i64 case although not optimally. This AVX codegen:
vmovq {{.*#+}} xmm0 = mem[0],zero
vxorpd %ymm1, %ymm1, %ymm1
vblendpd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3]
Becomes:
vmovsd {{.*#+}} xmm0 = mem[0],zero
Unfortunately, this doesn't completely solve PR22685. There are still at least 2 problems under here:
We're not handling v32i8 / v16i16.
We're not getting the FP / int domains right for instruction selection.
But since this patch alone appears to do no harm, reduces code duplication, and helps v4i64,
I'm submitting this patch ahead of fixing the above.
Differential Revision: http://reviews.llvm.org/D8341
llvm-svn: 233704
So far, we do not yet support any instruction specific to zEC12.
Most of the facilities added with zEC12 are indeed not very useful
to compiler code generation, but there is one exception: the
miscellaneous-extensions facility provides the RISBGN instruction,
which is a variant of RISBG that does not set the condition code.
Add support for this facility, MC support for RISBGN, and CodeGen
support for prefering RISBGN over RISBG on zEC12, unless we can
actually make use of the condition code set by RISBG.
llvm-svn: 233690
We already exploit a number of instructions specific to z196,
but not yet POPCNT. Add support for the population-count
facility, MC support for the POPCNT instruction, CodeGen
support for using POPCNT, and implement the getPopcntSupport
TargetTransformInfo hook.
llvm-svn: 233689
This hooks up the TargetTransformInfo machinery for SystemZ,
and provides an implementation of getIntImmCost.
In addition, the patch adds the isLegalICmpImmediate and
isLegalAddImmediate TargetLowering overrides, and updates
a couple of test cases where we now generate slightly
better code.
llvm-svn: 233688
it more liberally.
SplitVecOp_TRUNCATE has logic for recursively splitting oversize vectors
that need more than one round of splitting to become legal. There are many
other ISD nodes that could benefit from this logic, so factor it out and
use it for FP_TO_UINT,FP_TO_SINT,SINT_TO_FP,UINT_TO_FP and FTRUNC.
llvm-svn: 233681