r218129 omits DW_TAG_subprograms which have no inlined subroutines when
emitting -gmlt data. This makes -gmlt very low cost for -O0 builds.
Darwin's dsymutil reasonably considers a CU empty if it has no
subprograms (which occurs with the above optimization in -O0 programs
without any force_inline function calls) and drops the line table, CU,
and everything in this situation, making backtraces impossible.
Until dsymutil is modified to account for this, disable this
optimization on Darwin to preserve the desired functionality.
(see r218545, which should be reverted after this patch, for other
discussion/details)
Footnote:
In the long term, it doesn't look like this scheme (of simplified debug
info to describe inlining to enable backtracing) is tenable; it is far
too size-inefficient for optimized code (the DW_TAG_inlined_subroutines,
even once compressed, are nearly twice as large as the line table
itself, also compressed), and we'll be considering things like Cary's
two-level line table proposal to encode all this information directly in
the line table.
llvm-svn: 218702
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. E.g., x86 will prefer a different NR estimate
implementation. ARM will want to use its 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
llvm-svn: 218698
Currently, the DAG Combiner only tries to convert type-legal build_vector nodes
into shuffles. This patch simply moves the logic that checks if a
build_vector has a legal value type up before we even start analyzing the
operands. This allows an immediate early exit from the method
'visitBUILD_VECTOR' if the node type is known to be illegal.
No functional change intended.
llvm-svn: 218677
If there is a store followed by a store with the same value to the same location, then the second store is dead (a no-op) and can be removed.
This problem is found in spec2006-197.parser.
For example,
stur w10, [x11, #-4]
stur w10, [x11, #-4]
Then one of the two stur instructions can be removed.
Patch by David Xu!
llvm-svn: 218569
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
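As a rough illustration in plain C++ (not the DAG code; hardwareRSqrtEstimate and the other names below are stand-ins), the refinement the hook describes looks like this:
#include <cmath>
// Placeholder for the target's estimate instruction (e.g. frsqrte/vrsqrteps).
float hardwareRSqrtEstimate(float X) { return 1.0f / std::sqrt(X); }
// Refine the estimate with the number of Newton-Raphson steps the target asks for.
float refinedRSqrt(float X, int RefinementSteps) {
  float Est = hardwareRSqrtEstimate(X);
  for (int I = 0; I < RefinementSteps; ++I)
    Est = Est * (1.5f - 0.5f * X * Est * Est);  // one Newton-Raphson step
  return Est;
}
So z = y / sqrt(x) becomes z = y * refinedRSqrt(x, Steps).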
Differential Revision: http://reviews.llvm.org/D5484
llvm-svn: 218553
Machine Sink uses loop depth information to select between successors BBs to
sink machine instructions into, where BBs within smaller loop depths are
preferable. This patch adds support for choosing between successors by using
profile information from BlockFrequencyInfo instead, whenever the information
is available.
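A minimal sketch of the selection criterion, with assumed names (not the pass's actual code): prefer the successor whose profiled block frequency is lowest, mirroring the old smallest-loop-depth preference.
#include <cstdint>
#include <vector>
// Returns the index of the coldest successor, or -1 if there are none.
int pickColdestSuccessor(const std::vector<uint64_t> &SuccFreq) {
  int Best = -1;
  uint64_t BestFreq = UINT64_MAX;
  for (int I = 0, E = (int)SuccFreq.size(); I != E; ++I)
    if (SuccFreq[I] < BestFreq) {
      BestFreq = SuccFreq[I];
      Best = I;
    }
  return Best;
}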
Tested it under SPEC2006 train (average of 30 runs for each program); ~1.5%
execution speedup on average on x86-64 Darwin.
<rdar://problem/18021659>
llvm-svn: 218472
The InstrEmitter will skip the check of MI.hasPostISelHook()
before calling AdjustInstrPostInstrSelection() when NDEBUG
is not defined.
This was added in r140228, and I'm not sure if it is intentional or not,
but it is a likely source for bugs, because it means with
Release+Asserts builds you can forget to set the hasPostISelHook
flag on TableGen definitions and AdjustInstrPostInstrSelection() will
still be called.
llvm-svn: 218458
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).
This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
Details of the optimization are discussed in D5091.
Test Plan: make check-all + a new test
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5422
llvm-svn: 218455
Summary:
AtomicExpand already had logic for expanding wide loads and stores on LL/SC
architectures, and for expanding wide stores on CmpXchg architectures, but
not for wide loads on CmpXchg architectures. This patch fills this hole,
and makes use of this new feature in the X86 backend.
Only one functional change: we now lose the SynchScope attribute.
It is regrettable, but I have another patch that I will submit soon that will
solve this for all of AtomicExpand (it seemed better to split it apart as it
is a different concern).
Test Plan: make check-all (lots of tests for this functionality already exist)
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5404
llvm-svn: 218332
Summary:
The goal is to eventually remove all the code related to getInsertFencesForAtomic
in SelectionDAGBuilder as it is wrong (designed for ARM, not really portable, works
mostly by accident because the backends are overly conservative), and repeats the
same logic that goes in emitLeading/TrailingFence.
In this patch, I make AtomicExpandPass insert the fences as it knows better
where to put them. Because this requires getting the fences and not just
passing an IRBuilder around, I had to change the return type of
emitLeading/TrailingFence.
This code only triggers on ARM for now. Because it runs earlier in the
pipeline than SelectionDAGBuilder, it gets to lower the atomic accesses first,
so SelectionDAGBuilder does not add barriers anymore on ARM.
If this patch is accepted I plan to implement emitLeading/TrailingFence for all
backends that setInsertFencesForAtomic(true), which will allow both making them
less conservative and simplifying SelectionDAGBuilder once they are all using
this interface.
This should not cause any functional change so the existing tests are used
and not modified.
Test Plan: make check-all, benefits from existing tests of atomics on ARM
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5179
llvm-svn: 218329
This is purely a plumbing patch. No functional changes intended.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and AArch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner.
Next steps:
The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added.
Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner.
X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added.
Differential Revision: http://reviews.llvm.org/D5425
llvm-svn: 218219
A problem with our old behavior becomes observable under x86-64 COFF
when we need a read-only GV which has an initializer which is referenced
using a relocation: we would mark the section as writable. Marking the
section as writable interferes with section merging.
This fixes PR21009.
llvm-svn: 218179
To reduce the size of -gmlt data, skip the subprograms without any
inlined subroutines. We've now got the ability to make these
determinations in the backend (funnily enough, we added the flag so we
wouldn't produce ranges under -gmlt, but with this change we use the
flag yet go back to producing ranges under -gmlt).
Instead, just produce CU ranges to inform the consumer which parts of
the code are described by this CU's line table. Tools could inspect the
line table directly to compute the range, but the CU ranges only seem to
be about 0.5% of object/executable size, so I'm not too worried about
teaching llvm-symbolizer that trick just yet - it's certainly a possible
piece of future work.
Update an llvm-symbolizer test just to demonstrate that this schema is
acceptable there (if it wasn't, the compiler-rt tests would catch this,
but good to have an in-llvm-tree test for llvm-symbolizer's behavior
here).
Building the clang binary with -gmlt with this patch reduces the total
size of object files by 5.1% (5.56% without ranges) without compression
and the executable by 4.37% (4.75% without ranges).
llvm-svn: 218129
Summary:
This will allow requesting the creation of a forward-declared variable
at its point of use (for imported declarations, this will be
DwarfDebug::constructImportedEntityDIE) rather than having to put the
forward decl in a retention list.
Note that getOrCreateGlobalVariable returns the actual definition DIE when the
routine creates a declaration and a definition DIE. If you agree this is the
right behavior, then I'll have a followup patch that registers the definition
in the DIE map instead of the declaration as it is today (this 'breaks' only
one test, where we test that the imported entity is the declaration). I'm
not sure what's best here, but it's easy enough for a consumer to follow the
DW_AT_specification link to get to the declaration, whereas it takes more
work to find the actual definition from a declaration DIE.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5381
llvm-svn: 218126
The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one
use, but this is overly-conservative on some systems. Specifically, if the FMA
and the FADD have the same latency (and the FMA does not compete for resources
with the FMUL any more than the FADD does), there is no need for the
restriction, and furthermore, forming the FMA while leaving the FMUL can still allow
for higher overall throughput and decreased critical-path length.
Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to
elide the hasOneUse check. This is enabled for PowerPC by default, as most
PowerPC systems will benefit.
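For illustration, an assumed example (not taken from the patch) where the product has a second use; with the hasOneUse check elided, the add can still be formed as an FMA:
double fmaCandidate(double X, double Y, double A, double *Q) {
  double P = X * Y;  // the FMUL has more than one use
  *Q = P;            // this use keeps the plain multiply alive
  return P + A;      // may still become fma(X, Y, A) when latencies match
}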
Patch by Olivier Sallenave, thanks!
llvm-svn: 218120
With this optimization, we will not always insert a zext for values crossing
basic blocks, but instead insert a sext if the users of a value crossing basic
blocks prefer a signed predicate.
llvm-svn: 218101
This omission will be done in a fancier manner once we're dealing with
"put gmlt in the skeleton CUs under fission" - it'll have to be
conditional on the kind of CU we're emitting into (skeleton or gmlt).
llvm-svn: 218098
The patch moved some logic around in an attempt to generate potentially more
DW_AT_declaration attributes. The patch was flawed though and it stopped
generating the attribute in some cases.
llvm-svn: 218060
Summary:
This doesn't show up today as we don't emit declaration-only variables. This
will be tested when the followup patches implementing import of forward
declared entities land in clang.
Reviewers: echristo, dblaikie, aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5382
llvm-svn: 218041
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.
No functional change.
llvm-svn: 218004
This required a new hook called hasLoadLinkedStoreConditional to know whether
to expand atomics to LL/SC (ARM, AArch64, in a future patch Power) or to
CmpXchg (X86).
Apart from that, the new code in AtomicExpandPass is mostly moved from
X86AtomicExpandPass. The main result of this patch is to get rid of that
pass, which had lots of code duplicated with AtomicExpandPass.
llvm-svn: 217928
The default implementation of getCmpSelInstrCost, which provides the cost of
icmp/fcmp/select instructions, did not deal sensibly with illegal vector types
that were scalarized. We'd ask for the legalization cost of the vector type,
which would return something like (4, f64) given an input of <4 x double>, and
we'd then check the TLI status of the ISD opcode on that scalar type. This would
result in querying (ISD::VSELECT, f64), for example. Amusingly enough,
ISD::VSELECT on scalar types is marked as Legal by default (as with most other
operations), and most backends never change this because VSELECT is never
generated on scalars. However, seeing the resulting operation as Legal, we'd
neglect to add the scalarization cost before returning. The result is that we'd
grossly under-estimate the cost of cmps/selects on illegal vector types.
Now, if type legalization clearly results in scalarization, we skip the early
return and add the scalarization cost.
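A rough sketch of the corrected accounting, with assumed names (not the actual TTI code):
// When legalization scalarizes the vector, multiply out the per-element cost
// and add the scalarization overhead instead of returning early as "Legal".
unsigned cmpSelCost(unsigned NumElts, unsigned ScalarOpCost,
                    unsigned ScalarizationOverhead, bool Scalarized) {
  unsigned Cost = NumElts * ScalarOpCost;
  if (Scalarized)
    Cost += ScalarizationOverhead;
  return Cost;
}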
llvm-svn: 217859
On MachO, and MachO only, we cannot have a truly empty function since that
breaks the linker logic for atomizing the section.
When we are emitting a frame pointer, the presence of an unreachable will
create a cfi instruction pointing past the last instruction. This is perfectly
fine. The FDE information encodes the pc range it applies to. If some tool
cannot handle this, we should explicitly say which bug we are working around
and only work around it when it is actually relevant (not for ELF for example).
Given the unreachable we could omit the .cfi_def_cfa_register, but then
again, we could also omit the entire function prologue if we wanted to.
llvm-svn: 217801
introduced in r217629.
We were returning the old sext instead of the new zext as the promoted instruction!
Thanks Joerg Sonnenberger for the test case.
llvm-svn: 217800
pointer to a dead function. To make sure it's valid, doFinalization resets
RewindFunction to nullptr, just like the constructor, so it will be found again on the next run.
llvm-svn: 217737
enabled. A good chunk of MIsNeedChainEdge() is logic that is valid and should be applied even for targets
that are not using alias analysis.
llvm-svn: 217706
And since it /looked/ like the DwarfStrSectionSym was unused, I tried
removing it - but then it turned out that DwarfStringPool was
reconstructing the same label (and expecting it to have already been
emitted) and uses that.
So I kept it around, but wanted to pass it in to users - since it seemed
a bit silly for DwarfStringPool to have it passed in and returned but
itself have no use for it. The only two users don't handle strings in
both .dwo and .o files so they only ever need the one symbol - no need
to keep it (and have an unused symbol) in the DwarfStringPool used for
fission/.dwo.
Refactor a bunch of accelerator table usage to remove duplication so I
didn't have to touch 4-5 callers.
llvm-svn: 217628
Do
(shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2)
This is already done for multiplies, but since multiplies
by powers of two are turned into shifts, we also need
to handle it here.
This might want checks for isLegalAddImmediate to avoid
transforming an add of a legal immediate with one that isn't.
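The identity being used can be checked in plain C++ (unsigned, so wrap-around is well defined; shift amounts assumed less than the bit width):
#include <cstdint>
uint32_t lhs(uint32_t X, uint32_t C1, unsigned C2) { return (X + C1) << C2; }          // (shl (add x, c1), c2)
uint32_t rhs(uint32_t X, uint32_t C1, unsigned C2) { return (X << C2) + (C1 << C2); }  // (add (shl x, c2), c1 << c2)
// lhs(X, C1, C2) == rhs(X, C1, C2) for all X and C1, and any C2 < 32.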
llvm-svn: 217610
This is an extension of the change made with r215820:
http://llvm.org/viewvc/llvm-project?view=revision&revision=215820
That patch allowed combining of splatted vector FP constants that are multiplied.
This patch allows combining non-uniform vector FP constants too by relaxing the
check on the type of vector. Also, canonicalize a vector fmul in the
same way that we already do for scalars - if only one operand of the fmul is a
constant, make it operand 1. Otherwise, we miss potential folds.
This fold is also done by -instcombine, but it's possible that extra
fmuls may have been generated during lowering.
Differential Revision: http://reviews.llvm.org/D5254
llvm-svn: 217599
"Unroll" is not the appropriate name for this variable. Clang already uses
the term "interleave" in pragmas and metadata for this.
Differential Revision: http://reviews.llvm.org/D5066
llvm-svn: 217528
So that the two operations in DwarfDebug couldn't get separated (because
I accidentally separated them in some work in progress), put them
together. While we're here, move DwarfUnit::addRange to
DwarfCompileUnit, since it's not relevant to type units.
llvm-svn: 217468
PrevSection/PrevCU are used to detect holes in the address range of a CU
to ensure the DW_AT_ranges does not include those holes. When we see a
function with no debug info, though it may be in the same range as the
prior and subsequent functions, there should be a gap in the CU's
ranges. By setting PrevCU to null in that case, the range would not be
extended to cover the gap.
llvm-svn: 217466
This solves the problem of having a kill flag inside a loop
with a definition of the register prior to the loop:
%vreg368<def> ...
Inside loop:
%vreg520<def> = COPY %vreg368
%vreg568<def,tied1> = add %vreg341<tied0>, %vreg520<kill>
=> was coalesced into =>
%vreg568<def,tied1> = add %vreg341<tied0>, %vreg368<kill>
MachineVerifier then complained:
*** Bad machine code: Virtual register killed in block, but needed live out. ***
The kill flag for %vreg368 is incorrect, and is cleared by this patch.
This is similar to the clearing done at the end of
MachineSinking::SinkInstruction().
Patch provided by Jonas Paulsson.
Reviewed by Quentin Colombet and Juergen Ributzka.
llvm-svn: 217427
Previously, fast-isel would not clean up after failing to select a call
instruction, because it would have called flushLocalValueMap() which moves
the insertion point, making SavedInsertPt in selectInstruction() invalid.
Fixing this by making SavedInsertPt a member variable, and having
flushLocalValueMap() update it.
This removes some redundant code at -O0, and more importantly fixes PR20863.
Differential Revision: http://reviews.llvm.org/D5249
llvm-svn: 217401
It's probably not a huge deal to not do this - if we could, maybe the
address could be reused by a subprogram low_pc and avoid an extra
relocation, but it's just one per CU at best.
llvm-svn: 217338
This problem is bigger than just fsub, but this is the minimum fix to solve
fneg for PR20556 ( http://llvm.org/bugs/show_bug.cgi?id=20556 ), and we solve
zero subtraction with the same change.
llvm-svn: 217286
This reverts commit r217211.
Both the bfd ld and gold outputs were valid. They were using a Rela relocation,
so the value present in the relocated location was not used, which caused me
to misread the output.
llvm-svn: 217264
Fixes PR20523.
When spilling variables onto the stack, spillVirtReg() is setting the
parent pointer of the cloned DBG_VALUE intrinsic for the stack location
to the parent pointer of the original intrinsic. MachineInstr parent
pointers should however always point to the parent basic block.
MBB is shadowing the MBB member variable. The instruction still ends up
being inserted into the right basic block, because it's inserted after MI
which serves as the iterator.
I failed at constructing a reliable testcase for this, see
http://llvm.org/bugs/show_bug.cgi?id=20523 for a large testcase.
llvm-svn: 217260
Summary:
This fixes a long standing issue where we would emit many little .text
sections and only one .pdata and .xdata section. Now we generate one
.pdata / .xdata pair per .text section and associate them correctly.
Fixes PR19667.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5181
llvm-svn: 217176
Summary:
Split shouldExpandAtomicInIR() into different versions for Stores/Loads/RMWs/CmpXchgs.
Makes runOnFunction cleaner (no more redundant checking/casting), and will help moving
the X86 backend to this pass.
This requires a way of easily detecting which instructions are atomic.
I followed the pattern of mayReadFromMemory, mayWriteOrReadMemory, etc.. in making
isAtomic() a method of Instruction implemented by a switch on the opcodes.
Test Plan: make check
Reviewers: jfb
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D5035
llvm-svn: 217080
Fixes two latent bugs:
- There was no fence inserted before expanded seq_cst load (unsound on Power)
- There was only a fence release before seq_cst stores (again unsound, in particular on Power)
It is not even clear if this is correct on ARM swift processors (where release fences are
DMB ishst instead of DMB ish). This behaviour is currently preserved on ARM Swift
as it is not clear whether it is incorrect. I would love to get documentation stating
whether it is correct or not.
These two bugs were not triggered because Power is not (yet) using this pass, and these
behaviours happen to be (mostly?) working on ARM
(although they completely butchered the semantics of the llvm IR).
See:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075821.html
for an example of the problems that can be caused by the second of these bugs.
I couldn't see a way of fixing these in a completely target-independent way without
adding lots of unnecessary fences on ARM, hence the target-dependent parts of this
patch.
This patch implements the new target-dependent parts only for ARM (the default
of not doing anything is enough for AArch64), other architectures will use this
infrastructure in later patches.
llvm-svn: 217076
This is the final round of renaming. This changes tblgen to emit lower-case
function names for FastEmitInst_* and FastEmit_*, and updates all its uses
in the source code.
Reviewed by Eric
llvm-svn: 217075
Things got a little bit messy over the years and it is time for a little bit
of spring cleaning.
This first commit is focused on the FastISel base class itself. It doxyfies all
comments, C++11fies the code where it makes sense, renames internal methods to
adhere to the coding standard, and clang-formats the files.
Reviewed by Eric
llvm-svn: 217060
This allows the target to disable target-independent instruction selection and
jump directly into the target-dependent instruction selection code.
This can be beneficial for targets, such as AArch64, which could emit much
better code, but never got a chance to do so, because the target-independent
instruction selector was able to find an instruction sequence.
llvm-svn: 216947
If an fmul was introduced by lowering, it wouldn't be folded
into a multiply by a constant since the earlier combine would
have replaced the fmul with the fadd.
llvm-svn: 216932
This removes static initializers from the backends which generate this data, and also makes this struct match the other Tablegen generated structs in behaviour.
Reviewed by Andy Trick and Chandler C
llvm-svn: 216919
When I recommitted r208640 (in r216898) I added an exclusion for TargetConstant
offsets, as there is no guarantee that a backend can handle them on generic
ADDs (even if it generates them during address-mode matching) -- and,
specifically, applying this transformation directly with TargetConstants caused
a self-hosting failure on PPC64. Ignoring all TargetConstants, however, is less
than ideal. Instead, for non-opaque constants, we can convert them into regular
constants for use with the generated ADD (or SUB).
llvm-svn: 216908
I reverted r208640 in r209747 because r208640 broke self-hosting on PPC64. The
underlying cause of the failure is that pre-inc loads with increments
represented by ISD::TargetConstants were being transformed into ISD::ADDs with
ISD::TargetConstant operands. PPC doesn't have a pattern for those, and so they
were selected as invalid r+r adds.
This recommits r208640, rebased and with an exclusion for ISD::TargetConstant
increments. This behavior seems correct, although in the future we might want
to ask the target to split out the indexing that uses ISD::TargetConstants.
Unfortunately, I don't yet have a small test case where the relevant invalid
'add' instruction is not itself dead (and thus eliminated by
DeadMachineInstructionElim -- sometimes bugpoint is too good at removing things)
Original commit message (by Adam Nemet):
Right now the load may not get DCE'd because of the side-effect of updating
the base pointer.
This can happen if we lower a read-modify-write of an illegal larger type
(e.g. i48) such that the modification only affects one of the subparts (the
lower i32 part but not the higher i16 part). See the testcase.
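As a hypothetical illustration (not the actual testcase), such a read-modify-write might look like:
#include <cstdint>
struct Packed { uint64_t Field : 48; };
void updateLow32(Packed *P) {
  // Only the low i32 part of the i48 field changes; the high i16 part is kept,
  // so after legalization part of the loaded value gets masked off.
  P->Field = (P->Field & 0xFFFF00000000ULL) | 0x12345678ULL;
}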
In order to spot the dead load we need to revisit it when SimplifyDemandedBits
decided that the value of the load is masked off. This is the
CommitTargetLoweringOpt piece.
I checked compile time with ARM64 by sending SPEC bitcode files through llc.
No measurable change.
Fixes <rdar://problem/16031651>
llvm-svn: 216898
The structures for Windows unwinding are shared across multiple platforms.
Indicate the encoding to be used for the particular target. Use this to switch
the unwind emitter instantiated by the AsmPrinter.
llvm-svn: 216895
implicit uses of the whole register when a sub register is defined.
Now the same iterator is used in the rematerialization loop as in the
spill loop later.
Patch provided by Mikael Holmen.
This fix was proposed and reviewed by Quentin Colombet,
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/076135.html.
Unfortunately, this error in the rematerialization code has only been
seen in a large test case for an out-of-tree target, and is probably
hard to reproduce on an in-tree target. Therefore, no testcase is
provided.
llvm-svn: 216873
Summary:
Fixes a FIXME in MachineSinking. Instead of using the simple heuristics
in isPostDominatedBy, use the real MachinePostDominatorTree. The old
heuristics caused instructions to sink unnecessarily, which can increase
register pressure.
Test Plan:
Added a NVPTX codegen test to verify that our change is in effect. It also
shows the unnecessary register pressure caused by over-sinking. Updated
affected tests in AArch64 and X86.
Reviewers: eliben, meheff, Jiangning
Reviewed By: Jiangning
Subscribers: jholewinski, aemerson, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D4814
llvm-svn: 216862
DW_TAG_lexical_scopes inform debuggers about the instruction range for
which a given variable (or imported declaration/module/etc) is valid. If
the scope doesn't itself contain any such entities, it's a waste of
space and should be omitted.
We were correctly doing this for entirely empty leaves, but not for
intermediate nodes.
Reduces total (not just debug sections) .o file size for a bootstrap
-gmlt LLVM by 22% and bootstrap -gmlt clang executable by 13%. The wins
for a full -g build will be less as a % (and in absolute terms), but
should still be substantial - with some of that win being fewer
relocations, thus more substantially reducing link times than fewer bytes
alone would have.
llvm-svn: 216861
This makes the emptiness test of the scope with regard to variables and
nested scopes the same as with regard to imported entities. Just
check if we had nothing at all before we build the node.
llvm-svn: 216840
First of many steps to improve lexical scope construction (to omit
trivial lexical scopes - those without any direct variables). To that
end it's easier not to create imported entities directly into the
lexical scope node, but to build them, then add them if necessary.
llvm-svn: 216838
Summary:
This extends the work done in [1], adding missing intrinsic lowering for floor, trunc, round and copysign.
[1] http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/199372
Test Plan: Extended `test/ExecutionEngine/Interpreter/intrinsics.ll` to test the additional missing intrinsics. All tests pass.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5120
llvm-svn: 216827
When sinking an instruction it might be moved past the original last use of one
of its operands. This last use has the kill flag set and the verifier will
obviously complain about this.
Before Machine Sinking (AArch64):
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
After Machine Sinking:
%XZR<def> = SUBSXrs %vreg4, %vreg1<kill>, 160, %NZCV<imp-def>
...
%vreg3<def> = ASRVXr %vreg1, %vreg2<kill>
This fix clears all the kill flags in all instructions that use the same operands
as the instruction that is being sunk.
This fixes rdar://problem/18180996.
llvm-svn: 216803
specifier and change the default behavior to only emit the
DW_AT_accessibility(public) attribute when isPublic() is explicitly
set.
rdar://problem/18154959
llvm-svn: 216799
Rushed when I realized I hadn't committed the FreeDeleter for a clang
change I'd committed, and didn't check that I had things lying around in
my client.
Apologies for the noise.
llvm-svn: 216792
Summary:
If a variadic function body contains a musttail call, then we copy all
of the remaining register parameters into virtual registers in the
function prologue. We track the virtual registers through the function
body, and add them as additional registers to pass to the call. Because
this is all done in virtual registers, the register allocator usually
gives us good code. If the function does a call, however, it will have
to spill and reload all argument registers (ew).
Forwarding regparms on x86_32 is not implemented because most compilers
don't support varargs in 32-bit with regparms.
Reviewers: majnemer
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5060
llvm-svn: 216780
The code in SelectionDAG::getMemset for some reason assumes the value passed to
memset is an i32. This breaks the generated code for targets that only have
registers smaller than 32 bits because the value might get split into multiple
registers by the calling convention. See the test for the MSP430 target included
in the patch for an example.
This patch ensures that nothing is assumed about the type of the value. Instead,
the type is taken from the selected overload of the llvm.memset intrinsic.
llvm-svn: 216716
On MachO, putting a symbol that doesn't start with a 'L' or 'l' in one of the
__TEXT,__literal* sections prevents the linker from merging the content of the
section.
Since private GVs are the ones that get mangled to start with 'L' or 'l', we now
only put those on the __TEXT,__literal* sections.
llvm-svn: 216682
was marked custom. The target independent DAG combine has no way to know if
the shuffles it is introducing are ones that the target could support or not.
llvm-svn: 216678
The included test case would fail, because the MI PHI node would have two
operands from the same predecessor.
This problem occurs when a switch instruction couldn't be selected. This always
happens, because there is no default switch support in FastISel to begin with.
The problem was that FastISel would first add the operand to the PHI nodes and
then fall-back to SelectionDAG, which would then in turn add the same operands
to the PHI nodes again.
This fix removes these duplicate PHI node operands by resetting the
PHINodesToUpdate to its original state before FastISel tried to select the
instruction.
This fixes <rdar://problem/18155224>.
llvm-svn: 216640
Currently instructions are folded very aggressively for AArch64 into the memory
operation, which can lead to the use of killed operands:
%vreg1<def> = ADDXri %vreg0<kill>, 2
%vreg2<def> = LDRBBui %vreg0, 2
... = ... %vreg1 ...
This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.
This fix teaches hasTrivialKill to not only check the LLVM IR that the value has
a single use, but also to check if the register that represents that value has
already been used. This can happen when the instruction with the use was folded
into another instruction (in this particular case a load instruction).
This fixes rdar://problem/18142857.
llvm-svn: 216634
FastEmitInst_ri was constraining the first operand without checking if it is
a virtual register. Use constrainOperandRegClass as all the other
FastEmitInst_* functions do.
llvm-svn: 216613
This teaches the AArch64 backend to deal with the operations required
for the v4f16 and v8f16 types which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.
llvm-svn: 216555
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.
However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.
The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least makes the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.
Original commit message for r215611 which covers the rest of the change:
[SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.
In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.
It took many million runs of the shuffle fuzz tester to find this.
llvm-svn: 216537
Take a StringRef instead of a "const char *".
Take a "std::error_code &" instead of a "std::string &" for error.
A create static method would be even better, but this patch is already a bit too
big.
llvm-svn: 216393
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates the cost model for the Loop and SLP Vectorizers. The cost table is currently only updated for the X86 backend.
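A hedged example (assumed, not taken from the patch) of the kind of loop this affects:
// Signed divide by a uniform power of two; the vectorizers can now cost this
// as a cheap shift-based sequence per lane.
void divideByFour(int *A, int N) {
  for (int I = 0; I < N; ++I)
    A[I] /= 4;
}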
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
llvm-svn: 216371
There's no need to do this if the user doesn't call va_start. In the
future, we're going to have thunks that forward these register
parameters with musttail calls, and they won't need these spills for
handling va_start.
Most of the test suite changes are adding va_start calls to existing
tests to keep things working.
llvm-svn: 216294
Somewhat unnoticed in the original implementation of discriminators, but
it could cause instructions to end up in new, small,
DW_TAG_lexical_blocks due to the use of DILexicalBlock to track
discriminator changes.
Instead, use DILexicalBlockFile which we already use to track file
changes without introducing new scopes, so it works well to track
discriminator changes in the same way.
llvm-svn: 216239
isPow2DivCheap
That name doesn't specify signed or unsigned.
Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
srl/add/sra
is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
sra/srl/add/sra
That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
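A sketch of that sequence in plain C++ for a signed divide by 4 (names are illustrative):
int signedDivBy4(int X) {
  int Sign = X >> 31;                          // sra: all-ones if X is negative
  unsigned Bias = (unsigned)Sign >> (32 - 2);  // srl: rounding bias, 0 or 3
  int Adjusted = X + (int)Bias;                // add
  return Adjusted >> 2;                        // sra: the actual divide
}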
This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
No functional change intended.
Differential Revision: http://reviews.llvm.org/D5010
llvm-svn: 216237
The advanced copy optimization does not yield any difference on the whole llvm
test-suite + SPECs, either in compile time or runtime (binaries are identical),
but has a big potential when data go back and forth between register files as
demonstrated with test/CodeGen/ARM/adv-copy-opt.ll.
Note: This was measured for both Os and O3 for armv7s, arm64, and x86_64.
<rdar://problem/12702965>
llvm-svn: 216236
AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of
a group that aim at making it more target-independent. See
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html
for details
The command line option is "atomic-expand"
llvm-svn: 216231
The FPv4-SP floating-point unit is generally referred to as
single-precision only, but it does have double-precision registers and
load, store and GPR<->DPR move instructions which operate on them.
This patch enables the use of these registers, the main advantage of
which is that we now comply with the AAPCS-VFP calling convention.
This partially reverts r209650, which added some AAPCS-VFP support,
but did not handle return values or alignment of double arguments in
registers.
This patch also adds tests for Thumb2 code generation for
floating-point instructions and intrinsics, which previously only
existed for ARM.
llvm-svn: 216172
advanced copy optimization.
This is the final step patch toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov.32 d16[0], r0
vmov.32 d16[1], r1
and is able to rewrite the following sequence:
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
into simple generic GPR copies that the coalescer managed to remove.
<rdar://problem/12702965>
llvm-svn: 216144
This patch adds a new property: isInsertSubreg and the related target hooks:
TargetInstrInfo::getInsertSubregInputs and
TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific
instruction is a (kind of) INSERT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
llvm-svn: 216139
advanced copy optimization.
This patch is a step toward transforming:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
bx lr
Indeed, thanks to this patch, this optimization is able to look through
vmov r0, r1, d16
but it does not understand yet
vmov.32 d16[0], r0
vmov.32 d16[1], r1
Coming patches will fix that and update the related test case.
<rdar://problem/12702965>
llvm-svn: 216136
This patch adds a new property: isExtractSubreg and the related target hooks:
TargetInstrInfo::getExtractSubregInputs and
TargetInstrInfo::getExtractSubregLikeInputs to specify that a target specific
instruction is a (kind of) EXTRACT_SUBREG.
The approach is similar to r215394.
<rdar://problem/12702965>
llvm-svn: 216130
Store TargetSelectionDAGInfo as a pointer instead of a reference:
getSelectionDAGInfo() may not be implemented for certain backends
(e.g. it's not currently implemented for R600).
This bug is reported by UBSan.
llvm-svn: 216129
Both MachineLoopInfo and MachineDominatorTree may be null in ScheduleDAGMI
constructor call. It is undefined behavior to take references to these values.
This bug is reported by UBSan.
llvm-svn: 216118
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.
This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.
The test case is a reduced version of the code from the bug report.
Differential Revision: http://reviews.llvm.org/D4977
llvm-svn: 216114
the isRegSequence property.
This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.
Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.
With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.
The goals of these two optimizations are slightly different: one rewrites the
operand of the instruction (#1), the other kills off the non-generic instruction
and replaces it with a (sequence of) generic instruction(s).
Both optimizations rely on the ValueTracker introduced in r212100.
The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instructions. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.
Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).
Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>.
Review: http://reviews.llvm.org/D4874
llvm-svn: 216088
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.
llvm-svn: 216066
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.
I ran it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regressions.
Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
llvm-svn: 216006
When combining a pair of shuffle nodes, check if the combined shuffle mask is
trivially Undef. In that case, immediately fold that pair of shuffles to Undef.
The lack of checks for undef masks was the root-cause of a poor-codegen bug
in the dag combiner.
Example:
%1 = shufflevector <4 x i32> %A, <4 x i32> %B, <4 x i32> <i32 4, i32 1, i32 1, i32 6>
%2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 6>
%3 = shufflevector <4 x i32> %2, <4 x i32> undef, <4 x i32> <i32 1, i32 5, i32 3, i32 3>
Before this patch, on x86 (with -mcpu=corei7) we failed to fold the entire
sequence to Undef value and therefore we generated:
shufps $-123, %xmm1, %xmm0
pshufd $-46, %xmm0, %xmm0
With this patch, the entire shuffle sequence is folded to Undef and no
shuffles are generated in the output assembly.
Added new test cases to test 'combine-vec-shuffle-5.ll'.
llvm-svn: 215797
We used to assume that any fixed-offset stack object was not aliased. This
meant that no IR value could point to the memory contained in such an object.
This is a reasonable default, but is not a universally-correct
target-independent fact. For example, on PowerPC (both Darwin and non-Darwin),
some byval arguments are allocated at fixed offsets by the ABI. These, however,
certainly can be pointed to by IR values. This change moves the 'isAliased'
logic out of FixedStackPseudoSourceValue and into MFI, and allows the isAliased
property to be overridden for fixed-offset objects.
This will be used by an upcoming commit to the PowerPC backend to fix PR20280.
No functionality change intended (the behavior of
FixedStackPseudoSourceValue::isAliased has been made more conservative for
callers that don't pass an MFI object, but I don't see any in-tree callers that
do that).
llvm-svn: 215794
As Jim pointed out this assert isn't really needed to test for correctness,
because the code right afterwards does the same check and falls-back to
SelectionDAG - as intended.
llvm-svn: 215735
This reverts:
r215595 "[FastISel][X86] Add large code model support for materializing floating-point constants."
r215594 "[FastISel][X86] Use XOR to materialize the "0" value."
r215593 "[FastISel][X86] Emit more efficient instructions for integer constant materialization."
r215591 "[FastISel][AArch64] Make use of the zero register when possible."
r215588 "[FastISel] Let the target decide first if it wants to materialize a constant."
r215582 "[FastISel][AArch64] Cleanup constant materialization code. NFCI."
llvm-svn: 215673
This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
This patch is very similar to a fabs patch committed at r214892.
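A minimal scalar sketch of the underlying idea (assumed example, not the patch's code): fneg on the integer view of a float is an XOR of the sign bit, so with a constant operand the result can be computed up front.
#include <cstdint>
#include <cstring>
float fnegViaInt(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  Bits ^= 0x80000000u;  // flip the sign bit
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}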
Differential Revision: http://reviews.llvm.org/D4852
llvm-svn: 215646
input node after manually adding it to the worklist and using CombineTo.
Once we use CombineTo the input node may have been deleted. Despite this
being *completely confusing* and somewhat broken, the only way to
"correctly" return from a DAG combine after potentially deleting the
input node is to return *that exact node*....
But really, this code should just never have used CombineTo. It won't do
what it wants (returning the node as mentioned above just causes the
combine to infloop). The correct way to combine away a casted load to
a load of the correct type is to RAUW the chain directly and then return
the loaded value to replace the actual value node.
I managed to find this with the vector shuffle fuzzer even though it
clearly has nothing at all to do with vector shuffles and rather those
happen to trigger a load of a constant pool that hits this combine *just
right*. I've included the test as it is small and a nice stress test
that the infrastructure isn't asserting.
llvm-svn: 215622
combining by replacing it with something else but not re-process the
node afterward to remove it.
In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.
It took many million runs of the shuffle fuzz tester to find this.
llvm-svn: 215611
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even though the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
llvm-svn: 215588
This is a cleaner solution to the problem described in r215431.
When instructions are combined a dangling DBG_VALUE is removed.
This resolves bug 20598.
llvm-svn: 215587
New function to erase a machine instruction and mark DBG_VALUE
for removal. A DBG_VALUE is marked for removal when it references
an operand defined in the instruction.
Use the new function to cleanup code in dead machine instruction
removal pass.
llvm-svn: 215580
critical edge has been split. The MachineDominatorTree will then lazily update
the underlying dominance properties when required.
** Context **
This is a follow-up of r215410.
Each time a critical edge is split this invalidates the dominator tree
information. Thus, subsequent queries of that interface will be slow until the
underlying information is actually recomputed (costly).
** Problem **
Prior to this patch, splitting a critical edge needed to query the dominator
tree to update the dominator information.
Therefore, splitting a bunch of critical edges will likely produce poor
performance as each query to the dominator tree will use the slow query path.
This happens a lot in passes like MachineSink and PHIElimination.
** Proposed Solution **
Splitting a critical edge is a local modification of the CFG. Moreover, as soon
as a critical edge is split, it is not critical anymore and thus cannot be a
candidate for critical edge splitting anymore. In other words, the predecessor
and successor of a basic block inserted on a critical edge cannot be inserted by
critical edge splitting.
Using these observations, we can pile up the splitting of critical edges and
apply them at once before updating the DT information.
The core of this patch moves the update of the MachineDominatorTree information
from MachineBasicBlock::SplitCriticalEdge to a lazy MachineDominatorTree.
** Performance **
Thanks to this patch, the motivating example compiles in 4- minutes instead of
6+ minutes. No test case added, as the motivating example has nothing special
about it except being huge!
The binaries are strictly identical for all the llvm test-suite + SPECs with and
without this patch for both Os and O3.
Regarding compile time, I observed only noise, although on average I saw a
small improvement.
<rdar://problem/17894619>
llvm-svn: 215576
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)
Changes made by clang-tidy with minor tweaks.
llvm-svn: 215558
This patch improves the existing algorithm in DAGCombiner that
attempts to fold shuffles according to rule:
shuffle(shuffle(x, y, M1), undef, M2) -> shuffle(y, undef, M3)
Before this change, there were cases where the DAGCombiner conservatively
avoided folding shuffles even if the resulting mask would have been legal.
That is because the algorithm wrongly assumed that commuting
an illegal shuffle mask would always produce an illegal mask.
With this change, we now correctly compute the commuted shuffle mask before
calling method 'isShuffleMaskLegal' on it.
On X86, this improves for example the codegen for the following function:
define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
%1 = shufflevector <4 x i32> %B, <4 x i32> %A, <4 x i32> <i32 1, i32 2, i32 6, i32 7>
%2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 2, i32 3>
ret <4 x i32> %2
}
Before this change the X86 backend (-mcpu=corei7) generated
the following assembly code for function @test:
shufps $-23, %xmm0, %xmm1 # xmm1 = xmm1[1,2],xmm0[2,3]
movhlps %xmm1, %xmm1 # xmm1 = xmm1[1,1]
movaps %xmm1, %xmm0
Now we produce:
movhlps %xmm0, %xmm0 # xmm0 = xmm0[1,1]
Added extra test cases in combine-vec-shuffle-2.ll to verify that we correctly
fold according to the above-mentioned rule.
llvm-svn: 215555
This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store
intrinsics. As with the construction of the MachineMemOperands for the
intrinsic calls used for unaligned load/store lowering, the only slight
complication is that we need to represent a larger memory range than the
loaded/stored value-type size (because the address is rounded down to an
aligned address, and we need to conservatively represent the entire possible
range of the actual access). This required adding an extra size field to
TargetLowering::IntrinsicInfo, and this was done in a way that required no
modifications to other targets (the size defaults to the store size of the
provided memory data type).
This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed).
llvm-svn: 215512
refactoring in 215384. This way it can unique multiple entries describing
the same piece even if they don't have the exact same location.
(The same piece may get merged in and be added from OpenRanges).
There ought to be a more elegant solution for this, though.
llvm-svn: 215418
as long as possible.
** Context **
Each time the dominance information is modified, the dominator tree analysis
switches in a slow query mode. After a few queries without any modification on
the dominator tree, it performs an expensive update of its internal structure to
provide fast queries again.
** Problem **
Prior to this patch, the MachineSink pass was splitting the critical edges on
demand while relying heavily on the dominator tree information. In some cases,
this leads to pathological behavior where:
- We end up in the slow query mode right after splitting an edge.
- We update the dominance information.
- We break the dominance information again, thus ending up in the slow query
mode and so on.
** Proposed Solution **
To mitigate this effect, this patch postpones all the splitting of the edges to
the end of each iteration of the main loop.
The benefits are:
- The dominance information is valid for the life time of an iteration.
- This simplifies the code as we do not have to special-case instructions that
are sunk on critical edges. Indeed, the related block will be available
through the next iteration.
The downside is that when edges splitting is required, this incurs an additional
iteration of the main loop compared to the previous scheme.
** Performance **
Thanks to this patch, the motivating example compiles in 6+ minutes instead of
10+ minutes. No test case added, as the motivating example has nothing special
about it except being huge!
I have measured only noise for both the compile time and the runtime on the llvm
test-suite + SPECs with Os and O3.
Note: The current implementation of MachineBasicBlock::SplitCriticalEdge also
uses the dominance information and therefore, hits this problem. A subsequent
patch will address that.
<rdar://problem/17894619>
llvm-svn: 215410
This patch adds a new property: isRegSequence and the related target hooks:
TargetInstrInfo::getRegSequenceInputs and
TargetInstrInfo::getRegSequenceLikeInputs to specify that a target specific
instruction is a (kind of) REG_SEQUENCE.
<rdar://problem/12702965>
llvm-svn: 215394
buildLocationLists easier to read.
The previous implementation conflated the merging of individual pieces
and the merging of entire DebugLocEntries.
By splitting this functionality into two separate functions the intention
of the code should be clearer.
llvm-svn: 215383
be propagated to all its users, and this propagation could increase the
probability of finding common subexpressions. If the COPY has only one user,
the COPY itself can be removed.
llvm-svn: 215344
That broke the build:
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:729:46: error: non-const lvalue reference to type 'SmallPtrSet<[...], 8>' cannot bind to a value of unrelated type 'SmallPtrSet<[...], 16>'
Changed |= optimizeExtInstr(MI, MBB, LocalMIs);
^~~~~~~~
/data/buildslave/clang-amd64-freebsd/src-llvm/lib/CodeGen/PeepholeOptimizer.cpp:265:49: note: passing argument to parameter 'LocalMIs' here
SmallPtrSet<MachineInstr*, 8> &LocalMIs) {
^
llvm-svn: 215341
Follow up to r214266. Add missing case in ScalarizeVectorResult() for
cttz_zero_undef.
Differential Revision: http://reviews.llvm.org/D4813
llvm-svn: 215330
floating point exceptions, added use of flag to fold potentially exception
raising floating point math in selection DAG. No functionality change, as
targets have to explicitly ask for this behavior and none does today.
llvm-svn: 215222
__stack_chk_guard.
Handle the case where the pointer operand of the load instruction that loads the
stack guard is not a global variable but instead a bitcast.
%StackGuard = load i8** bitcast (i64** @__stack_chk_guard to i8**)
call void @llvm.stackprotector(i8* %StackGuard, i8** %StackGuardSlot)
Original test case provided by Ana Pazos.
This fixes PR20558.
llvm-svn: 215167
Due to an unnecessary special case, inlined arguments that happened to
be from the same function as they were inlined into were misclassified
as non-inline arguments and would overwrite the non-inlined arguments.
Assert that we never overwrite a function's arguments, and stop
misclassifying inlined arguments as non-inline arguments to fix this
issue.
Excuse the rather crappy test case - handcrafted IR might do better, or
someone who understands better how to tickle the inliner to create a
recursive inlining situation like this (though it may also be necessary
to tickle the variable in a particular way to cause it to be recorded in
the MMI side table and go down this particular path for location
information).
llvm-svn: 215157
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
Re-commit of r214832,r21469 with a work-around that
avoids the previous problem with gcc build compilers
The work-around is to use SmallVector instead of ArrayRef
of basic blocks in preservesResourceLen()/MachineCombiner.cpp
llvm-svn: 215151
BranchFolderPass was not correctly setting the basic block branch weights when
tail-merging created or merged blocks. This patch recomputes the weights of
tail-merged blocks using the following formula:
branch_weight(merged block to successor j) =
sum(block_frequency(bb) * branch_probability(bb -> j))
bb is a block that is in the set of merged blocks.
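As a standalone illustration of the formula (made-up frequencies and probabilities, not data from the patch):
#include <cstdio>
#include <vector>
int main() {
  // Two blocks being tail-merged, each with a block frequency and branch
  // probabilities to the two successors j = 0 and j = 1.
  struct Block { double Freq; double Prob[2]; };
  std::vector<Block> Merged = {{10.0, {0.25, 0.75}},
                               {30.0, {0.50, 0.50}}};
  double Weight[2] = {0.0, 0.0};
  for (const Block &BB : Merged)
    for (int J = 0; J < 2; ++J)
      Weight[J] += BB.Freq * BB.Prob[J]; // sum of freq(bb) * prob(bb -> j)
  std::printf("merged weights: %g %g\n", Weight[0], Weight[1]); // 17.5 22.5
  return 0;
}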
<rdar://problem/16256423>
llvm-svn: 215135
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
to get the subtarget and that's accessible from the MachineFunction
now. This helps clear the way for smaller changes where getting
a subtarget will require passing in a MachineFunction/Function as
well.
llvm-svn: 214988
In r210492 the logic of calculateDbgValueHistory was changed to end
register variable live ranges at the end of MBB conditionally on
the fact that the register was or not clobbered by the function body.
This requires an initial scan of all the operands of the function
to collect all clobbered registers. In a second pass over all
instructions, we compare this set with the set of clobbered
registers for the current MachineInstruction. This modification
incurred a compilation time regression on some benchmarks: the
debug info emission phase takes ~10% more time.
While a small performance hit is unavoidable due to the initial
scan requirement, we can improve the situation by avoiding the creation of
too many temporary sets and instead using lambdas to work directly
on the result of the initial scan.
Fixes <rdar://problem/17884104>
Patch by Frederic Riss!
llvm-svn: 214987
The handling of the epilogue is best expressed as an early exit and
there is no reason to look for register defs in DbgValue MIs.
Patch by Frederic Riss!
llvm-svn: 214986
Otherwise we can end up with an argument frame size that is not a
multiple of stack slot size, which is very awkward.
This fixes PR20547, which was a bug in x86_64 Sys V vararg handling.
However, it's much easier to test this with x86 callee-cleanup
functions, which previously ended in "retl $6" instead of "retl $8".
This does affect behavior of all backends, but it presumably fixes the
same bug in all of them.
llvm-svn: 214980
This patch addresses 2 FIXME comments that I added to CriticalAntiDepBreaker while fixing PR20020.
Initialize an MCSubRegIterator and an MCRegAliasIterator to include the self reg.
Assuming that works as advertised, there should be no functional difference with this patch, just less code.
Also, remove the associated asserts - we're setting those values just before, so the asserts don't do anything meaningful.
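For reference, the pattern being relied on looks roughly like this (a sketch, with the surrounding bookkeeping elided): both iterators accept an IncludeSelf flag, so the register itself is visited by the same loop.
// Visit Reg together with all of its sub-registers.
for (MCSubRegIterator SRI(Reg, TRI, /*IncludeSelf=*/true); SRI.isValid(); ++SRI) {
  // ... update the per-register state for *SRI, which includes Reg itself ...
}
// Likewise visit Reg together with every register that aliases it.
for (MCRegAliasIterator AI(Reg, TRI, /*IncludeSelf=*/true); AI.isValid(); ++AI) {
  // ... update the per-register state for *AI ...
}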
Differential Revision: http://reviews.llvm.org/D4566
llvm-svn: 214973
This was coming in weird debug info that had variables (and hence
debug_locs) but was in GMLT mode (because it was missing the 13th field
of the compile_unit metadata) so no ranges were constructed. We should
always have at least one range for any CU with a debug_loc in it -
because the range should cover the debug_loc.
The assertion just ensures that the "!= 1" range case inside the
subsequent loop doesn't get entered for the case where there are no
ranges at all, which should never reach here in the first place.
llvm-svn: 214939
This simplifies construction and usage while making the data structure
smaller. It was a holdover from the days when we didn't have a separate
DebugLocList and all we had was a flat list of DebugLocEntries.
llvm-svn: 214933
Allow vector fabs operations on bitcasted constant integer values to be optimized
in the same way that we already optimize scalar fabs.
So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64
Instead of generating something like this:
movabsq (constant pool load of mask for sign bits)
vmovq (move from integer register to vector/fp register)
vandps (mask off sign bits)
vmovq (move vector/fp register back to integer return register)
We should generate:
mov (put constant value in return register)
I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()
is the same thing as:
IntVT.isInteger()
Testcases for x86 and ARM added to existing files that deal with vector fabs.
One existing testcase for x86 removed because it is no longer ideal.
For more background, please see:
http://reviews.llvm.org/D4770
And:
http://llvm.org/bugs/show_bug.cgi?id=20354
Differential Revision: http://reviews.llvm.org/D4785
llvm-svn: 214892
shorter/easier and have the DAG use that to do the same lookup. This
can be used in the future for TargetMachine based caching lookups from
the MachineFunction easily.
Update the MIPS subtarget switching machinery to update this pointer
at the same time it runs.
llvm-svn: 214838
This code is completely wrong. It is also dead, as if it were to *ever*
run, it would crash. Fortunately, after my work to the combiner, it is
at least *possible* to reach the code, and llvm-stress has found a test
case. Thanks to Patrick for reporting.
It would be really good if anyone who remembers how this code works and
what it was intended to do could add some more obvious test coverage
instead of my completely contrived and reduced test case. My test case
was so brittle I left a bread crumb comment in it to help the next
person to stumble on it and not know what it was actually testing for.
llvm-svn: 214785
Originally reverted in r213432 with flakey failures on an ASan self-host
build. After reduction it seems to be the same issue fixed in r213805
(ArgPromo + DebugInfo: Handle updating debug info over multiple
applications of argument promotion) and r213952 (by having
LiveDebugVariables strip dbg_value intrinsics in functions that are not
described by debug info). Though I cannot explain why this failure was
flakey...
llvm-svn: 214761
combines) until they are legal.
Doing it the old way could, when the stars align *just* right, cause
a node to get into the combine set prior to being legalized. Then, when
the same node showed up as an operand to another node later on (but not
so much later on that it had been deleted as dead) we would fail to add
it back to the worklist thinking it had already been combined. This
would in turn cause it to not be legalized. Fortunately, we can also
walk the operands looking for uncombined (and thus potentially
un-legalized) nodes late. It will still ensure that we walk all operands
of all nodes and send all of them through both the legalizer without
changes and the combiner at least once. (Which was the original goal of
this).
I have a test case for this bug, but it is terribly brittle. For
example, it will stop finding the bug the moment I enable the new
shuffle lowering. I don't yet have any test case that reliably exercises
this bug, and it isn't clear that it will be possible to craft one. It
is entirely possible that with the new shuffle lowering the two forms of
doing this are precisely equivalent. That doesn't mean we shouldn't take
the more conservative approach of insisting on things in the combined
set having survived the legalizer.
llvm-svn: 214673
GCC 4.8.2 objects to the tautological condition in the assert as the unsigned
value is guaranteed to be >= 0. Simplify the assertion by dropping the
tautological condition.
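An illustrative example of the pattern and its fix (not the exact assertion from this change):
#include <cassert>
void checkIndex(unsigned Idx, unsigned Size) {
  // Before: GCC 4.8.2 warns because "Idx >= 0" is always true for an unsigned value.
  // assert(Idx >= 0 && Idx < Size && "index out of range");
  // After: keep only the meaningful half of the condition.
  assert(Idx < Size && "index out of range");
  (void)Idx; (void)Size; // silence unused-parameter warnings in release builds
}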
llvm-svn: 214671
This is intended to be the minimal change needed to fix PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ). The check for a vector operation was wrong; we need to check that the fabs itself is not a vector operation.
This patch will not generate the optimal code. A constant pool load and 'and' op will be generated instead of just returning a value that we can calculate in advance (as we do for the scalar case). I've put a 'TODO' comment for that here and expect to have that patch ready soon.
There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there and expect to have another patch for that too.
llvm-svn: 214670
sequence - target independent framework
When the DAGcombiner selects instruction sequences
it could increase the critical path or resource len.
For example, on arm64 there are multiply-accumulate instructions (madd,
msub). If e.g. the equivalent multiply-add sequence is not on the
critical path, it makes sense to select it instead of the combined,
single accumulate instruction (madd/msub). The reason is that the
conversion from add+mul to the madd could lengthen the critical path
by the latency of the multiply.
But the DAGCombiner would always combine and select the madd/msub
instruction.
This patch uses machine trace metrics to estimate critical path length
and resource length of an original instruction sequence vs a combined
instruction sequence and picks the faster code based on its estimates.
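A deliberately simplified sketch of that decision (hypothetical names, and a cruder criterion than the pass actually uses):
// Accept the alternative sequence only if the estimates say it neither
// lengthens the critical path nor increases the resource length.
static bool preferNewSequence(unsigned OldDepth, unsigned NewDepth,
                              unsigned OldResLen, unsigned NewResLen) {
  return NewDepth <= OldDepth && NewResLen <= OldResLen;
}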
This patch only commits the target independent framework that evaluates
and selects code sequences. The machine instruction combiner is turned
off for all targets and expected to evolve over time by gradually
handling DAGCombiner patterns in the target-specific code.
This framework lays the groundwork for fixing
rdar://16319955
llvm-svn: 214666
so using a single helper which adds operands back onto the worklist.
Several places didn't rigorously do this but a couple already did.
Factoring them together and doing it rigorously is important to delete
things recursively early on in the combiner and get a chance to see
accurate hasOneUse values. While no existing test cases change, an
upcoming patch to add DAG combining logic for PSHUFB requires this to
work correctly.
llvm-svn: 214623
during DAGCombine in certain circumstances. Unfortunately, the circumstances required
to trigger the issue seem to require a pretty specific interaction of DAGCombines,
and I haven't been able to find a testcase that reproduces on X86, ARM, or AArch64.
The functionality added here is replicated in essentially every other DAG combine,
so it seems pretty obviously correct.
llvm-svn: 214622
variables (for example, by-value struct arguments passed in registers, or
large integer values split across several smaller registers).
On the IR level, this adds a new type of complex address operation OpPiece
to DIVariable that describes size and offset of a variable fragment.
On the DWARF emitter level, all pieces describing the same variable are
collected, sorted and emitted as DWARF expressions using the DW_OP_piece
and DW_OP_bit_piece operators.
http://reviews.llvm.org/D3373
rdar://problem/15928306
What this patch doesn't do / Future work:
- This patch only adds the backend machinery to make this work, patches
that change SROA and SelectionDAG's type legalizer to actually create
such debug info will follow. (http://reviews.llvm.org/D2680)
- Making the DIVariable complex expressions into an argument of dbg.value
will reduce the memory footprint of the debug metadata.
- The sorting/uniquing of pieces should be moved into DebugLocEntry,
to facilitate the merging of multi-piece entries.
llvm-svn: 214576
formulation of the node, which isn't really the desired behavior from
within the combiner or legalizer, but is necessary within ISel. I've
added a hopefully helpful comment and fixed the only two places where
this took place.
Yet another step toward the combiner and legalizer not needing to use
update listeners with virtual calls to manage the worklists behind
legalization and combining.
llvm-svn: 214574
This lifts the (very few) places the legalizer would delete dead nodes
into the outer loop around the legalizer. This is significantly simpler
because it doesn't require the legalizer itself to manage the iterator
validity, and it doesn't require the legalizer to be a DAG update
listener in order to remove things from the legalized set. It also makes
the interface much less contrived for the case of the legalizer running
inside the last phase of DAG combining.
I'm working on centralizing the deletion of nodes during both legalizing
and combining as much as possible. My hope is to remove the need for DAG
update listeners from the combiner next, which would remove a costly
virtual dispatch chain on every deletion. This in turn should allow us
to more aggressively delete DAG nodes during combining which will in
turn allow us to combine more aggressively by exposing the actual nodes
which have single users to the combine phases.
llvm-svn: 214546
This change adds code to explicitly mark a function which requires runtime stack realignment as not having a fixed frame size in the StackMap section. As it happens, this is not actually a functional change. The size that would be reported without the check is also "-1", but as far as I can tell, that's an accident. The code change makes this explicit.
Note: There's a separate bug in handling of stackmaps and patchpoints in functions which need dynamic frame realignment. The current code assumes that offsets can be calculated from RBP, but realigned frames must use RSP. (There's a variable gap between RBP and the spill slots.) This change set does not address that issue.
Reviewers: atrick, ributzka
Differential Revision: http://reviews.llvm.org/D4572
llvm-svn: 214534
Altivec vector loads on PowerPC have an interesting property: They always load
from an aligned address (by rounding down the address actually provided if
necessary). In order to generate an actual unaligned load, you can generate two
load instructions, one with the original address, one offset by one vector
length, and use a special permutation to extract the bytes desired.
When this was originally implemented, I generated these two loads using regular
ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with
this:
The alignment of a load does not contribute to its identity, and SDNodes
are uniqued. So, imagine that we have some load, L1, that is not
aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned).
Further imagine that there had already existed a load (L1+16)(unaligned) with
the same chain operand as the load L1. When (L1+16)(aligned) is created as part
of the lowering of L1, this load *is* also the (L1+16)(unaligned) node, just
now marked as aligned (because the new alignment overwrites the old). But the
original users of (L1+16)(unaligned) now get the data intended for the
permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists
to get its own permutation-based expansion. This was PR19991.
A second potential problem has to do with the MMOs on these loads, which can be
used by AA during instruction scheduling to break chain-based dependencies. If
the new "aligned" loads get the MMO from the original unaligned load, this does
not represent the fact that it will load data from below the original address.
Normally, this would not matter, but this load might be combined with another
load pair for a previous vector, and then the dependency on the otherwise-
ignored lower bytes can matter.
To fix both problems, instead of generating the necessary loads using regular
ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are
provided with MMOs with a conservative address range.
Unfortunately, I no longer have a failing test case (since PR19991 was
reported, other changes in CodeGen have forced this bug back into hiding
again). Nevertheless, this should fix the underlying problem.
llvm-svn: 214481
Currently when DAGCombine converts loads feeding a switch into a switch of
addresses feeding a load the new load inherits the isInvariant flag of the left
side. This is incorrect since invariant loads can be reordered in cases where it
is illegal to reorder normal loads.
This patch adds an isInvariant parameter to getExtLoad() and updates all call
sites to pass in the data if they have it or false if they don't. It also
changes the DAGCombine to use that data to make the right decision when
creating the new load.
llvm-svn: 214449
This is a follow-up to the activity in the bug at
http://llvm.org/bugs/show_bug.cgi?id=18663 . The underlying issue has
to do with how the KILL pseudo-instruction is handled. I defer to
Hal/Jakob/Uli for additional details and background.
This will disable the (bad?) assert, add an associated fixme comment,
and add a pair of tests.
The code change and the pr18663-2.ll test are copied from the referenced
bug. That test does not immediately fail in my environment, but I have
added the pr18663.ll test which does.
(Comment from Hal)
to provide everyone else with some context, this assert was not bad when
it was written. At that time, we only generated KILL pseudo instructions
around subregister copies. This logic, unfortunately, had its own problems.
In r199797, the relevant logic in MachineCopyPropagation was replaced to
generate KILLs for other kinds of copies too. This change in semantics broke
this now-problematic assumption in AggressiveAntiDepBreaker. The
AggressiveAntiDepBreaker really needs a proper cleanup to deal with the
change, but removing the assert (which just allows the function to return
false) is a safe conservative behavior, and should do for the time being.
llvm-svn: 214429
This fixes a mistake where I accidentally dropped the upper 32 bits of a
64bit pointer during FastISel lowering of the patchpoint intrinsic.
llvm-svn: 214367
DAGCombine may choose to rewrite graphs where two loads feed a select into
graphs where a select of two addresses feed a load. While it sanity checks the
loads to make sure they are broadly equivalent it currently just uses the
alignment restriction of the left node. In cases where the right node has
a stronger alignment requirement, this may lead to bad codegen, such as generating
an aligned load where an unaligned load is required. This patch makes the
combine generate a load with an alignment that is the same as whichever is more
restrictive of the two alignments.
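The essence of the fix (illustrative only, not the DAGCombiner code itself) is to claim only the weaker of the two guarantees for the new load:
#include <algorithm>
// A smaller alignment value is the weaker guarantee; since the combined load
// may end up reading from either address, it must not claim more than that.
unsigned combinedLoadAlignment(unsigned LeftAlign, unsigned RightAlign) {
  return std::min(LeftAlign, RightAlign);
}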
Tests included.
rdar://17762530
llvm-svn: 214322
the jump instruction table pass. First, the verifier is already built
into all the tools. The test case is adapted to just run llvm-as
demonstrating that we still catch the broken module. Second, the
verifier is *extremely* slow. This was responsible for very significant
compile time regressions.
If you have deployed a Clang binary anywhere from r210280 to this
commit, you really want to re-deploy.
llvm-svn: 214287
Fix the missing case in ScalarizeVectorResult() that was exposed with
libclcore.bc in Android.
Differential Revision: http://reviews.llvm.org/D4645
llvm-svn: 214266
Per feedback on r214111, we are going to use null to represent unspecified
parameter. If the type array is {null}, it means a function that returns void;
If the type array is {null, null}, it means a variadic function that returns
void. In summary, if we have more than one element in the type array and the last
element is null, it is a variadic function.
rdar://17628609
llvm-svn: 214189
The test being performed is just an approximation anyway, so it really
shouldn't crash when things don't go entirely as expected.
Should fix PR20474.
llvm-svn: 214177
We need to make sure we use the softened version of all appropriate operands in
the libcall, or things go horribly wrong. This may entail actually executing a
1-stage softening.
llvm-svn: 214175
The enum types array by design contains pointers to MDNodes rather than DIRefs.
Unique them when handling the enum types in DwarfDebug.
rdar://17628609
llvm-svn: 214139
DITypeArray is an array of DITypeRef, at its creation, we will create
DITypeRef (i.e use the identifier if the type node has an identifier).
This is the last patch to unique the type array of a subroutine type.
rdar://17628609
llvm-svn: 214132
This is the second of a series of patches to handle type uniquing of the
type array for a subroutine type.
For vector and array types, getElements returns the array of subranges, so it
is a better name than getTypeArray. Even for class, struct and enum types,
getElements returns the members, which can be subprograms.
setArrays can set up to two arrays, the second is the templates.
This commit should have no functionality change.
llvm-svn: 214112
inspection in the process, and shuffle the logging in the DAG combiner
around a bit.
With this it is much easier to follow what the legalizer is doing. It
should even accurately present most of the strange legalization
operations where a single node is replaced by multiple nodes, etc. There
is still some information lost (we log SDNodes not SDValues so we don't
log which result is used for which thing), but I think this is much
closer to a usable system. Notably, this will make it *much* more
apparent when legalization is actually happening inside the combiner, or
when there is a cycle caused by interactions of the legalizer and the
combiner.
The "bug" I fixed here I'm not sure is remotely possible to trigger. We
were only adding one of the nodes in a replacement to the updated set
rather than all of the nodes in the replacement. Realistically, the
worst result of this are nodes not getting back onto the worklist in the
DAG combiner. I doubt it is possible to trigger this today, and
I certainly don't have any ideas about how, but this at least brings the
code into alignment with the principled operation of the routine.
llvm-svn: 214105
Rename to allowsMisalignedMemoryAccesses.
On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment,
and don't need to be split into multiple accesses. Vector loads with
an alignment of the element type are not uncommon in OpenCL code.
llvm-svn: 214055
over each node in the worklist prior to combining.
This allows the combiner to produce new nodes which need to go back
through legalization. This is particularly useful when generating
operands to target specific nodes in a post-legalize DAG combine where
the operands are significantly easier to express as pre-legalized
operations. My immediate use case will be PSHUFB formation where we need
to build a constant shuffle mask with a build_vector node.
This also refactors the relevant functionality in the legalizer to
support this, and updates relevant tests. I've spoken to the R600 folks
and these changes look like improvements to them. The avx512 change
needs to be investigated, I suspect there is a disagreement between the
legalizer and the DAG combiner there, but it seems a minor issue so
leaving it to be re-evaluated after this patch.
Differential Revision: http://reviews.llvm.org/D4564
llvm-svn: 214020
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:
- llvm.assume(true) is dead.
- llvm.assume(false) is unreachable (this directly corresponds to the
documented behavior of MSVC's __assume(0)), as is llvm.assume(undef).
The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.
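At the source level the intent is similar to what Clang's __builtin_assume already expresses (illustrative C++, not part of this patch):
void zeroFirstN(int *P, int N) {
  __builtin_assume(N > 0); // the optimizer may take N > 0 as given from here on
  for (int I = 0; I < N; ++I)
    P[I] = 0;              // e.g. the loop is known to execute at least once
}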
llvm-svn: 213973
address of the stack guard was being spilled to the stack.
Previously the address of the stack guard would get spilled to the stack if it
was impossible to keep it in a register. This patch introduces a new target
independent node and pseudo instruction which gets expanded post-RA to a
sequence of instructions that load the stack guard value. Register allocator
can now just remat the value when it can't keep it in a register.
<rdar://problem/12475629>
llvm-svn: 213967
This recommits r208930, r208933, and r208975 (by reverting r209338) and
reverts r209529 (the FIXME to readd this functionality once the tools
were fixed) now that DWP has been fixed to cope with a single section
for all fission type units.
Original commit message:
"Since type units in the dwo file are handled by a debug aware tool,
they don't need to leverage the ELF comdat grouping to implement
deduplication. Avoid creating all the .group sections for these as a
space optimization."
llvm-svn: 213956
Reverted by Eric Christopher (Thanks!) in r212203 after Bob Wilson
reported LTO issues. Duncan Exon Smith and Aditya Nandakumar helped
provide a reduced reproduction, though the failure wasn't too hard to
guess, and even easier with the example to confirm.
The assertion that the subprogram metadata associated with an
llvm::Function matches the scope data referenced by the DbgLocs on the
instructions in that function is not valid under LTO. In LTO, a C++
inline function might exist in multiple CUs and the subprogram metadata
nodes will refer to the same llvm::Function. In this case, depending on
the order of the CUs, the first instance of the subprogram metadata may
not be the one referenced by the instructions in that function and the
assertion will fail.
A test case (test/DebugInfo/cross-cu-linkonce-distinct.ll) is added, the
assertion removed and a comment added to explain this situation.
This was then reverted again in r213581 as it caused PR20367. The root
cause of this was the early exit in LiveDebugVariables meant that
spurious DBG_VALUE intrinsics that referenced dead variables were not
removed, causing an assertion/crash later on. The fix is to have
LiveDebugVariables strip all DBG_VALUE intrinsics in functions without
debug info as they're not needed anyway. Test case added to cover this
situation (that occurs when a debug-having function is inlined into a
nodebug function) in test/DebugInfo/X86/nodebug_with_debug_loc.ll
Original commit message:
If a function isn't actually in a CU's subprogram list in the debug info
metadata, ignore all the DebugLocs and don't try to build scopes, track
variables, etc.
While this is possibly a minor optimization, it's also a correctness fix
for an incoming patch that will add assertions to LexicalScopes and the
debug info verifier to ensure that all scope chains lead to debug info
for the current function.
Fix up a few test cases that had broken/incomplete debug info that could
violate this constraint.
Add a test case where this occurs by design (inlining a
debug-info-having function in an attribute nodebug function - we want
this to work because /if/ the nodebug function is then inlined into a
debug-info-having function, it should be fine (and will work fine - we
just stitch the scopes up as usual), but should the inlining not happen
we need to not assert fail either).
llvm-svn: 213952
with a result number outside the range of results for the node.
I don't know how we managed to not really check this very basic
invariant for so long, but the code is *very* broken at this point.
I have over 270 test failures with the assert enabled. I'm committing it
disabled so that others can join in the cleanup effort and reproduce the
issues. I've also included one of the obvious fixes that I already
found. More fixes to come.
llvm-svn: 213926
which have successfully round-tripped through the combine phase, and use
this to ensure all operands to DAG nodes are visited by the combiner,
even if they are only added during the combine phase.
This is critical to have the combiner reach nodes that are *introduced*
during combining. Previously these would sometimes be visited and
sometimes not be visited based on whether they happened to end up on the
worklist or not. Now we always run them through the combiner.
This fixes quite a few bad codegen test cases lurking in the suite while
also being more principled. Among these, the TLS code generation is
particularly exciting for programs that have this in the critical path
like TSan-instrumented binaries (although I think they engineer to use
a different TLS that is faster anyways).
I've tried to check for compile-time regressions here by running llc
over a merged (but not LTO-ed) clang bitcode file and observed at most
a 3% slowdown in llc. Given that this is essentially a worst case (none
of opt or clang are running at this phase) I think this is tolerable.
The actual LTO case should be even less costly, and the cost in normal
compilation should be negligible.
With this combining logic, it is possible to re-legalize as we combine
which is necessary to implement PSHUFB formation on x86 as
a post-legalize DAG combine (my ultimate goal).
Differential Revision: http://reviews.llvm.org/D4638
llvm-svn: 213898
vector operation legalization with support for custom target lowering
and fallback to expand when it fails, and use this to implement sext and
anyext load lowering for x86 in a more principled way.
Previously, the x86 backend relied on a target DAG combine to "combine
away" sextload and extload nodes prior to legalization, or would expand
them during legalization with terrible code. This is particularly
problematic because the DAG combine relies on running over non-canonical
DAG nodes at just the right time to match several common and important
patterns. It used a combine rather than lowering because we didn't have
good lowering support, and to expose some tricks being employed to more
combine phases.
With this change it becomes a proper lowering operation, the backend
marks that it can lower these nodes, and I've added support for handling
the canonical forms that don't have direct legal representations such as
sextload of a v4i8 -> v4i64 on AVX1. With this change, our test cases
for this behavior continue to pass even after the DAG combiner begins
running more systematically over every node.
There is some noise caused by this in the test suite where we actually
use vector extends instead of subregister extraction. This doesn't
really seem like the right thing to do, but is unlikely to be a critical
regression. We do regress in one case where by lowering to the
target-specific patterns early we were able to combine away extraneous
legal math nodes. However, this regression is completely addressed by
switching to a widening based legalization which is what I'm working
toward anyways, so I've just switched the test to that mode.
Differential Revision: http://reviews.llvm.org/D4654
llvm-svn: 213897
This patch minimizes the number of nops that must be emitted on X86 to satisfy
stackmap shadow constraints.
To minimize the number of nops inserted, the X86AsmPrinter now records the
size of the most recent stackmap's shadow in the StackMapShadowTracker class,
and tracks the number of instruction bytes emitted since that stackmap
instruction was encountered. Padding is emitted (if it is required at all)
immediately before the next stackmap/patchpoint instruction, or at the end of
the basic block.
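A standalone model of the bookkeeping (simplified, with invented names; the real class lives in the X86 asm printer):
#include <cstdio>
struct ShadowTracker {
  unsigned RequiredShadow = 0; // bytes requested by the last stackmap
  unsigned BytesSince = 0;     // instruction bytes emitted since then
  void startShadow(unsigned Size) { RequiredShadow = Size; BytesSince = 0; }
  void countInstr(unsigned Bytes) { BytesSince += Bytes; }
  // Queried just before the next stackmap/patchpoint or at the end of the block.
  unsigned nopBytesNeeded() const {
    return RequiredShadow > BytesSince ? RequiredShadow - BytesSince : 0;
  }
};
int main() {
  ShadowTracker T;
  T.startShadow(8); // the stackmap asked for an 8-byte shadow
  T.countInstr(3);  // 3 bytes of ordinary instructions follow it
  T.countInstr(2);  // 2 more bytes
  std::printf("pad with %u nop bytes\n", T.nopBytesNeeded()); // prints 3
  return 0;
}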
This optimization should reduce code-size and improve performance for people
using the llvm stackmap intrinsic on X86.
<rdar://problem/14959522>
llvm-svn: 213892
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
1. To preserve noalias function attribute information when inlining
2. To provide the ability to model block-scope C99 restrict pointers
Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.
What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:
!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }
Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:
... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }
When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.
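A simplified model of that query (plain C++ over a parent-pointer scope tree; not the actual alias-analysis code):
#include <vector>
struct Scope { const Scope *Parent = nullptr; };
// True if S is Ancestor itself or lies below it in the scope hierarchy.
static bool isSameOrDescendant(const Scope *S, const Scope *Ancestor) {
  for (; S; S = S->Parent)
    if (S == Ancestor)
      return true;
  return false;
}
// The accesses are treated as not aliasing if some scope in one access's
// alias.scope list is, or descends from, a scope in the other access's
// noalias list.
static bool impliesNoAlias(const std::vector<const Scope *> &AliasScopes,
                           const std::vector<const Scope *> &NoAliasScopes) {
  for (const Scope *AS : AliasScopes)
    for (const Scope *NS : NoAliasScopes)
      if (isSameOrDescendant(AS, NS))
        return true;
  return false;
}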
Note that if the first element of the scope metadata is a string, then it can
be combined across functions and translation units. The string can be replaced
by a self-reference to create globally unique scope identifiers.
[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]
Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functions have the metadata imply that a1 does not alias with b2.
llvm-svn: 213864
In order to enable the preservation of noalias function parameter information
after inlining, and the representation of block-level __restrict__ pointer
information (etc.), additional kinds of aliasing metadata will be introduced.
This metadata needs to be carried around in AliasAnalysis::Location objects
(and MMOs at the SDAG level), and so we need to generalize the current scheme
(which is hard-coded to just one TBAA MDNode*).
This commit introduces only the necessary refactoring to allow for the
introduction of other aliasing metadata types, but does not actually introduce
any (that will come in a follow-up commit). What it does introduce is a new
AAMDNodes structure to hold all of the aliasing metadata nodes associated with
a particular memory-accessing instruction, and uses that structure instead of
the raw MDNode* in AliasAnalysis::Location, etc.
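A rough sketch of the shape of that structure (member names are guesses based on the description; at this point only the TBAA slot is actually populated):
class MDNode; // opaque metadata node
struct AAMDNodes {
  MDNode *TBAA = nullptr;    // the existing !tbaa node
  MDNode *Scope = nullptr;   // room for !alias.scope (follow-up commit)
  MDNode *NoAlias = nullptr; // room for !noalias (follow-up commit)
};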
No functionality change intended.
llvm-svn: 213859
Constant fold the lanes of the input constant build_vector individually
so we correctly handle when the vector elements are not all the same
constant value.
PR20394
llvm-svn: 213798
The target-independent DAGcombiner will generate:
asr w1, X, #31 w1 = splat sign bit.
add X, X, w1, lsr #28 X = X + 0 or pow2-1
asr w0, X, asr #4 w0 = X/pow2
However, the add + shifts is expensive, so generate:
add w0, X, 15 w0 = X + pow2-1
cmp X, wzr X - 0
csel X, w0, X, lt X = (X < 0) ? X + pow2-1 : X;
asr w0, X, asr 4 w0 = X/pow2
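At the C level the csel sequence corresponds to something like the following (dividing by 16, matching the example above; illustrative only):
int sdivByPow2(int X) {
  int Adj = X + 15;            // add w0, X, 15 : X + (pow2 - 1)
  int Sel = (X < 0) ? Adj : X; // cmp + csel    : use the adjusted value only for negatives
  return Sel >> 4;             // asr w0, X, 4  : arithmetic shift right = divide by 16
}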
llvm-svn: 213758
This pass attempts to speculatively use a sqrt instruction if one exists on the target, falling back to a libcall if the target instruction returned NaN.
This was enabled for MIPS and SystemZ, but it is well guarded and is good for most targets - GCC does this for X86, ARM and AArch64 (at least, those are the ones I've checked).
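Conceptually, the transformed code behaves like this (plain C++ as illustration; __builtin_sqrt stands in for the target's sqrt instruction):
#include <cmath>
double guardedSqrt(double X) {
  double R = __builtin_sqrt(X); // fast hardware-style square root
  if (R != R)                   // NaN result (e.g. X < 0): take the slow path
    return std::sqrt(X);        // library call preserves errno/NaN semantics
  return R;
}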
llvm-svn: 213752
insertions.
The old behavior could cause arbitrarily bad memory usage in the DAG
combiner if there was heavy traffic of adding nodes already on the
worklist to it. This commit switches the DAG combine worklist to work
the same way as the instcombine worklist where we null-out removed
entries and only add new entries to the worklist. My measurements of
codegen time shows slight improvement. The memory utilization is
unsurprisingly dominated by other factors (the IR and DAG itself
I suspect).
This change results in subtle, frustrating churn in the particular order
in which DAG combines are applied which causes a number of minor
regressions where we fail to match a pattern previously matched by
accident. AFAICT, all of these should be using AddToWorklist directly
or should be written in a less brittle way. None of the changes seem
drastically bad, and a few of the changes seem distinctly better.
A major change required to make this work is to significantly harden the
way in which the DAG combiner handle nodes which become dead
(zero-uses). Previously, we relied on the ability to "priority-bump"
them on the combine worklist to achieve recursive deletion of these
nodes and ensure that the frontier of remaining live nodes all were
added to the worklist. Instead, I've introduced a routine to just
implement that precise logic with no indirection. It is a significantly
simpler operation than that of the combiner worklist proper. I suspect
this will also fix some other problems with the combiner.
I think the x86 changes are really minor and uninteresting, but the
avx512 change at least is hiding a "regression" (despite the test case
being just noise, not testing some performance invariant) that might be
looked into. Not sure if any of the others impact specific "important"
code paths, but they didn't look terribly interesting to me, or the
changes were really minor. The consensus in review is to fix any
regressions that show up after the fact here.
Thanks to the other reviewers for checking the output on other
architectures. There is a specific regression on ARM that Tim already
has a fix prepped to commit.
Differential Revision: http://reviews.llvm.org/D4616
llvm-svn: 213727
DAG into a helper function.
This adds a trip through the (very minimal) verification logic in
a bunch of places that were missing it, but shouldn't have any other
impact outside of refactoring. I'm hoping to use this to do more clever
things when DAG nodes are inserted into the graph.
llvm-svn: 213612
a bug in 2010 when they were added but are adding no value today.
In fact, they are utter lies. NodeAllocator is used to allocate almost
all of these node types. I don't know what we were trying to assert
here, and the docs don't give any answer. Until we once again stumble
upon a bug needing help, let's clear the path for improvements.
llvm-svn: 213610
We should update the usages to all of the results;
otherwise, we might get assertion failure or SEGV during
the type legalization of ATOMIC_CMP_SWAP_WITH_SUCCESS
with two or more illegal types.
For example, in the following sequence, both i8 and i1
might be illegal on some targets, e.g. armv5, mipsel, mips64el,
%0 = cmpxchg i8* %ptr, i8 %desire, i8 %new monotonic monotonic
%1 = extractvalue { i8, i1 } %0, 1
Since both i8 and i1 should be legalized, the corresponding
ATOMIC_CMP_SWAP_WITH_SUCCESS dag will be checked/replaced/updated
twice.
If we don't update the usage to *ALL* of the results in the
first round, the DAG for extractvalue might be processed earlier.
The GetPromotedInteger() will result in assertion failure,
because its operand (i.e. the success bit of cmpxchg) is not
promoted beforehand.
llvm-svn: 213569
This makes the first stage DAG for @llvm.convert.to.fp16 an fptrunc,
and correspondingly @llvm.convert.from.fp16 an fpext. The legalisation
path is now uniform, regardless of the input IR:
fptrunc -> FP_TO_FP16 (if f16 illegal) -> libcall
fpext -> FP16_TO_FP (if f16 illegal) -> libcall
Each target should be able to select the version that best matches its
operations and not be required to duplicate patterns for both fptrunc
and FP_TO_FP16 (for example).
As a result we can remove some redundant AArch64 patterns.
llvm-svn: 213507
'Worklist' consistently rather than a deeply confusing mixture of
'WorkList' and 'Worklist'.
Notably, the very 'WorkList' of the DAG combiner was exposed to target
specific DAG combines under an interface 'AddToWorklist' which was
implemented by in turn calling 'AddToWorkList' in the combiner. This has
sent me circling with the wrong case in grep one too many times.
I chose to normalize on 'Worklist' because that one won the grep-vote
for llvm/lib/... by a hundred hits or so, and it is used in places
relatively "canonical" such as InstCombine's Worklist. Let's all just
pick this casing, whether "correct", "good", or "bad" and be
consistent...
llvm-svn: 213506
stack, filter all handle nodes from the DAG combiner worklist.
This will also handle cases where other handle nodes might be
(erroneously) added to the worklist and then cause bugs and explosions
when deleted. For example, when running the legalizer within the DAG
combiner, there are times when other handle nodes are used and can end
up here.
llvm-svn: 213505
Canonicalize shuffles according to rules:
* shuffle(A, shuffle(A, B)) -> shuffle(shuffle(A,B), A)
* shuffle(B, shuffle(A, B)) -> shuffle(shuffle(A,B), B)
* shuffle(B, shuffle(A, Undef)) -> shuffle(shuffle(A, Undef), B)
This patch helps identifying more shuffle pairs that could be combined reusing
the already existing rules in the DAGCombiner.
Added new test 'combine-vec-shuffle-5.ll' to verify that the canonicalized
shuffles are now folded into a single shuffle node by the DAGCombiner.
Added more test cases to 'combine-vec-shuffle-4.ll'.
llvm-svn: 213504
This patch removes function 'CommuteVectorShuffle' from X86ISelLowering.cpp
and moves its logic into SelectionDAG.cpp as method 'getCommutedVectorShuffles'.
This refactoring is in preparation for an upcoming change to the DAGCombiner.
llvm-svn: 213503
Summary: This patch introduces two new iterator ranges and updates existing code to use it. No functional change intended.
Test Plan: All tests (make check-all) still pass.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D4481
llvm-svn: 213474
It's also possible to just write "= nullptr", but there's some question
of whether that's as readable, so I leave it up to authors to pick which
they prefer for now. If we want to discuss standardizing on one or the
other, we can do that at some point in the future.
llvm-svn: 213438
Recommits 212776 which was reverted in r212793. This has been committed
and recommitted a few times as I try to test it harder and find/fix more
issues. The most recent revert was due to an asan bot failure which I
can't seem to reproduce locally, though I believe I'm following all the
steps the buildbot does.
So I'm going to recommit this in the hopes of investigating the failure
on the buildbot itself... apologies in advance for the bot noise. If
anyone sees failures with this /please/ provide me with any
reproductions, etc.
llvm-svn: 213391
Actual support for softening f16 operations is still limited, and can be added
when it's needed. But Soften is much closer to being a useful thing to try
than keeping it Legal when no registers can actually hold such values.
Longer term, we probably want something between Soften and Promote semantics
for most targets, it'll be more efficient to promote the 4 basic operations to
f32 than libcall them.
llvm-svn: 213372
Since the result of a SETCC for AArch64 is 0 or -1 in each lane, we can
move unary operations, in this case [su]int_to_fp through the mask
operation and constant fold the operation away. Generally speaking:
UNARYOP(AND(VECTOR_CMP(x,y), constant))
--> AND(VECTOR_CMP(x,y), constant2)
where constant2 is UNARYOP(constant).
This implements the transform where UNARYOP is [su]int_to_fp.
For example, consider the simple function:
define <4 x float> @foo(<4 x float> %val, <4 x float> %test) nounwind {
%cmp = fcmp oeq <4 x float> %val, %test
%ext = zext <4 x i1> %cmp to <4 x i32>
%result = sitofp <4 x i32> %ext to <4 x float>
ret <4 x float> %result
}
Before this change, the code is generated as:
fcmeq.4s v0, v0, v1
movi.4s v1, #0x1 // Integer splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
scvtf.4s v0, v0 // Convert each lane to f32.
ret
After, the code is improved to:
fcmeq.4s v0, v0, v1
fmov.4s v1, #1.00000000 // f32 splat value.
and.16b v0, v0, v1 // Mask lanes based on the comparison.
ret
The scvtf.4s has been constant folded away and the floating point 1.0f
vector lanes are materialized directly via fmov.4s.
Rather than do the folding manually in the target code, teach getNode()
in the generic SelectionDAG to handle folding constant operands of
vector [su]int_to_fp nodes. It is reasonable (as noted in a FIXME) to do
additional constant folding there as well, but I don't have test cases
for those operations, so leaving them for another time when it becomes
appropriate.
rdar://17693791
llvm-svn: 213341
Previously we asserted on this code. Currently compiler-rt doesn't
actually implement any of these new libcalls, but external help is
pretty much the only viable option for LLVM.
I've followed the much more generic "__truncST2" naming, as opposed to
the odd name for f32 -> f16 truncation. This can obviously be changed
later, or overridden by any targets that need to.
llvm-svn: 213252
This makes the two intrinsics @llvm.convert.from.f16 and
@llvm.convert.to.f16 accept types other than simple "float". This is
only strictly needed for the truncate operation, since otherwise
double rounding occurs and there's no way to represent the strict IEEE
conversion. However, for symmetry we allow larger types in the extend
too.
During legalization, we can expand an "fp16_to_double" operation into
two extends for convenience, but abort when the truncate isn't legal. A new
libcall is probably needed here.
Even after this commit, various target tweaks are needed to actually use the
extended intrinsics. I've put these into separate commits for clarity, so there
are no actual tests of f64 conversion here.
llvm-svn: 213248
This fixes an issue where a local value is defined before and used after an
inline asm call with side effects.
This fix simply flushes the local value map, which updates the insertion point
for the inline asm call to be above any previously defined local values.
This fixes <rdar://problem/17694203>
llvm-svn: 213203
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.
This is not a good situation, and a more reasonable default can be
formed by acknowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).
Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.
llvm-svn: 213162
There is no need to pass on TLI separately to the function. As Eric pointed out
the Target Machine already provides everything we need.
llvm-svn: 213108
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
llvm-svn: 213101
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
llvm-svn: 213078
This patch adds two new rules to the DAGCombiner:
1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
We only do this if the combined shuffle is legal for the target.
Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
ret <4 x float> %2
}
;;
(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
pshufd $120, %xmm0, %xmm0
shufps $-108, %xmm0, %xmm1
movaps %xmm1, %xmm0
Now the x86 backend generates:
movsd %xmm1, %xmm0
llvm-svn: 213069
The patchpoint instruction should have been inserted before the target
generated call instruction to be inside the ADJSTACKDOWN/ADJSTACKUP call
sequence window.
llvm-svn: 213034
Always update the value map with the result register (if there is one), for the
patchpoint instruction we created to replace the target-specific call
instruction.
llvm-svn: 213033
This patch fixes a crasher in method 'DAGCombiner::visitOR' due to an invalid
call to method 'isShuffleMaskLegal'. On x86, method 'isShuffleMaskLegal'
always expects a legal vector value type in input.
With this patch, we immediately check if the input OR dag node has a legal
vector type; we only try to fold an OR dag node into a single shufflevector
if we know that the resulting shuffle will have a legal type.
This is to avoid calling method 'isShuffleMaskLegal' on a potentially
illegal vector value type.
Added a new test-case to file 'CodeGen/X86/combine-or.ll' to verify that
DAGCombiner doesn't crash in the attempt to check/combine an OR between shuffles
with illegal types.
llvm-svn: 213020
COFF lacks a feature that other object file formats support: mergeable
sections.
To work around this, MSVC sticks constant pool entries in special COMDAT
sections so that each constant is in its own section. This permits
unused constants to be dropped and it also allows duplicate constants in
different translation units to get merged together.
This fixes PR20262.
Differential Revision: http://reviews.llvm.org/D4482
llvm-svn: 213006
This patch teaches the DAGCombiner how to fold a pair of shuffles
according to rules:
1. shuffle(shuffle(A, B, M0), B, M1) -> shuffle(A, B, M2)
2. shuffle(shuffle(A, B, M0), A, M1) -> shuffle(A, B, M3)
The new rules would only trigger if the resulting shuffle has legal type and
legal mask.
Added test 'combine-vec-shuffle-3.ll' to verify that DAGCombiner correctly
folds shuffles on x86 when the resulting mask is legal. Also added some negative
cases to verify that we avoid introducing illegal shuffles.
llvm-svn: 213001
Found during windows unwinding work. This header is indirectly included through
a chain leading through Support/Win64EH.h. Explicitly include the header. NFC.
llvm-svn: 212955
This crash was pretty common while compiling Rust for iOS (armv7). Reason -
SjLj preparation step was lowering aggregate arguments as ExtractValue +
InsertValue. ExtractValue has assertion which checks that there is some data in
value, which is not true in case of empty (no fields) structures. Rust uses
them quite extensively so this patch uses a 'select true, %val, undef'
instruction to lower the argument.
Patch by Valerii Hiora.
llvm-svn: 212922
Verify that DAGCombiner does not crash when trying to fold a pair of shuffles
according to rule (added at r212539):
(shuffle (shuffle A, Undef, M0), Undef, M1) -> (shuffle A, Undef, M2)
The DAGCombiner avoids folding shuffles if the resulting shuffle dag node
is not legal for the target. That means, the resulting shuffle must have
legal type and legal mask.
Before, the DAGCombiner only called method
'TargetLowering::isShuffleMaskLegal' to check if it was "safe" to fold according
to the above-mentioned rule. However, this caused a crash in the x86 backend
since method 'isShuffleMaskLegal' always expects to be called on a
legal vector type.
llvm-svn: 212915
This implements the target-independent lowering for the patchpoint
intrinsic. Targets have to implement the FastLowerCall
hook to support this intrinsic.
Related to <rdar://problem/17427052>
llvm-svn: 212849
The infrastructure mimics the call lowering we have already in place for
SelectionDAG, but with limitations. For example structure return demotion and
non-simple types are not supported (yet).
Currently every backend has its own implementation and duplicated code for call
lowering. There is also no specified interface that could be called from
target-independent code. The target-hook is opt-in and doesn't affect current
implementations.
llvm-svn: 212848
Create a separate helper function for target-independent intrinsic lowering. Also
add a target hook that allows directly calling into a target-specific intrinsic
lowering method. Currently the implementation is opt-in and doesn't affect
existing target implementations.
llvm-svn: 212843
This reverts commit r212776.
Nope, still seems to be failing on the sanitizer bots... but hey, not
the msan self-host anymore, it's failing in asan now. I'll start looking
there next.
llvm-svn: 212793
Committed in r212205 and reverted in r212226 due to msan self-hosting
failure, I believe I've got that fixed by r212761 to Clang.
Original commit message:
"Originally committed in r211723, reverted in r211724 due to failure
cases found and fixed (ArgumentPromotion: r211872, Inlining: r212065),
committed again in r212085 and reverted again in r212089 after fixing
some other cases, such as debug info subprogram lists not keeping track
of the function they represent (r212128) and then short-circuiting
things like LiveDebugVariables that build LexicalScopes for functions
that might not have full debug info.
And again, I believe the invariant actually holds for some reasonable
amount of code (but I'll keep an eye on the buildbots and see what
happens... ).
Original commit message:
PR20038: DebugInfo: Inlined call sites where the caller has debug info
but the call itself has no debug location.
This situation does bad things when inlined, so I've fixed Clang not to
produce inlinable call sites without locations when the caller has debug
info (in the one case where I could find that this occurred). This
updates the PR20038 test case to be what clang now produces, and readds
the assertion that had to be removed due to this bug.
I've also beefed up the debug info verifier to help diagnose these
issues in the future, and I hope to add checks to the inliner to just
assert-fail if it encounters this situation. If, in the future, we
decide we have to cope with this situation, the right thing to do is
probably to just remove all the DebugLocs from the inlined
instructions."
llvm-svn: 212776
Move the code to a helper function to allow calls from TypeLegalizer.
No functionality change intended
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Owen Anderson <resistor@mac.com>
llvm-svn: 212772