Summary:
At least for CoreCLR, a catchpad which immediately executes an
`unreachable` instruction indicates that the exception can never have a
matching type, and so such catchpads can be removed, and so can their
catchswitches if the catchswitch becomes empty.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15846
llvm-svn: 256809
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp. In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop. Push decrements the stack
pointer and stores it's result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.
llvm-svn: 256808
Unfortunately this fix had the effect of exposing the
-verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for
which I disabled it for now.
Two testcases also have additional pushq/popq where the corrected code
cannot prove that %rax is dead any longer. Looking at the examples, this
could potentially be fixed by improving computeRegisterLiveness() to check
the live-in lists of the successors blocks when reaching the end of a
block.
This fixes http://llvm.org/PR25951.
llvm-svn: 256799
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.
The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15869
llvm-svn: 256794
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.
Reviewers: reames, majnemer
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15859
llvm-svn: 256792
r256763 had promoteLoopAccessesToScalars check for the existence of a
catchswitch when the exit blocks were populated but
promoteLoopAccessesToScalars may be called with a prepopulated set of
exit blocks which would also need to be checked.
This fixes PR26019.
llvm-svn: 256788
This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have.
Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site.
Differential Revision: http://reviews.llvm.org/D15820
llvm-svn: 256787
Before reevaluating instructions, iterate over all instructions
to be reevaluated and remove trivially dead instructions and if
any of it's operands become trivially dead, mark it for deletion
until all trivially dead instructions have been removed
llvm-svn: 256773
Summary:
Fix the CLR state numbering to generate correct tables, and update the lit
test to verify them.
The CLR numbering assigns one state number to each catchpad and
cleanuppad.
It also computes two tree-like relations over states:
1) Each state has a "HandlerParentState", which is the state of the next
outer handler enclosing this state's handler (same as nearest ancestor
per the ParentPad linkage on EH pads, but skipping over catchswitches).
2) Each state has a "TryParentState", which:
a) for a catchpad that's not the last handler on its catchswitch, is
the state of the next catchpad on that catchswitch.
b) for all other pads, is the state of the pad whose try region is the
next outer try region enclosing this state's try region. The "try
regions are not present as such in the IR, but will be inferred
based on the placement of invokes and pads which reach each other
by exceptional exits.
Catchswitches do not get their own states, but each gets mapped to the
state of its first catchpad.
Table generation requires each state's "unwind dest" state to have a lower
state number than the given state.
Since HandlerParentState can be computed as a function of a pad's
ParentPad, and TryParentState can be computed as a function of its unwind
dest and the TryParentStates of its children, the CLR state numbering
algorithm first computes HandlerParentState in a top-down pass, then
computes TryParentState in a bottom-up pass.
Also reword some comments/names in the CLR EH table generation to make the
distinction between the different kinds of "parent" clear.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D15325
llvm-svn: 256760
We had two bugs here:
- We might try to sink into a catchswitch, causing verifier failures.
- We will succeed in sinking into a cleanuppad but we didn't update the
funclet operand bundle.
This fixes PR26000.
llvm-svn: 256728
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings.
There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.
Reviewers: joerg, aaron.ballman
Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15848
llvm-svn: 256707
Summary:
The handler list must be nonempty and consist solely of CatchPads.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15842
llvm-svn: 256691
Summary: A catchswitch cannot be a parent of a cleanuppad or another catchswitch.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15841
llvm-svn: 256690
Summary:
Add a pass to update catchrets when their successors get cloned; the
existing pass doesn't catch these because it walks the funclet whose
blocks are being cloned but the catchret is in a child funclet.
Also update the test for removing incoming PHI values; when the
predecessor is a catchret, the relevant color is the catchret's parentPad,
not its block's color.
Reviewers: andrew.w.kaylor, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15840
llvm-svn: 256689
LLVM's targets need to know if stack pointer adjustments occur after the
prologue. This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.
Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction. There is an interesting corner case:
inline assembly.
The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate. Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't. The stack pointer is decremented and then
immediately incremented. Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contain a sequence of
stack operations. This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.
Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.
llvm-svn: 256685
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.
Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()
Differential Revision: http://reviews.llvm.org/D15741
llvm-svn: 256671
The code that was meant to adjust the duplication cost based on the
terminator opcode was not being executed in cases where the initial
threshold was hit inside the loop.
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D15536
llvm-svn: 256568
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+
Test cases by Ana Pazos
Differential Revision: http://reviews.llvm.org/D15707
llvm-svn: 256523
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint. See the
test case for an example.
Reviewers: igor-laevsky, reames
Subscribers: reames, alex, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15789
llvm-svn: 256520
The cost is calculated for all X86 targets. When gather/scatter instruction
is not supported we calculate the cost of scalar sequence.
Differential revision: http://reviews.llvm.org/D15677
llvm-svn: 256519
The check lines were added with:
http://reviews.llvm.org/rL256458http://reviews.llvm.org/rL256460
but on a darwin target, the output looks like:
## InlineAsm Start
rorq %rdi
## InlineAsm End
## InlineAsm Start
rorq %rsi
## InlineAsm End
leaq (%rsi,%rdi), %rax
retq
llvm-svn: 256507
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Differential Revision: http://reviews.llvm.org/D15054
llvm-svn: 256494
lower broadcast<type>x<vector> to shuffles.
there are two cases:
1.src is 128 bits and dest is 512 bits: in this case we will lower it to shuffle with imm = 0.
2.src is 256 bit and dest is 512 bits: in this case we will lower it to shuffle with imm = 01000100b (0x44) that way we will broadcast the 256bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3] then it will mask it with the passthru value (in case it's mask op).
Differential Revision: http://reviews.llvm.org/D15790
llvm-svn: 256490
I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode was able to be selected there.
llvm-svn: 256483
Fix TRUNCATE lowering vector to vector i1, use LSB and not MSB.
Implement VPMOVB/W/D/Q2M intrinsic.
Differential Revision: http://reviews.llvm.org/D15675
llvm-svn: 256470
a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Differential Revision: http://reviews.llvm.org/D15676
llvm-svn: 256466
is (by default) run much earlier than FuncitonAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Differential Revision: http://reviews.llvm.org/D15668
llvm-svn: 256465
A frame pointer must be used if stack pointer is modified after the
prologue. LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.
There is a small twist: this sequence might exist in user code via
inline-assembly. For now, conservatively assume that such functions
require a frame pointer. For real world justification, please see
clang's implementation of __readeflags.
This fixes PR25945.
llvm-svn: 256456
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
Subscribers: reames, mjacob, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15662
llvm-svn: 256443
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.
This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.
llvm-svn: 256402
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
llvm-svn: 256394
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization.
KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.
Differential Revision: http://reviews.llvm.org/D15739
llvm-svn: 256365
Summary: Linker testing was sad at seeing an unresolved external symbol. For now don't do that: it's valid but we're not playing with multi-file linking yet, and the LLVM tests are used as hacky sanity tests for single-file linking (the GCC torture tests are much better for this purpose). Another solution would be to use '.extern' to make the intent explicit (don't simple-file link this, there's an unresolved symbol), some assemblers use '.extern' while others ignore it, so we wouldn't really be inventing anything new.
Reviewers: sunfish, kripken
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D15753
llvm-svn: 256353
Teach the statepoint lowering code to emit Indirect stackmap entries for spill inserted by StatepointLowering (i.e. SelectionDAG), but Direct stackmap entries for in-IR allocas which represent manual stack slots. This is what the docs call for (http://llvm.org/docs/StackMaps.html#stack-map-format), but we've been emitting both as Direct. This was pointed out recently on the mailing list as a bug. It also blocks http://reviews.llvm.org/D15632 which extends the lowering to handle vector-of-pointers since only Indirect references can encode a variable sized slot.
To implement this, I introduced a new flag on the StackObject class used to maintian information about stack slots. I original considered (and prototyped in http://reviews.llvm.org/D15632), the idea of using the existing isSpillSlot flag, but end up deciding that was a bit too risky and that the cost of adding a new flag was low. Having the new flag will also allow us - in the future - to emit better comments in verbose assembly which indicate where a particular stack spill around a call comes from. (deopt, gc, regalloc).
Differential Revision: http://reviews.llvm.org/D15759
llvm-svn: 256352
This replicates the logic of Darwin dwarfdump for manually opening up
.dSYM bundles without introducing any new dependencies.
<rdar://problem/20491670>
llvm-svn: 256350
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.
As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.
Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.
Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).
Differential Revision: http://reviews.llvm.org/D15477
llvm-svn: 256332
We visited the same catchswitch twice because it was both the child of
another funclet and the predecessor of a cleanuppad.
Instead, change the numbering algorithm to only recurse if the unwind
destination of the inner funclet agrees with the unwind destination of
the catchswitch.
This fixes PR25926.
llvm-svn: 256317
MCDwarf emits a canned abbreviation table, but was not emitting proper
forms for DWARF version 4, which is the default after r249655.
Differential Revision: http://reviews.llvm.org/D15732
llvm-svn: 256313
Previously, "%" + name of the value was printed for each derived and base
pointer. This is correct for instructions, but wrong for e.g. globals.
llvm-svn: 256305
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256282
Summary:
For some reason doing executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256273
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.
Differential revision: http://reviews.llvm.org/D15519
llvm-svn: 256263
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.
Reviewers: sanjoy, reames
Subscribers: sanjoy, chenli, llvm-commits
Differential Revision: http://reviews.llvm.org/D15719
llvm-svn: 256262
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.
This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base. This would be unnecessary
anyway because anything with a constant base won't move.
Reviewers: reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15556
llvm-svn: 256252
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.
Original commit message:
This change promotes load instructions which directly read from stored by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256249
In r256077, I added printing for DIExpressions in DEBUG_VALUE comments,
but neglected to handle DW_OP_bit_piece operands. Thanks to
Mikael Holmen and Joerg Sonnenberger for spotting this.
llvm-svn: 256236
InitMCObjectFileInfo was trying to override the triple in awkward ways.
For example, a triple specifying COFF but not Windows was forced as ELF.
This makes it easy for internal invariants to get violated, such as
those which triggered PR25912.
This fixes PR25912.
llvm-svn: 256226
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.
Differential Revision: http://reviews.llvm.org/D15245
llvm-svn: 256222
This patch adds an option, -safe-stack-no-tls, for using normal
storage instead of thread-local storage for the unsafe stack pointer.
This can be useful when SafeStack is applied to an operating system
kernel.
http://reviews.llvm.org/D15673
Patch by Michael LeMay.
llvm-svn: 256221
The linker requires that a comdat section must be associated
with a another comdat section that precedes it. This
means the comdat section's name needs to use the profile name
var's name.
Patch tested by Johan Engelen.
llvm-svn: 256220
Today, we always take into account the possibility that object files
produced by MC may be consumed by an incremental linker. This results
in us initialing fields which vary with time (TimeDateStamp) which harms
hermetic builds (e.g. verifying a self-host went well) and produces
sub-optimal code because we cannot assume anything about the relative
position of functions within a section (call sites can get redirected
through incremental linker thunks).
Let's provide an MCTargetOption which controls this behavior so that we
can disable this functionality if we know a-priori that the build will
not rely on /incremental.
llvm-svn: 256203
This is recommit of r256028 with minor fixes in unittests:
CodeGen/Mips/eh.ll
CodeGen/Mips/insn-zero-size-bb.ll
Original commit message:
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
llvm-svn: 256202
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
lowering phase because after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.
Cost table is updated accordingly.
Differential revision: http://reviews.llvm.org/D14588
llvm-svn: 256194
When targeting COFF, it is required that a comdat section to
have a global obj with the same name as the comdat (except for
comdats with select kind to be associative). This fix makes
sure that the comdat is keyed on the data variable for COFF.
Also improved test coverage for this.
llvm-svn: 256193
LiveDebugVariables unconditionally propagates all DBG_VALUE down the
dominator tree, which happens to work fine if there already is another
DBG_VALUE or the DBG_VALUE happends to describe a single-assignment vreg
but is otherwise wrong if the DBG_VALUE is coming from only one of the
predecessors.
In r255759 we introduced a proper data flow analysis scheduled after
LiveDebugVariables that correctly propagates DBG_VALUEs across basic block
boundaries. With the new pass in place, the incorrect propagation in
LiveDebugVariables can be retired witout loosing any of the benefits
where LiveDebugVariables happened to do the right thing.
llvm-svn: 256188
Summary:
These register has different encodings on CI and VI, so we add pseudo
FLAT_SCRACTH registers to be used before MC, and subtarget specific
registers to be used by the MC layer.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15661
llvm-svn: 256178
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.
This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.
llvm-svn: 256176
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.
http://reviews.llvm.org/D15652
llvm-svn: 256158
Support for COFF timestamps was unintentionally broken in r246905 when
it was conditionally available depending on whether or not LLVM was
configured with LLVM_ENABLE_TIMESTAMPS. However, Config/config.h was
never included which essentially broke the feature. Due to lax testing,
the breakage was never identified until we observed strange failures
during incremental links of Chromium.
This issue is resolved by simply including Config/config.h in
WinCOFFObjectWriter and teaching lit that the MC/COFF/timestamp.s test
is conditionally supported depending on LLVM_ENABLE_TIMESTAMPS. With
this in place, we can strengthen the test to ensure that it will not
accidentally get broken in the future.
This fixes PR25891.
llvm-svn: 256137
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.
Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.
llvm-svn: 256126
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.
r250697: http://reviews.llvm.org/rL250697
Reviewers: rmaprath
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15653
llvm-svn: 256115
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses). This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard for this situation in eventuality
in `llvm::InlineFunction`.
llvm-svn: 256110
The test will mainly be useful to check that the .s file assembles and relocates properly because vtables reference functions in their data section.
llvm-svn: 256102
llc_dwarf adds an mtriple, which forces this to use COFF, causing
the test to fail. Hopefully using regular llc without the triple
will work fine everywhere
llvm-svn: 256084
Summary:
The analysis of shader inputs was completely wrong. We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15608
llvm-svn: 256082
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.
llvm-svn: 256079
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Reviewers: aprantl
Subscribers: dsanders
Differential Revision: http://reviews.llvm.org/D14186
llvm-svn: 256077
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
llvm-svn: 256075
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15622
llvm-svn: 256072
This deprecates:
* LLVMParseBitcode
* LLVMParseBitcodeInContext
* LLVMGetBitcodeModuleInContext
* LLVMGetBitcodeModule
They are replaced with the functions with a 2 suffix which do not record
a diagnostic.
llvm-svn: 256065
This patch removes all getEdgeWeight() interfaces from CodeGen directory. As
getEdgeProbability() is a little more expensive than getEdgeWeight(), I will
compose a patch soon in which BPI only stores probabilities instead of edge
weights so that getEdgeProbability() will have O(1) time.
Differential revision: http://reviews.llvm.org/D15489
llvm-svn: 256039
Summary:
If Candiadte may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.
Added a test in nary-gep.ll
Reviewers: tra, meheff
Subscribers: mcrosier, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D15618
llvm-svn: 256035
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
llvm-svn: 256028
LLVM MC has single methods which can handle the output of EH frame and DWARF CIE's and FDE's.
This code improves DWARFDebugFrame::parse to do the same for parsing.
This also allows llvm-objdump to support the --dwarf=frames option which objdump supports. This
option dumps the .eh_frame section using the new code in DWARFDebugFrame::parse.
http://reviews.llvm.org/D15535
Reviewed by Rafael Espindola.
llvm-svn: 256008
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256004
Summary:
Third patch split out from http://reviews.llvm.org/D14752.
Only map in needed DISubroutine metadata (imported or otherwise linked
in functions and other DISubroutine referenced by inlined instructions).
This is supported for ThinLTO, LTO and llvm-link --only-needed, with
associated tests for each one.
Depends on D14838.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14843
llvm-svn: 256003