a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
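As a rough illustration of why no call graph is needed, here is a minimal sketch of a declaration walk. It is a toy model, not the LLVM implementation: ToyFunction, libraryAttrs, and inferLibraryAttrs are hypothetical names, and the attribute lists in the table are illustrative only.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a function in a module (not an LLVM class).
struct ToyFunction {
  std::string Name;
  bool IsDeclaration;
  std::set<std::string> Attrs;
};

// Hypothetical table standing in for target library identification.
// The attribute lists are illustrative only.
static const std::map<std::string, std::vector<std::string>> &libraryAttrs() {
  static const std::map<std::string, std::vector<std::string>> Table = {
      {"strlen", {"nounwind", "readonly"}},
      {"malloc", {"nounwind", "noalias"}},
  };
  return Table;
}

// One linear walk over the module; no call graph or iteration order needed.
// Returns how many attributes were newly added.
unsigned inferLibraryAttrs(std::vector<ToyFunction> &Module) {
  unsigned NumAdded = 0;
  for (ToyFunction &F : Module) {
    if (!F.IsDeclaration)
      continue; // Only declarations are annotated in this sketch.
    auto It = libraryAttrs().find(F.Name);
    if (It == libraryAttrs().end())
      continue; // Not a recognized library function.
    for (const std::string &Attr : It->second)
      if (F.Attrs.insert(Attr).second)
        ++NumAdded; // Count only attributes that were not already present.
  }
  return NumAdded;
}
```

Each function is looked at in isolation, so the walk can run at any point in the pipeline, and a second run over the same module adds nothing.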
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Differential Revision: http://reviews.llvm.org/D15676
llvm-svn: 256466
is (by default) run much earlier than FunctionAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Differential Revision: http://reviews.llvm.org/D15668
llvm-svn: 256465
MSC18 Debug didn't merge them.
FIXME: I tweaked this just to appease a builder. Almost all string literals should be addressed identically there.
llvm-svn: 256459
A frame pointer must be used if the stack pointer is modified after the
prologue. LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.
There is a small twist: this sequence might exist in user code via
inline-assembly. For now, conservatively assume that such functions
require a frame pointer. For real world justification, please see
clang's implementation of __readeflags.
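The conservative rule can be sketched as a simple predicate over a function body. This is a toy model under stated assumptions: ToyOpcode and needsFramePointer are hypothetical names, and reading "such functions" as "any function containing inline assembly" is an assumption of this sketch, not a description of the actual X86 frame lowering code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, heavily simplified view of the instructions in a function.
enum class ToyOpcode { Other, PushF, PopF, InlineAsm };

// A frame pointer is required if the body may adjust the stack pointer after
// the prologue: either via an explicit pushf/popf, or (conservatively) via
// inline assembly, whose contents are not inspected here.
bool needsFramePointer(const std::vector<ToyOpcode> &Body) {
  return std::any_of(Body.begin(), Body.end(), [](ToyOpcode Op) {
    return Op == ToyOpcode::PushF || Op == ToyOpcode::PopF ||
           Op == ToyOpcode::InlineAsm;
  });
}
```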
This fixes PR25945.
llvm-svn: 256456
Summary: This patch changes the gc.statepoint intrinsic's return type from i32 to the token type. Using a token type prevents LLVM from merging different gc.statepoint nodes into PHI nodes, which would otherwise cause further problems with gc relocations. The patch also changes how gc.relocate and gc.result find their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
Subscribers: reames, mjacob, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15662
llvm-svn: 256443
We already know how to properly print out basic blocks in
printAsOperand, we should not roll it ourselves in
AsmPrinter::EmitBasicBlockStart. No functionality change is intended.
llvm-svn: 256413
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.
This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.
llvm-svn: 256402
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
  %1 = bitcast <2 x i32>* %P to <2 x float>*
  %ld1 = load <2 x float>, <2 x float>* %1, align 8
  %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
  ret <4 x float> %i2
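To make the lane bookkeeping concrete, here is a small simulation of the two shuffles in the IR above. It is a toy model of shufflevector semantics, not InstCombine code: Lane, Vec, shuffleVector, and widenAndInsert are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A lane is either undef or holds a float. Toy model, not an LLVM type.
struct Lane {
  bool IsUndef;
  float Value;
};
using Vec = std::vector<Lane>;

const int UndefIdx = -1; // Stands for an `undef` mask element.

// Models shufflevector: mask indices [0, N) select from A, [N, 2N) from B.
Vec shuffleVector(const Vec &A, const Vec &B, const std::vector<int> &Mask) {
  assert(A.size() == B.size() && "operands must have the same length");
  const int N = static_cast<int>(A.size());
  Vec Out;
  for (int Idx : Mask) {
    assert(Idx >= UndefIdx && Idx < 2 * N && "mask index out of range");
    if (Idx == UndefIdx)
      Out.push_back(Lane{true, 0.0f});
    else if (Idx < N)
      Out.push_back(A[static_cast<std::size_t>(Idx)]);
    else
      Out.push_back(B[static_cast<std::size_t>(Idx - N)]);
  }
  return Out;
}

// Mirrors the IR: widen the 2-element load with undef lanes (%2), then use a
// single 4-element shuffle to place it in the high half of %A (%i2).
Vec widenAndInsert(const Vec &A, const Vec &Ld1) {
  assert(A.size() == 4 && Ld1.size() == 2);
  const Vec Undef2(2, Lane{true, 0.0f});
  const Vec Wide = shuffleVector(Ld1, Undef2, {0, 1, UndefIdx, UndefIdx});
  return shuffleVector(A, Wide, {0, 1, 4, 5});
}
```

The undef lanes introduced by the widening shuffle are never selected by the second mask, so the result keeps lanes 0 and 1 of %A and takes lanes 2 and 3 from the loaded pair.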
And x86 SSE output improves from:
  movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
  movdqa %xmm1, %xmm2
  shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
  shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
  shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
  shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
  shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
  retq
To the almost optimal:
  movhpd (%rdi), %xmm0
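For readers checking the longer listing by hand, the lane comments on each shufps follow directly from its immediate: each pair of bits selects one source lane. A minimal decoder, with decodeShufpsImm as a hypothetical helper name:

```cpp
#include <array>
#include <cstdint>

// Splits a shufps immediate into its four 2-bit lane selectors.
// Result lanes 0 and 1 are picked from the destination register's old value;
// result lanes 2 and 3 are picked from the source register.
std::array<unsigned, 4> decodeShufpsImm(std::uint8_t Imm) {
  std::array<unsigned, 4> Sel{};
  for (unsigned I = 0; I < 4; ++I)
    Sel[I] = (Imm >> (2 * I)) & 0x3u;
  return Sel;
}
```

With this, $229 decodes to selectors 1,1,2,3, which matches the xmm2[1,1,2,3] comment, and $48 decodes to 0,0,3,0, which matches xmm1[0,0],xmm0[3,0].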
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
llvm-svn: 256394
Summary: This diff is the initial implementation of the LLVM CodeView library. There is much more work to be done, namely a CodeView dumper and tests. This patch should help others make progress on the LLVM->CodeView debug info emission while I continue with the implementation of the dumper and tests.
This library implements support for emitting debug info in the CodeView format. This phase of the implementation only includes support for CodeView type records. Clients that need to emit type records will use a class derived from TypeTableBuilder. TypeTableBuilder provides member functions for writing each kind of type record; each of these functions eventually calls the writeRecord virtual function to emit the actual bits of the record. Derived classes override writeRecord to implement the folding of duplicate records and the actual emission to the appropriate destination. LLVMCodeView provides MemoryTypeTableBuilder, which creates the table in memory. In the future, other classes derived from TypeTableBuilder will write to other destinations, such as the type stream in a PDB.
The rest of the types in LLVMCodeView define the actual CodeView type records and all of the supporting enums and other types used in the type records. The TypeIndex class is of particular interest, because it is used by clients as a handle to a type in the type table.
The library provides a relatively low-level interface based on the actual on-disk format of CodeView. For example, type records refer to other type records by TypeIndex, rather than by an actual pointer to the referent record. This allows clients to emit type records one at a time, rather than having to keep the entire transitive closure of type records in memory until everything has been emitted. At some point, having a higher-level interface layered on top of this one may be useful for debuggers and other tools that want a more holistic view of the debug info. The lower-level interface should be sufficient for compilers and linkers to do the debug info manipulation that they need to do efficiently.
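The shape of that design can be sketched as follows. This is a toy model, not the LLVMCodeView API: the Toy-prefixed classes, the string record encoding, and the member function names and signatures are all hypothetical; only the overall structure (typed writers funnelling into a virtual writeRecord, with an in-memory subclass folding duplicates and handing back an index) is taken from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical handle to a record in the type table.
struct ToyTypeIndex {
  std::uint32_t Value;
};

// Base class: one member function per record kind, all ending in writeRecord.
class ToyTypeTableBuilder {
public:
  virtual ~ToyTypeTableBuilder() = default;

  ToyTypeIndex writeNamedStruct(const std::string &Name) {
    return writeRecord("struct:" + Name);
  }
  // Records refer to other records by index, not by pointer.
  ToyTypeIndex writePointer(ToyTypeIndex Pointee) {
    return writeRecord("ptr:" + std::to_string(Pointee.Value));
  }

protected:
  // Derived classes decide where the bytes go and how duplicates are folded.
  virtual ToyTypeIndex writeRecord(const std::string &Bytes) = 0;
};

// In-memory table that folds identical records into a single entry.
class ToyMemoryTypeTableBuilder : public ToyTypeTableBuilder {
public:
  std::size_t size() const { return Records.size(); }

protected:
  ToyTypeIndex writeRecord(const std::string &Bytes) override {
    auto It = Seen.find(Bytes);
    if (It != Seen.end())
      return It->second; // Duplicate record: reuse the existing index.
    ToyTypeIndex Idx{static_cast<std::uint32_t>(Records.size())};
    Records.push_back(Bytes);
    Seen.emplace(Bytes, Idx);
    return Idx;
  }

private:
  std::vector<std::string> Records;
  std::map<std::string, ToyTypeIndex> Seen;
};
```

Because a pointer record only stores the pointee's index, a client can emit the struct, then the pointer, and drop both from its own data structures immediately. Emitting the same pointer record twice yields the same index.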
Reviewers: rnk, majnemer
Subscribers: silvas, rnk, jevinskie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14961
llvm-svn: 256385
An optimization for AVX-512 targets: the patterns that set a mask register
to 0/1,
  KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
  KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn
KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.
Differential Revision: http://reviews.llvm.org/D15739
llvm-svn: 256365