llvm-project/clang/lib/CodeGen/CGVTables.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

1384 lines
55 KiB
C++
Raw Normal View History

//===--- CGVTables.cpp - Emit LLVM Code for C++ vtables -------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This contains code dealing with C++ code generation of virtual tables.
//
//===----------------------------------------------------------------------===//
#include "CGCXXABI.h"
#include "CodeGenFunction.h"
#include "CodeGenModule.h"
#include "clang/AST/Attr.h"
#include "clang/AST/CXXInheritance.h"
#include "clang/AST/RecordLayout.h"
#include "clang/Basic/CodeGenOptions.h"
#include "clang/CodeGen/CGFunctionInfo.h"
#include "clang/CodeGen/ConstantInitBuilder.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/Format.h"
#include "llvm/Transforms/Utils/Cloning.h"
2010-03-18 04:06:32 +08:00
#include <algorithm>
#include <cstdio>
using namespace clang;
using namespace CodeGen;
CodeGenVTables::CodeGenVTables(CodeGenModule &CGM)
: CGM(CGM), VTContext(CGM.getContext().getVTableContext()) {}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
llvm::Constant *CodeGenModule::GetAddrOfThunk(StringRef Name, llvm::Type *FnTy,
GlobalDecl GD) {
return GetOrCreateLLVMFunction(Name, FnTy, GD, /*ForVTable=*/true,
/*DontDefer=*/true, /*IsThunk=*/true);
}
static void setThunkProperties(CodeGenModule &CGM, const ThunkInfo &Thunk,
llvm::Function *ThunkFn, bool ForVTable,
GlobalDecl GD) {
CGM.setFunctionLinkage(GD, ThunkFn);
CGM.getCXXABI().setThunkLinkage(ThunkFn, ForVTable, GD,
!Thunk.Return.isEmpty());
// Set the right visibility.
CGM.setGVProperties(ThunkFn, GD);
if (!CGM.getCXXABI().exportThunk()) {
ThunkFn->setDLLStorageClass(llvm::GlobalValue::DefaultStorageClass);
ThunkFn->setDSOLocal(true);
}
if (CGM.supportsCOMDAT() && ThunkFn->isWeakForLinker())
ThunkFn->setComdat(CGM.getModule().getOrInsertComdat(ThunkFn->getName()));
}
#ifndef NDEBUG
static bool similar(const ABIArgInfo &infoL, CanQualType typeL,
const ABIArgInfo &infoR, CanQualType typeR) {
return (infoL.getKind() == infoR.getKind() &&
(typeL == typeR ||
(isa<PointerType>(typeL) && isa<PointerType>(typeR)) ||
(isa<ReferenceType>(typeL) && isa<ReferenceType>(typeR))));
}
#endif
static RValue PerformReturnAdjustment(CodeGenFunction &CGF,
QualType ResultType, RValue RV,
const ThunkInfo &Thunk) {
// Emit the return adjustment.
bool NullCheckValue = !ResultType->isReferenceType();
llvm::BasicBlock *AdjustNull = nullptr;
llvm::BasicBlock *AdjustNotNull = nullptr;
llvm::BasicBlock *AdjustEnd = nullptr;
llvm::Value *ReturnValue = RV.getScalarVal();
if (NullCheckValue) {
AdjustNull = CGF.createBasicBlock("adjust.null");
AdjustNotNull = CGF.createBasicBlock("adjust.notnull");
AdjustEnd = CGF.createBasicBlock("adjust.end");
llvm::Value *IsNull = CGF.Builder.CreateIsNull(ReturnValue);
CGF.Builder.CreateCondBr(IsNull, AdjustNull, AdjustNotNull);
CGF.EmitBlock(AdjustNotNull);
}
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
auto ClassDecl = ResultType->getPointeeType()->getAsCXXRecordDecl();
auto ClassAlign = CGF.CGM.getClassPointerAlignment(ClassDecl);
ReturnValue = CGF.CGM.getCXXABI().performReturnAdjustment(CGF,
Address(ReturnValue, ClassAlign),
Thunk.Return);
if (NullCheckValue) {
CGF.Builder.CreateBr(AdjustEnd);
CGF.EmitBlock(AdjustNull);
CGF.Builder.CreateBr(AdjustEnd);
CGF.EmitBlock(AdjustEnd);
llvm::PHINode *PHI = CGF.Builder.CreatePHI(ReturnValue->getType(), 2);
PHI->addIncoming(ReturnValue, AdjustNotNull);
PHI->addIncoming(llvm::Constant::getNullValue(ReturnValue->getType()),
AdjustNull);
ReturnValue = PHI;
}
return RValue::get(ReturnValue);
}
/// This function clones a function's DISubprogram node and enters it into
/// a value map with the intent that the map can be utilized by the cloner
/// to short-circuit Metadata node mapping.
/// Furthermore, the function resolves any DILocalVariable nodes referenced
/// by dbg.value intrinsics so they can be properly mapped during cloning.
static void resolveTopLevelMetadata(llvm::Function *Fn,
llvm::ValueToValueMapTy &VMap) {
// Clone the DISubprogram node and put it into the Value map.
auto *DIS = Fn->getSubprogram();
if (!DIS)
return;
auto *NewDIS = DIS->replaceWithDistinct(DIS->clone());
VMap.MD()[DIS].reset(NewDIS);
// Find all llvm.dbg.declare intrinsics and resolve the DILocalVariable nodes
// they are referencing.
for (auto &BB : Fn->getBasicBlockList()) {
for (auto &I : BB) {
if (auto *DII = dyn_cast<llvm::DbgVariableIntrinsic>(&I)) {
auto *DILocal = DII->getVariable();
if (!DILocal->isResolved())
DILocal->resolve();
}
}
}
}
// This function does roughly the same thing as GenerateThunk, but in a
// very different way, so that va_start and va_end work correctly.
// FIXME: This function assumes "this" is the first non-sret LLVM argument of
// a function, and that there is an alloca built in the entry block
// for all accesses to "this".
// FIXME: This function assumes there is only one "ret" statement per function.
// FIXME: Cloning isn't correct in the presence of indirect goto!
// FIXME: This implementation of thunks bloats codesize by duplicating the
// function definition. There are alternatives:
// 1. Add some sort of stub support to LLVM for cases where we can
// do a this adjustment, then a sibcall.
// 2. We could transform the definition to take a va_list instead of an
// actual variable argument list, then have the thunks (including a
// no-op thunk for the regular definition) call va_start/va_end.
// There's a bit of per-call overhead for this solution, but it's
// better for codesize if the definition is long.
2015-07-01 06:08:44 +08:00
llvm::Function *
CodeGenFunction::GenerateVarArgsThunk(llvm::Function *Fn,
const CGFunctionInfo &FnInfo,
GlobalDecl GD, const ThunkInfo &Thunk) {
const CXXMethodDecl *MD = cast<CXXMethodDecl>(GD.getDecl());
const FunctionProtoType *FPT = MD->getType()->castAs<FunctionProtoType>();
QualType ResultType = FPT->getReturnType();
// Get the original function
assert(FnInfo.isVariadic());
llvm::Type *Ty = CGM.getTypes().GetFunctionType(FnInfo);
llvm::Value *Callee = CGM.GetAddrOfFunction(GD, Ty, /*ForVTable=*/true);
llvm::Function *BaseFn = cast<llvm::Function>(Callee);
// Cloning can't work if we don't have a definition. The Microsoft ABI may
// require thunks when a definition is not available. Emit an error in these
// cases.
if (!MD->isDefined()) {
CGM.ErrorUnsupported(MD, "return-adjusting thunk with variadic arguments");
return Fn;
}
assert(!BaseFn->isDeclaration() && "cannot clone undefined variadic method");
// Clone to thunk.
llvm::ValueToValueMapTy VMap;
// We are cloning a function while some Metadata nodes are still unresolved.
// Ensure that the value mapper does not encounter any of them.
resolveTopLevelMetadata(BaseFn, VMap);
llvm::Function *NewFn = llvm::CloneFunction(BaseFn, VMap);
Fn->replaceAllUsesWith(NewFn);
NewFn->takeName(Fn);
Fn->eraseFromParent();
Fn = NewFn;
// "Initialize" CGF (minimally).
CurFn = Fn;
// Get the "this" value
llvm::Function::arg_iterator AI = Fn->arg_begin();
if (CGM.ReturnTypeUsesSRet(FnInfo))
++AI;
// Find the first store of "this", which will be to the alloca associated
// with "this".
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
Address ThisPtr(&*AI, CGM.getClassPointerAlignment(MD->getParent()));
llvm::BasicBlock *EntryBB = &Fn->front();
llvm::BasicBlock::iterator ThisStore =
std::find_if(EntryBB->begin(), EntryBB->end(), [&](llvm::Instruction &I) {
return isa<llvm::StoreInst>(I) &&
I.getOperand(0) == ThisPtr.getPointer();
});
assert(ThisStore != EntryBB->end() &&
"Store of this should be in entry block?");
// Adjust "this", if necessary.
Builder.SetInsertPoint(&*ThisStore);
llvm::Value *AdjustedThisPtr =
CGM.getCXXABI().performThisAdjustment(*this, ThisPtr, Thunk.This);
AdjustedThisPtr = Builder.CreateBitCast(AdjustedThisPtr,
ThisStore->getOperand(0)->getType());
ThisStore->setOperand(0, AdjustedThisPtr);
if (!Thunk.Return.isEmpty()) {
// Fix up the returned value, if necessary.
for (llvm::BasicBlock &BB : *Fn) {
llvm::Instruction *T = BB.getTerminator();
if (isa<llvm::ReturnInst>(T)) {
RValue RV = RValue::get(T->getOperand(0));
T->eraseFromParent();
Builder.SetInsertPoint(&BB);
RV = PerformReturnAdjustment(*this, ResultType, RV, Thunk);
Builder.CreateRet(RV.getScalarVal());
break;
}
}
}
2015-07-01 06:08:44 +08:00
return Fn;
}
void CodeGenFunction::StartThunk(llvm::Function *Fn, GlobalDecl GD,
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
const CGFunctionInfo &FnInfo,
bool IsUnprototyped) {
assert(!CurGD.getDecl() && "CurGD was already set!");
CurGD = GD;
CurFuncIsThunk = true;
// Build FunctionArgs.
const CXXMethodDecl *MD = cast<CXXMethodDecl>(GD.getDecl());
QualType ThisType = MD->getThisType();
QualType ResultType;
if (IsUnprototyped)
ResultType = CGM.getContext().VoidTy;
else if (CGM.getCXXABI().HasThisReturn(GD))
ResultType = ThisType;
else if (CGM.getCXXABI().hasMostDerivedReturn(GD))
ResultType = CGM.getContext().VoidPtrTy;
else
ResultType = MD->getType()->castAs<FunctionProtoType>()->getReturnType();
FunctionArgList FunctionArgs;
// Create the implicit 'this' parameter declaration.
CGM.getCXXABI().buildThisParam(*this, FunctionArgs);
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
// Add the rest of the parameters, if we have a prototype to work with.
if (!IsUnprototyped) {
FunctionArgs.append(MD->param_begin(), MD->param_end());
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
if (isa<CXXDestructorDecl>(MD))
CGM.getCXXABI().addImplicitStructorParams(*this, ResultType,
FunctionArgs);
}
// Start defining the function.
auto NL = ApplyDebugLocation::CreateEmpty(*this);
StartFunction(GlobalDecl(), ResultType, Fn, FnInfo, FunctionArgs,
MD->getLocation());
// Create a scope with an artificial location for the body of this function.
auto AL = ApplyDebugLocation::CreateArtificial(*this);
// Since we didn't pass a GlobalDecl to StartFunction, do this ourselves.
CGM.getCXXABI().EmitInstanceFunctionProlog(*this);
CXXThisValue = CXXABIThisValue;
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
CurCodeDecl = MD;
CurFuncDecl = MD;
}
void CodeGenFunction::FinishThunk() {
// Clear these to restore the invariants expected by
// StartFunction/FinishFunction.
CurCodeDecl = nullptr;
CurFuncDecl = nullptr;
FinishFunction();
}
void CodeGenFunction::EmitCallAndReturnForThunk(llvm::FunctionCallee Callee,
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
const ThunkInfo *Thunk,
bool IsUnprototyped) {
assert(isa<CXXMethodDecl>(CurGD.getDecl()) &&
"Please use a new CGF for this thunk");
const CXXMethodDecl *MD = cast<CXXMethodDecl>(CurGD.getDecl());
// Adjust the 'this' pointer if necessary
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
llvm::Value *AdjustedThisPtr =
Thunk ? CGM.getCXXABI().performThisAdjustment(
*this, LoadCXXThisAddress(), Thunk->This)
: LoadCXXThis();
// If perfect forwarding is required a variadic method, a method using
// inalloca, or an unprototyped thunk, use musttail. Emit an error if this
// thunk requires a return adjustment, since that is impossible with musttail.
if (CurFnInfo->usesInAlloca() || CurFnInfo->isVariadic() || IsUnprototyped) {
if (Thunk && !Thunk->Return.isEmpty()) {
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
if (IsUnprototyped)
CGM.ErrorUnsupported(
MD, "return-adjusting thunk with incomplete parameter type");
else if (CurFnInfo->isVariadic())
llvm_unreachable("shouldn't try to emit musttail return-adjusting "
"thunks for variadic functions");
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
else
CGM.ErrorUnsupported(
MD, "non-trivial argument copy for return-adjusting thunk");
}
EmitMustTailThunk(CurGD, AdjustedThisPtr, Callee);
return;
}
// Start building CallArgs.
CallArgList CallArgs;
QualType ThisType = MD->getThisType();
CallArgs.add(RValue::get(AdjustedThisPtr), ThisType);
if (isa<CXXDestructorDecl>(MD))
CGM.getCXXABI().adjustCallArgsForDestructorThunk(*this, CurGD, CallArgs);
#ifndef NDEBUG
unsigned PrefixArgs = CallArgs.size() - 1;
#endif
// Add the rest of the arguments.
for (const ParmVarDecl *PD : MD->parameters())
EmitDelegateCallArg(CallArgs, PD, SourceLocation());
const FunctionProtoType *FPT = MD->getType()->castAs<FunctionProtoType>();
#ifndef NDEBUG
const CGFunctionInfo &CallFnInfo = CGM.getTypes().arrangeCXXMethodCall(
CallArgs, FPT, RequiredArgs::forPrototypePlus(FPT, 1), PrefixArgs);
assert(CallFnInfo.getRegParm() == CurFnInfo->getRegParm() &&
CallFnInfo.isNoReturn() == CurFnInfo->isNoReturn() &&
CallFnInfo.getCallingConvention() == CurFnInfo->getCallingConvention());
assert(isa<CXXDestructorDecl>(MD) || // ignore dtor return types
similar(CallFnInfo.getReturnInfo(), CallFnInfo.getReturnType(),
CurFnInfo->getReturnInfo(), CurFnInfo->getReturnType()));
assert(CallFnInfo.arg_size() == CurFnInfo->arg_size());
for (unsigned i = 0, e = CurFnInfo->arg_size(); i != e; ++i)
assert(similar(CallFnInfo.arg_begin()[i].info,
CallFnInfo.arg_begin()[i].type,
CurFnInfo->arg_begin()[i].info,
CurFnInfo->arg_begin()[i].type));
#endif
// Determine whether we have a return value slot to use.
QualType ResultType = CGM.getCXXABI().HasThisReturn(CurGD)
? ThisType
: CGM.getCXXABI().hasMostDerivedReturn(CurGD)
? CGM.getContext().VoidPtrTy
: FPT->getReturnType();
ReturnValueSlot Slot;
if (!ResultType->isVoidType() &&
CurFnInfo->getReturnInfo().getKind() == ABIArgInfo::Indirect)
Slot = ReturnValueSlot(ReturnValue, ResultType.isVolatileQualified(),
/*IsUnused=*/false, /*IsExternallyDestructed=*/true);
// Now emit our call.
llvm::CallBase *CallOrInvoke;
RValue RV = EmitCall(*CurFnInfo, CGCallee::forDirect(Callee, CurGD), Slot,
CallArgs, &CallOrInvoke);
// Consider return adjustment if we have ThunkInfo.
if (Thunk && !Thunk->Return.isEmpty())
RV = PerformReturnAdjustment(*this, ResultType, RV, *Thunk);
else if (llvm::CallInst* Call = dyn_cast<llvm::CallInst>(CallOrInvoke))
Call->setTailCallKind(llvm::CallInst::TCK_Tail);
// Emit return.
if (!ResultType->isVoidType() && Slot.isNull())
CGM.getCXXABI().EmitReturnFromThunk(*this, RV, ResultType);
// Disable the final ARC autorelease.
AutoreleaseResult = false;
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
FinishThunk();
}
void CodeGenFunction::EmitMustTailThunk(GlobalDecl GD,
llvm::Value *AdjustedThisPtr,
llvm::FunctionCallee Callee) {
// Emitting a musttail call thunk doesn't use any of the CGCall.cpp machinery
// to translate AST arguments into LLVM IR arguments. For thunks, we know
// that the caller prototype more or less matches the callee prototype with
// the exception of 'this'.
SmallVector<llvm::Value *, 8> Args;
for (llvm::Argument &A : CurFn->args())
Args.push_back(&A);
// Set the adjusted 'this' pointer.
const ABIArgInfo &ThisAI = CurFnInfo->arg_begin()->info;
if (ThisAI.isDirect()) {
const ABIArgInfo &RetAI = CurFnInfo->getReturnInfo();
int ThisArgNo = RetAI.isIndirect() && !RetAI.isSRetAfterThis() ? 1 : 0;
llvm::Type *ThisType = Args[ThisArgNo]->getType();
if (ThisType != AdjustedThisPtr->getType())
AdjustedThisPtr = Builder.CreateBitCast(AdjustedThisPtr, ThisType);
Args[ThisArgNo] = AdjustedThisPtr;
} else {
assert(ThisAI.isInAlloca() && "this is passed directly or inalloca");
Compute and preserve alignment more faithfully in IR-generation. Introduce an Address type to bundle a pointer value with an alignment. Introduce APIs on CGBuilderTy to work with Address values. Change core APIs on CGF/CGM to traffic in Address where appropriate. Require alignments to be non-zero. Update a ton of code to compute and propagate alignment information. As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment helper function to CGF and made use of it in a number of places in the expression emitter. The end result is that we should now be significantly more correct when performing operations on objects that are locally known to be under-aligned. Since alignment is not reliably tracked in the type system, there are inherent limits to this, but at least we are no longer confused by standard operations like derived-to-base conversions and array-to-pointer decay. I've also fixed a large number of bugs where we were applying the complete-object alignment to a pointer instead of the non-virtual alignment, although most of these were hidden by the very conservative approach we took with member alignment. Also, because IRGen now reliably asserts on zero alignments, we should no longer be subject to an absurd but frustrating recurring bug where an incomplete type would report a zero alignment and then we'd naively do a alignmentAtOffset on it and emit code using an alignment equal to the largest power-of-two factor of the offset. We should also now be emitting much more aggressive alignment attributes in the presence of over-alignment. In particular, field access now uses alignmentAtOffset instead of min. Several times in this patch, I had to change the existing code-generation pattern in order to more effectively use the Address APIs. For the most part, this seems to be a strict improvement, like doing pointer arithmetic with GEPs instead of ptrtoint. That said, I've tried very hard to not change semantics, but it is likely that I've failed in a few places, for which I apologize. ABIArgInfo now always carries the assumed alignment of indirect and indirect byval arguments. In order to cut down on what was already a dauntingly large patch, I changed the code to never set align attributes in the IR on non-byval indirect arguments. That is, we still generate code which assumes that indirect arguments have the given alignment, but we don't express this information to the backend except where it's semantically required (i.e. on byvals). This is likely a minor regression for those targets that did provide this information, but it'll be trivial to add it back in a later patch. I partially punted on applying this work to CGBuiltin. Please do not add more uses of the CreateDefaultAligned{Load,Store} APIs; they will be going away eventually. llvm-svn: 246985
2015-09-08 16:05:57 +08:00
Address ThisAddr = GetAddrOfLocalVar(CXXABIThisDecl);
llvm::Type *ThisType = ThisAddr.getElementType();
if (ThisType != AdjustedThisPtr->getType())
AdjustedThisPtr = Builder.CreateBitCast(AdjustedThisPtr, ThisType);
Builder.CreateStore(AdjustedThisPtr, ThisAddr);
}
// Emit the musttail call manually. Even if the prologue pushed cleanups, we
// don't actually want to run them.
llvm::CallInst *Call = Builder.CreateCall(Callee, Args);
Call->setTailCallKind(llvm::CallInst::TCK_MustTail);
// Apply the standard set of call attributes.
unsigned CallingConv;
llvm::AttributeList Attrs;
CGM.ConstructAttributeList(Callee.getCallee()->getName(), *CurFnInfo, GD,
Attrs, CallingConv, /*AttrOnCallSite=*/true);
Call->setAttributes(Attrs);
Call->setCallingConv(static_cast<llvm::CallingConv::ID>(CallingConv));
if (Call->getType()->isVoidTy())
Builder.CreateRetVoid();
else
Builder.CreateRet(Call);
// Finish the function to maintain CodeGenFunction invariants.
// FIXME: Don't emit unreachable code.
EmitBlock(createBasicBlock());
FinishThunk();
}
void CodeGenFunction::generateThunk(llvm::Function *Fn,
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
const CGFunctionInfo &FnInfo, GlobalDecl GD,
const ThunkInfo &Thunk,
bool IsUnprototyped) {
StartThunk(Fn, GD, FnInfo, IsUnprototyped);
// Create a scope with an artificial location for the body of this function.
auto AL = ApplyDebugLocation::CreateArtificial(*this);
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
// Get our callee. Use a placeholder type if this method is unprototyped so
// that CodeGenModule doesn't try to set attributes.
llvm::Type *Ty;
if (IsUnprototyped)
Ty = llvm::StructType::get(getLLVMContext());
else
Ty = CGM.getTypes().GetFunctionType(FnInfo);
llvm::Constant *Callee = CGM.GetAddrOfFunction(GD, Ty, /*ForVTable=*/true);
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
// Fix up the function type for an unprototyped musttail call.
if (IsUnprototyped)
Callee = llvm::ConstantExpr::getBitCast(Callee, Fn->getType());
// Make the call and return the result.
EmitCallAndReturnForThunk(llvm::FunctionCallee(Fn->getFunctionType(), Callee),
&Thunk, IsUnprototyped);
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
static bool shouldEmitVTableThunk(CodeGenModule &CGM, const CXXMethodDecl *MD,
bool IsUnprototyped, bool ForVTable) {
// Always emit thunks in the MS C++ ABI. We cannot rely on other TUs to
// provide thunks for us.
if (CGM.getTarget().getCXXABI().isMicrosoft())
return true;
// In the Itanium C++ ABI, vtable thunks are provided by TUs that provide
// definitions of the main method. Therefore, emitting thunks with the vtable
// is purely an optimization. Emit the thunk if optimizations are enabled and
// all of the parameter types are complete.
if (ForVTable)
return CGM.getCodeGenOpts().OptimizationLevel && !IsUnprototyped;
// Always emit thunks along with the method definition.
return true;
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
llvm::Constant *CodeGenVTables::maybeEmitThunk(GlobalDecl GD,
const ThunkInfo &TI,
bool ForVTable) {
const CXXMethodDecl *MD = cast<CXXMethodDecl>(GD.getDecl());
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
// First, get a declaration. Compute the mangled name. Don't worry about
// getting the function prototype right, since we may only need this
// declaration to fill in a vtable slot.
SmallString<256> Name;
MangleContext &MCtx = CGM.getCXXABI().getMangleContext();
llvm::raw_svector_ostream Out(Name);
if (const CXXDestructorDecl *DD = dyn_cast<CXXDestructorDecl>(MD))
MCtx.mangleCXXDtorThunk(DD, GD.getDtorType(), TI.This, Out);
else
MCtx.mangleThunk(MD, TI, Out);
llvm::Type *ThunkVTableTy = CGM.getTypes().GetFunctionTypeForVTable(GD);
llvm::Constant *Thunk = CGM.GetAddrOfThunk(Name, ThunkVTableTy, GD);
// If we don't need to emit a definition, return this declaration as is.
bool IsUnprototyped = !CGM.getTypes().isFuncTypeConvertible(
MD->getType()->castAs<FunctionType>());
if (!shouldEmitVTableThunk(CGM, MD, IsUnprototyped, ForVTable))
return Thunk;
// Arrange a function prototype appropriate for a function definition. In some
// cases in the MS ABI, we may need to build an unprototyped musttail thunk.
const CGFunctionInfo &FnInfo =
IsUnprototyped ? CGM.getTypes().arrangeUnprototypedMustTailThunk(MD)
: CGM.getTypes().arrangeGlobalDeclaration(GD);
llvm::FunctionType *ThunkFnTy = CGM.getTypes().GetFunctionType(FnInfo);
// If the type of the underlying GlobalValue is wrong, we'll have to replace
// it. It should be a declaration.
llvm::Function *ThunkFn = cast<llvm::Function>(Thunk->stripPointerCasts());
if (ThunkFn->getFunctionType() != ThunkFnTy) {
llvm::GlobalValue *OldThunkFn = ThunkFn;
assert(OldThunkFn->isDeclaration() && "Shouldn't replace non-declaration");
// Remove the name from the old thunk function and get a new thunk.
OldThunkFn->setName(StringRef());
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
ThunkFn = llvm::Function::Create(ThunkFnTy, llvm::Function::ExternalLinkage,
Name.str(), &CGM.getModule());
CGM.SetLLVMFunctionAttributes(MD, FnInfo, ThunkFn);
// If needed, replace the old thunk with a bitcast.
if (!OldThunkFn->use_empty()) {
llvm::Constant *NewPtrForOldDecl =
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
llvm::ConstantExpr::getBitCast(ThunkFn, OldThunkFn->getType());
OldThunkFn->replaceAllUsesWith(NewPtrForOldDecl);
}
// Remove the old thunk.
OldThunkFn->eraseFromParent();
}
bool ABIHasKeyFunctions = CGM.getTarget().getCXXABI().hasKeyFunctions();
bool UseAvailableExternallyLinkage = ForVTable && ABIHasKeyFunctions;
if (!ThunkFn->isDeclaration()) {
if (!ABIHasKeyFunctions || UseAvailableExternallyLinkage) {
// There is already a thunk emitted for this function, do nothing.
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
return ThunkFn;
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
setThunkProperties(CGM, TI, ThunkFn, ForVTable, GD);
return ThunkFn;
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
// If this will be unprototyped, add the "thunk" attribute so that LLVM knows
// that the return type is meaningless. These thunks can be used to call
// functions with differing return types, and the caller is required to cast
// the prototype appropriately to extract the correct value.
if (IsUnprototyped)
ThunkFn->addFnAttr("thunk");
CGM.SetLLVMFunctionAttributesForDefinition(GD.getDecl(), ThunkFn);
// Thunks for variadic methods are special because in general variadic
// arguments cannot be perfectly forwarded. In the general case, clang
// implements such thunks by cloning the original function body. However, for
// thunks with no return adjustment on targets that support musttail, we can
// use musttail to perfectly forward the variadic arguments.
bool ShouldCloneVarArgs = false;
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
if (!IsUnprototyped && ThunkFn->isVarArg()) {
ShouldCloneVarArgs = true;
if (TI.Return.isEmpty()) {
switch (CGM.getTriple().getArch()) {
case llvm::Triple::x86_64:
case llvm::Triple::x86:
case llvm::Triple::aarch64:
ShouldCloneVarArgs = false;
break;
default:
break;
}
}
}
if (ShouldCloneVarArgs) {
if (UseAvailableExternallyLinkage)
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
return ThunkFn;
ThunkFn =
CodeGenFunction(CGM).GenerateVarArgsThunk(ThunkFn, FnInfo, GD, TI);
} else {
// Normal thunk body generation.
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
CodeGenFunction(CGM).generateThunk(ThunkFn, FnInfo, GD, TI, IsUnprototyped);
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
setThunkProperties(CGM, TI, ThunkFn, ForVTable, GD);
return ThunkFn;
2010-03-24 00:36:50 +08:00
}
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
void CodeGenVTables::EmitThunks(GlobalDecl GD) {
const CXXMethodDecl *MD =
2010-03-24 00:36:50 +08:00
cast<CXXMethodDecl>(GD.getDecl())->getCanonicalDecl();
// We don't need to generate thunks for the base destructor.
if (isa<CXXDestructorDecl>(MD) && GD.getDtorType() == Dtor_Base)
return;
const VTableContextBase::ThunkInfoVectorTy *ThunkInfoVector =
VTContext->getThunkInfo(GD);
if (!ThunkInfoVector)
return;
for (const ThunkInfo& Thunk : *ThunkInfoVector)
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
maybeEmitThunk(GD, Thunk, /*ForVTable=*/false);
}
void CodeGenVTables::addRelativeComponent(ConstantArrayBuilder &builder,
llvm::Constant *component,
unsigned vtableAddressPoint,
bool vtableHasLocalLinkage,
bool isCompleteDtor) const {
// No need to get the offset of a nullptr.
if (component->isNullValue())
return builder.add(llvm::ConstantInt::get(CGM.Int32Ty, 0));
auto *globalVal =
cast<llvm::GlobalValue>(component->stripPointerCastsAndAliases());
llvm::Module &module = CGM.getModule();
// We don't want to copy the linkage of the vtable exactly because we still
// want the stub/proxy to be emitted for properly calculating the offset.
// Examples where there would be no symbol emitted are available_externally
// and private linkages.
auto stubLinkage = vtableHasLocalLinkage ? llvm::GlobalValue::InternalLinkage
: llvm::GlobalValue::ExternalLinkage;
llvm::Constant *target;
if (auto *func = dyn_cast<llvm::Function>(globalVal)) {
target = getOrCreateRelativeStub(func, stubLinkage, isCompleteDtor);
} else {
llvm::SmallString<16> rttiProxyName(globalVal->getName());
rttiProxyName.append(".rtti_proxy");
// The RTTI component may not always be emitted in the same linkage unit as
// the vtable. As a general case, we can make a dso_local proxy to the RTTI
// that points to the actual RTTI struct somewhere. This will result in a
// GOTPCREL relocation when taking the relative offset to the proxy.
llvm::GlobalVariable *proxy = module.getNamedGlobal(rttiProxyName);
if (!proxy) {
proxy = new llvm::GlobalVariable(module, globalVal->getType(),
/*isConstant=*/true, stubLinkage,
globalVal, rttiProxyName);
proxy->setDSOLocal(true);
proxy->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
if (!proxy->hasLocalLinkage()) {
proxy->setVisibility(llvm::GlobalValue::HiddenVisibility);
proxy->setComdat(module.getOrInsertComdat(rttiProxyName));
}
}
target = proxy;
}
builder.addRelativeOffsetToPosition(CGM.Int32Ty, target,
/*position=*/vtableAddressPoint);
}
llvm::Function *CodeGenVTables::getOrCreateRelativeStub(
llvm::Function *func, llvm::GlobalValue::LinkageTypes stubLinkage,
bool isCompleteDtor) const {
// A complete object destructor can later be substituted in the vtable for an
// appropriate base object destructor when optimizations are enabled. This can
// happen for child classes that don't have their own destructor. In the case
// where a parent virtual destructor is not guaranteed to be in the same
// linkage unit as the child vtable, it's possible for an external reference
// for this destructor to be substituted into the child vtable, preventing it
// from being in rodata. If this function is a complete virtual destructor, we
// can just force a stub to be emitted for it.
if (func->isDSOLocal() && !isCompleteDtor)
return func;
llvm::SmallString<16> stubName(func->getName());
stubName.append(".stub");
// Instead of taking the offset between the vtable and virtual function
// directly, we emit a dso_local stub that just contains a tail call to the
// original virtual function and take the offset between that and the
// vtable. We do this because there are some cases where the original
// function that would've been inserted into the vtable is not dso_local
// which may require some kind of dynamic relocation which prevents the
// vtable from being readonly. On x86_64, taking the offset between the
// function and the vtable gets lowered to the offset between the PLT entry
// for the function and the vtable which gives us a PLT32 reloc. On AArch64,
// right now only CALL26 and JUMP26 instructions generate PLT relocations,
// so we manifest them with stubs that are just jumps to the original
// function.
auto &module = CGM.getModule();
llvm::Function *stub = module.getFunction(stubName);
if (stub) {
assert(stub->isDSOLocal() &&
"The previous definition of this stub should've been dso_local.");
return stub;
}
stub = llvm::Function::Create(func->getFunctionType(), stubLinkage, stubName,
module);
// Propogate function attributes.
stub->setAttributes(func->getAttributes());
stub->setDSOLocal(true);
stub->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
if (!stub->hasLocalLinkage()) {
stub->setVisibility(llvm::GlobalValue::HiddenVisibility);
stub->setComdat(module.getOrInsertComdat(stubName));
}
// Fill the stub with a tail call that will be optimized.
llvm::BasicBlock *block =
llvm::BasicBlock::Create(module.getContext(), "entry", stub);
llvm::IRBuilder<> block_builder(block);
llvm::SmallVector<llvm::Value *, 8> args;
for (auto &arg : stub->args())
args.push_back(&arg);
llvm::CallInst *call = block_builder.CreateCall(func, args);
call->setAttributes(func->getAttributes());
call->setTailCall();
if (call->getType()->isVoidTy())
block_builder.CreateRetVoid();
else
block_builder.CreateRet(call);
return stub;
}
bool CodeGenVTables::useRelativeLayout() const {
return CGM.getTarget().getCXXABI().isItaniumFamily() &&
CGM.getItaniumVTableContext().isRelativeLayout();
}
llvm::Type *CodeGenVTables::getVTableComponentType() const {
if (useRelativeLayout())
return CGM.Int32Ty;
return CGM.Int8PtrTy;
}
static void AddPointerLayoutOffset(const CodeGenModule &CGM,
ConstantArrayBuilder &builder,
CharUnits offset) {
builder.add(llvm::ConstantExpr::getIntToPtr(
llvm::ConstantInt::get(CGM.PtrDiffTy, offset.getQuantity()),
CGM.Int8PtrTy));
}
static void AddRelativeLayoutOffset(const CodeGenModule &CGM,
ConstantArrayBuilder &builder,
CharUnits offset) {
builder.add(llvm::ConstantInt::get(CGM.Int32Ty, offset.getQuantity()));
}
void CodeGenVTables::addVTableComponent(ConstantArrayBuilder &builder,
const VTableLayout &layout,
unsigned componentIndex,
llvm::Constant *rtti,
unsigned &nextVTableThunkIndex,
unsigned vtableAddressPoint,
bool vtableHasLocalLinkage) {
auto &component = layout.vtable_components()[componentIndex];
auto addOffsetConstant =
useRelativeLayout() ? AddRelativeLayoutOffset : AddPointerLayoutOffset;
switch (component.getKind()) {
case VTableComponent::CK_VCallOffset:
return addOffsetConstant(CGM, builder, component.getVCallOffset());
case VTableComponent::CK_VBaseOffset:
return addOffsetConstant(CGM, builder, component.getVBaseOffset());
case VTableComponent::CK_OffsetToTop:
return addOffsetConstant(CGM, builder, component.getOffsetToTop());
case VTableComponent::CK_RTTI:
if (useRelativeLayout())
return addRelativeComponent(builder, rtti, vtableAddressPoint,
vtableHasLocalLinkage,
/*isCompleteDtor=*/false);
else
return builder.add(llvm::ConstantExpr::getBitCast(rtti, CGM.Int8PtrTy));
case VTableComponent::CK_FunctionPointer:
case VTableComponent::CK_CompleteDtorPointer:
case VTableComponent::CK_DeletingDtorPointer: {
GlobalDecl GD;
// Get the right global decl.
switch (component.getKind()) {
default:
llvm_unreachable("Unexpected vtable component kind");
case VTableComponent::CK_FunctionPointer:
GD = component.getFunctionDecl();
break;
case VTableComponent::CK_CompleteDtorPointer:
GD = GlobalDecl(component.getDestructorDecl(), Dtor_Complete);
break;
case VTableComponent::CK_DeletingDtorPointer:
GD = GlobalDecl(component.getDestructorDecl(), Dtor_Deleting);
break;
}
if (CGM.getLangOpts().CUDA) {
// Emit NULL for methods we can't codegen on this
// side. Otherwise we'd end up with vtable with unresolved
// references.
const CXXMethodDecl *MD = cast<CXXMethodDecl>(GD.getDecl());
// OK on device side: functions w/ __device__ attribute
// OK on host side: anything except __device__-only functions.
bool CanEmitMethod =
CGM.getLangOpts().CUDAIsDevice
? MD->hasAttr<CUDADeviceAttr>()
: (MD->hasAttr<CUDAHostAttr>() || !MD->hasAttr<CUDADeviceAttr>());
if (!CanEmitMethod)
return builder.add(llvm::ConstantExpr::getNullValue(CGM.Int8PtrTy));
// Method is acceptable, continue processing as usual.
}
auto getSpecialVirtualFn = [&](StringRef name) -> llvm::Constant * {
// FIXME(PR43094): When merging comdat groups, lld can select a local
// symbol as the signature symbol even though it cannot be accessed
// outside that symbol's TU. The relative vtables ABI would make
// __cxa_pure_virtual and __cxa_deleted_virtual local symbols, and
// depending on link order, the comdat groups could resolve to the one
// with the local symbol. As a temporary solution, fill these components
// with zero. We shouldn't be calling these in the first place anyway.
if (useRelativeLayout())
return llvm::ConstantPointerNull::get(CGM.Int8PtrTy);
// For NVPTX devices in OpenMP emit special functon as null pointers,
// otherwise linking ends up with unresolved references.
if (CGM.getLangOpts().OpenMP && CGM.getLangOpts().OpenMPIsDevice &&
CGM.getTriple().isNVPTX())
return llvm::ConstantPointerNull::get(CGM.Int8PtrTy);
llvm::FunctionType *fnTy =
llvm::FunctionType::get(CGM.VoidTy, /*isVarArg=*/false);
llvm::Constant *fn = cast<llvm::Constant>(
CGM.CreateRuntimeFunction(fnTy, name).getCallee());
if (auto f = dyn_cast<llvm::Function>(fn))
f->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
return llvm::ConstantExpr::getBitCast(fn, CGM.Int8PtrTy);
};
llvm::Constant *fnPtr;
// Pure virtual member functions.
if (cast<CXXMethodDecl>(GD.getDecl())->isPure()) {
if (!PureVirtualFn)
PureVirtualFn =
getSpecialVirtualFn(CGM.getCXXABI().GetPureVirtualCallName());
fnPtr = PureVirtualFn;
// Deleted virtual member functions.
} else if (cast<CXXMethodDecl>(GD.getDecl())->isDeleted()) {
if (!DeletedVirtualFn)
DeletedVirtualFn =
getSpecialVirtualFn(CGM.getCXXABI().GetDeletedVirtualCallName());
fnPtr = DeletedVirtualFn;
// Thunks.
} else if (nextVTableThunkIndex < layout.vtable_thunks().size() &&
layout.vtable_thunks()[nextVTableThunkIndex].first ==
componentIndex) {
auto &thunkInfo = layout.vtable_thunks()[nextVTableThunkIndex].second;
nextVTableThunkIndex++;
[MS] Emit vftable thunks for functions with incomplete prototypes Summary: The following class hierarchy requires that we be able to emit a this-adjusting thunk for B::foo in C's vftable: struct Incomplete; struct A { virtual A* foo(Incomplete p) = 0; }; struct B : virtual A { void foo(Incomplete p) override; }; struct C : B { int c; }; This TU is valid, but lacks a definition of 'Incomplete', which makes it hard to build a thunk for the final overrider, B::foo. Before this change, Clang gives up attempting to emit the thunk, because it assumes that if the parameter types are incomplete, it must be emitting the thunk for optimization purposes. This is untrue for the MS ABI, where the implementation of B::foo has no idea what thunks C's vftable may require. Clang needs to emit the thunk without necessarily having access to the complete prototype of foo. This change makes Clang emit a musttail variadic call when it needs such a thunk. I call these "unprototyped" thunks, because they only prototype the "this" parameter, which must always come first in the MS C++ ABI. These thunks work, but they create ugly LLVM IR. If the call to the thunk is devirtualized, it will be a call to a bitcast of a function pointer. Today, LLVM cannot inline through such a call, but I want to address that soon, because we also use this pattern for virtual member pointer thunks. This change also implements an old FIXME in the code about reusing the thunk's computed CGFunctionInfo as much as possible. Now we don't end up computing the thunk's mangled name and arranging it's prototype up to around three times. Fixes PR25641 Reviewers: rjmccall, rsmith, hans Subscribers: Prazek, cfe-commits Differential Revision: https://reviews.llvm.org/D45112 llvm-svn: 329009
2018-04-03 04:20:33 +08:00
fnPtr = maybeEmitThunk(GD, thunkInfo, /*ForVTable=*/true);
// Otherwise we can use the method definition directly.
} else {
llvm::Type *fnTy = CGM.getTypes().GetFunctionTypeForVTable(GD);
fnPtr = CGM.GetAddrOfFunction(GD, fnTy, /*ForVTable=*/true);
}
if (useRelativeLayout()) {
return addRelativeComponent(
builder, fnPtr, vtableAddressPoint, vtableHasLocalLinkage,
component.getKind() == VTableComponent::CK_CompleteDtorPointer);
} else
return builder.add(llvm::ConstantExpr::getBitCast(fnPtr, CGM.Int8PtrTy));
}
case VTableComponent::CK_UnusedFunctionPointer:
if (useRelativeLayout())
return builder.add(llvm::ConstantExpr::getNullValue(CGM.Int32Ty));
else
return builder.addNullPointer(CGM.Int8PtrTy);
}
llvm_unreachable("Unexpected vtable component kind");
}
llvm::Type *CodeGenVTables::getVTableType(const VTableLayout &layout) {
SmallVector<llvm::Type *, 4> tys;
llvm::Type *componentType = getVTableComponentType();
for (unsigned i = 0, e = layout.getNumVTables(); i != e; ++i)
tys.push_back(llvm::ArrayType::get(componentType, layout.getVTableSize(i)));
return llvm::StructType::get(CGM.getLLVMContext(), tys);
}
void CodeGenVTables::createVTableInitializer(ConstantStructBuilder &builder,
const VTableLayout &layout,
llvm::Constant *rtti,
bool vtableHasLocalLinkage) {
llvm::Type *componentType = getVTableComponentType();
const auto &addressPoints = layout.getAddressPointIndices();
unsigned nextVTableThunkIndex = 0;
for (unsigned vtableIndex = 0, endIndex = layout.getNumVTables();
vtableIndex != endIndex; ++vtableIndex) {
auto vtableElem = builder.beginArray(componentType);
size_t vtableStart = layout.getVTableOffset(vtableIndex);
size_t vtableEnd = vtableStart + layout.getVTableSize(vtableIndex);
for (size_t componentIndex = vtableStart; componentIndex < vtableEnd;
++componentIndex) {
addVTableComponent(vtableElem, layout, componentIndex, rtti,
nextVTableThunkIndex, addressPoints[vtableIndex],
vtableHasLocalLinkage);
}
vtableElem.finishAndAddTo(builder);
}
}
llvm::GlobalVariable *CodeGenVTables::GenerateConstructionVTable(
const CXXRecordDecl *RD, const BaseSubobject &Base, bool BaseIsVirtual,
llvm::GlobalVariable::LinkageTypes Linkage,
VTableAddressPointsMapTy &AddressPoints) {
if (CGDebugInfo *DI = CGM.getModuleDebugInfo())
DI->completeClassData(Base.getBase());
std::unique_ptr<VTableLayout> VTLayout(
getItaniumVTableContext().createConstructionVTableLayout(
Base.getBase(), Base.getBaseOffset(), BaseIsVirtual, RD));
// Add the address points.
AddressPoints = VTLayout->getAddressPoints();
// Get the mangled construction vtable name.
SmallString<256> OutName;
llvm::raw_svector_ostream Out(OutName);
cast<ItaniumMangleContext>(CGM.getCXXABI().getMangleContext())
.mangleCXXCtorVTable(RD, Base.getBaseOffset().getQuantity(),
Base.getBase(), Out);
SmallString<256> Name(OutName);
bool UsingRelativeLayout = getItaniumVTableContext().isRelativeLayout();
bool VTableAliasExists = UsingRelativeLayout && CGM.getModule().getNamedAlias(Name);
if (VTableAliasExists) {
// We previously made the vtable hidden and changed its name.
Name.append(".local");
}
llvm::Type *VTType = getVTableType(*VTLayout);
// Construction vtable symbols are not part of the Itanium ABI, so we cannot
// guarantee that they actually will be available externally. Instead, when
// emitting an available_externally VTT, we provide references to an internal
// linkage construction vtable. The ABI only requires complete-object vtables
// to be the same for all instances of a type, not construction vtables.
if (Linkage == llvm::GlobalVariable::AvailableExternallyLinkage)
Linkage = llvm::GlobalVariable::InternalLinkage;
unsigned Align = CGM.getDataLayout().getABITypeAlignment(VTType);
// Create the variable that will hold the construction vtable.
llvm::GlobalVariable *VTable =
CGM.CreateOrReplaceCXXRuntimeVariable(Name, VTType, Linkage, Align);
// V-tables are always unnamed_addr.
VTable->setUnnamedAddr(llvm::GlobalValue::UnnamedAddr::Global);
llvm::Constant *RTTI = CGM.GetAddrOfRTTIDescriptor(
CGM.getContext().getTagDeclType(Base.getBase()));
// Create and set the initializer.
ConstantInitBuilder builder(CGM);
auto components = builder.beginStruct();
createVTableInitializer(components, *VTLayout, RTTI,
VTable->hasLocalLinkage());
components.finishAndSetAsInitializer(VTable);
// Set properties only after the initializer has been set to ensure that the
// GV is treated as definition and not declaration.
assert(!VTable->isDeclaration() && "Shouldn't set properties on declaration");
CGM.setGVProperties(VTable, RD);
Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
2019-10-17 17:58:57 +08:00
CGM.EmitVTableTypeMetadata(RD, VTable, *VTLayout.get());
if (UsingRelativeLayout && !VTable->isDSOLocal())
GenerateRelativeVTableAlias(VTable, OutName);
return VTable;
}
// If the VTable is not dso_local, then we will not be able to indicate that
// the VTable does not need a relocation and move into rodata. An frequent
// time this can occur is for classes that should be made public from a DSO
// (like in libc++). For cases like these, we can make the vtable hidden or
// private and create a public alias with the same visibility and linkage as
// the original vtable type.
void CodeGenVTables::GenerateRelativeVTableAlias(llvm::GlobalVariable *VTable,
llvm::StringRef AliasName) {
assert(getItaniumVTableContext().isRelativeLayout() &&
"Can only use this if the relative vtable ABI is used");
assert(!VTable->isDSOLocal() && "This should be called only if the vtable is "
"not guaranteed to be dso_local");
// If the vtable is available_externally, we shouldn't (or need to) generate
// an alias for it in the first place since the vtable won't actually by
// emitted in this compilation unit.
if (VTable->hasAvailableExternallyLinkage())
return;
VTable->setName(AliasName + ".local");
auto Linkage = VTable->getLinkage();
assert(llvm::GlobalAlias::isValidLinkage(Linkage) &&
"Invalid vtable alias linkage");
llvm::GlobalAlias *VTableAlias = CGM.getModule().getNamedAlias(AliasName);
if (!VTableAlias) {
VTableAlias = llvm::GlobalAlias::create(
VTable->getValueType(), VTable->getAddressSpace(), Linkage, AliasName,
&CGM.getModule());
} else {
assert(VTableAlias->getValueType() == VTable->getValueType());
assert(VTableAlias->getLinkage() == Linkage);
}
VTableAlias->setVisibility(VTable->getVisibility());
VTableAlias->setUnnamedAddr(VTable->getUnnamedAddr());
// Both of these imply dso_local for the vtable.
if (!VTable->hasComdat()) {
// If this is in a comdat, then we shouldn't make the linkage private due to
// an issue in lld where private symbols can be used as the key symbol when
// choosing the prevelant group. This leads to "relocation refers to a
// symbol in a discarded section".
VTable->setLinkage(llvm::GlobalValue::PrivateLinkage);
} else {
// We should at least make this hidden since we don't want to expose it.
VTable->setVisibility(llvm::GlobalValue::HiddenVisibility);
}
VTableAlias->setAliasee(VTable);
}
static bool shouldEmitAvailableExternallyVTable(const CodeGenModule &CGM,
const CXXRecordDecl *RD) {
return CGM.getCodeGenOpts().OptimizationLevel > 0 &&
CGM.getCXXABI().canSpeculativelyEmitVTable(RD);
}
/// Compute the required linkage of the vtable for the given class.
///
/// Note that we only call this at the end of the translation unit.
llvm::GlobalVariable::LinkageTypes
CodeGenModule::getVTableLinkage(const CXXRecordDecl *RD) {
if (!RD->isExternallyVisible())
return llvm::GlobalVariable::InternalLinkage;
// We're at the end of the translation unit, so the current key
// function is fully correct.
const CXXMethodDecl *keyFunction = Context.getCurrentKeyFunction(RD);
if (keyFunction && !RD->hasAttr<DLLImportAttr>()) {
// If this class has a key function, use that to determine the
// linkage of the vtable.
const FunctionDecl *def = nullptr;
if (keyFunction->hasBody(def))
keyFunction = cast<CXXMethodDecl>(def);
switch (keyFunction->getTemplateSpecializationKind()) {
case TSK_Undeclared:
case TSK_ExplicitSpecialization:
assert((def || CodeGenOpts.OptimizationLevel > 0 ||
CodeGenOpts.getDebugInfo() != codegenoptions::NoDebugInfo) &&
"Shouldn't query vtable linkage without key function, "
"optimizations, or debug info");
if (!def && CodeGenOpts.OptimizationLevel > 0)
return llvm::GlobalVariable::AvailableExternallyLinkage;
if (keyFunction->isInlined())
return !Context.getLangOpts().AppleKext ?
llvm::GlobalVariable::LinkOnceODRLinkage :
llvm::Function::InternalLinkage;
return llvm::GlobalVariable::ExternalLinkage;
case TSK_ImplicitInstantiation:
return !Context.getLangOpts().AppleKext ?
llvm::GlobalVariable::LinkOnceODRLinkage :
llvm::Function::InternalLinkage;
case TSK_ExplicitInstantiationDefinition:
return !Context.getLangOpts().AppleKext ?
llvm::GlobalVariable::WeakODRLinkage :
llvm::Function::InternalLinkage;
case TSK_ExplicitInstantiationDeclaration:
Don't emit an available_externally vtable pointing to linkonce_odr funcs. This fixes pr13124. From the discussion at http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-June/022606.html we know that we cannot make funcions in a weak_odr vtable also weak_odr. They should remain linkonce_odr. The side effect is that we cannot emit a available_externally vtable unless we also emit a copy of the function. This also has an issue: If codegen is going to output a function, sema has to mark it used. Given llvm.org/pr9114, it looks like sema cannot be more aggressive at marking functions used because of vtables. This leaves us with a few unpleasant options: * Marking functions in vtables used if possible. This sounds a bit sloppy, so we should avoid it. * Producing available_externally vtables only when all the functions in it are already used or weak_odr. This would cover cases like -------------------- struct foo { virtual ~foo(); }; struct bar : public foo { virtual void zed(); }; void f() { foo *x(new bar); delete x; } void g(bar *x) { x->~bar(); // force the destructor to be used } -------------------------- and ---------------------------------- template<typename T> struct bar { virtual ~bar(); }; template<typename T> bar<T>::~bar() { } // make the destructor weak_odr instead of linkonce_odr extern template class bar<int>; void f() { bar<int> *x(new bar<int>); delete x; } ---------------------------- These look like corner cases, so it is unclear if it is worth it. * And finally: Just nuke this optimization. That is what this patch implements. llvm-svn: 189852
2013-09-04 05:05:13 +08:00
llvm_unreachable("Should not have been asked to emit this");
}
}
// -fapple-kext mode does not support weak linkage, so we must use
// internal linkage.
if (Context.getLangOpts().AppleKext)
return llvm::Function::InternalLinkage;
llvm::GlobalVariable::LinkageTypes DiscardableODRLinkage =
llvm::GlobalValue::LinkOnceODRLinkage;
llvm::GlobalVariable::LinkageTypes NonDiscardableODRLinkage =
llvm::GlobalValue::WeakODRLinkage;
if (RD->hasAttr<DLLExportAttr>()) {
// Cannot discard exported vtables.
DiscardableODRLinkage = NonDiscardableODRLinkage;
} else if (RD->hasAttr<DLLImportAttr>()) {
// Imported vtables are available externally.
DiscardableODRLinkage = llvm::GlobalVariable::AvailableExternallyLinkage;
NonDiscardableODRLinkage = llvm::GlobalVariable::AvailableExternallyLinkage;
}
switch (RD->getTemplateSpecializationKind()) {
case TSK_Undeclared:
case TSK_ExplicitSpecialization:
case TSK_ImplicitInstantiation:
return DiscardableODRLinkage;
case TSK_ExplicitInstantiationDeclaration:
// Explicit instantiations in MSVC do not provide vtables, so we must emit
// our own.
if (getTarget().getCXXABI().isMicrosoft())
return DiscardableODRLinkage;
return shouldEmitAvailableExternallyVTable(*this, RD)
? llvm::GlobalVariable::AvailableExternallyLinkage
: llvm::GlobalVariable::ExternalLinkage;
case TSK_ExplicitInstantiationDefinition:
return NonDiscardableODRLinkage;
}
llvm_unreachable("Invalid TemplateSpecializationKind!");
}
/// This is a callback from Sema to tell us that a particular vtable is
/// required to be emitted in this translation unit.
///
/// This is only called for vtables that _must_ be emitted (mainly due to key
/// functions). For weak vtables, CodeGen tracks when they are needed and
/// emits them as-needed.
void CodeGenModule::EmitVTable(CXXRecordDecl *theClass) {
VTables.GenerateClassData(theClass);
}
void
CodeGenVTables::GenerateClassData(const CXXRecordDecl *RD) {
if (CGDebugInfo *DI = CGM.getModuleDebugInfo())
DI->completeClassData(RD);
if (RD->getNumVBases())
CGM.getCXXABI().emitVirtualInheritanceTables(RD);
CGM.getCXXABI().emitVTableDefinitions(*this, RD);
}
/// At this point in the translation unit, does it appear that can we
/// rely on the vtable being defined elsewhere in the program?
///
/// The response is really only definitive when called at the end of
/// the translation unit.
///
/// The only semantic restriction here is that the object file should
/// not contain a vtable definition when that vtable is defined
/// strongly elsewhere. Otherwise, we'd just like to avoid emitting
/// vtables when unnecessary.
bool CodeGenVTables::isVTableExternal(const CXXRecordDecl *RD) {
assert(RD->isDynamicClass() && "Non-dynamic classes have no VTable.");
// We always synthesize vtables if they are needed in the MS ABI. MSVC doesn't
// emit them even if there is an explicit template instantiation.
if (CGM.getTarget().getCXXABI().isMicrosoft())
return false;
// If we have an explicit instantiation declaration (and not a
// definition), the vtable is defined elsewhere.
TemplateSpecializationKind TSK = RD->getTemplateSpecializationKind();
if (TSK == TSK_ExplicitInstantiationDeclaration)
return true;
// Otherwise, if the class is an instantiated template, the
// vtable must be defined here.
if (TSK == TSK_ImplicitInstantiation ||
TSK == TSK_ExplicitInstantiationDefinition)
return false;
// Otherwise, if the class doesn't have a key function (possibly
// anymore), the vtable must be defined here.
const CXXMethodDecl *keyFunction = CGM.getContext().getCurrentKeyFunction(RD);
if (!keyFunction)
return false;
// Otherwise, if we don't have a definition of the key function, the
// vtable must be defined somewhere else.
return !keyFunction->hasBody();
}
/// Given that we're currently at the end of the translation unit, and
/// we've emitted a reference to the vtable for this class, should
/// we define that vtable?
static bool shouldEmitVTableAtEndOfTranslationUnit(CodeGenModule &CGM,
const CXXRecordDecl *RD) {
// If vtable is internal then it has to be done.
if (!CGM.getVTables().isVTableExternal(RD))
return true;
// If it's external then maybe we will need it as available_externally.
return shouldEmitAvailableExternallyVTable(CGM, RD);
}
/// Given that at some point we emitted a reference to one or more
/// vtables, and that we are now at the end of the translation unit,
/// decide whether we should emit them.
void CodeGenModule::EmitDeferredVTables() {
#ifndef NDEBUG
// Remember the size of DeferredVTables, because we're going to assume
// that this entire operation doesn't modify it.
size_t savedSize = DeferredVTables.size();
#endif
for (const CXXRecordDecl *RD : DeferredVTables)
if (shouldEmitVTableAtEndOfTranslationUnit(*this, RD))
VTables.GenerateClassData(RD);
else if (shouldOpportunisticallyEmitVTables())
OpportunisticVTables.push_back(RD);
assert(savedSize == DeferredVTables.size() &&
"deferred extra vtables during vtable emission?");
DeferredVTables.clear();
}
bool CodeGenModule::HasLTOVisibilityPublicStd(const CXXRecordDecl *RD) {
if (!getCodeGenOpts().LTOVisibilityPublicStd)
return false;
const DeclContext *DC = RD;
while (1) {
auto *D = cast<Decl>(DC);
DC = DC->getParent();
if (isa<TranslationUnitDecl>(DC->getRedeclContext())) {
if (auto *ND = dyn_cast<NamespaceDecl>(D))
if (const IdentifierInfo *II = ND->getIdentifier())
if (II->isStr("std") || II->isStr("stdext"))
return true;
break;
}
}
return false;
}
bool CodeGenModule::HasHiddenLTOVisibility(const CXXRecordDecl *RD) {
LinkageInfo LV = RD->getLinkageAndVisibility();
if (!isExternallyVisible(LV.getLinkage()))
return true;
if (RD->hasAttr<LTOVisibilityPublicAttr>() || RD->hasAttr<UuidAttr>())
return false;
if (getTriple().isOSBinFormatCOFF()) {
if (RD->hasAttr<DLLExportAttr>() || RD->hasAttr<DLLImportAttr>())
return false;
} else {
if (LV.getVisibility() != HiddenVisibility)
return false;
}
return !HasLTOVisibilityPublicStd(RD);
}
Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
2019-10-17 17:58:57 +08:00
llvm::GlobalObject::VCallVisibility
CodeGenModule::GetVCallVisibilityLevel(const CXXRecordDecl *RD) {
LinkageInfo LV = RD->getLinkageAndVisibility();
llvm::GlobalObject::VCallVisibility TypeVis;
if (!isExternallyVisible(LV.getLinkage()))
TypeVis = llvm::GlobalObject::VCallVisibilityTranslationUnit;
else if (HasHiddenLTOVisibility(RD))
TypeVis = llvm::GlobalObject::VCallVisibilityLinkageUnit;
else
TypeVis = llvm::GlobalObject::VCallVisibilityPublic;
for (auto B : RD->bases())
if (B.getType()->getAsCXXRecordDecl()->isDynamicClass())
TypeVis = std::min(TypeVis,
GetVCallVisibilityLevel(B.getType()->getAsCXXRecordDecl()));
for (auto B : RD->vbases())
if (B.getType()->getAsCXXRecordDecl()->isDynamicClass())
TypeVis = std::min(TypeVis,
GetVCallVisibilityLevel(B.getType()->getAsCXXRecordDecl()));
return TypeVis;
}
void CodeGenModule::EmitVTableTypeMetadata(const CXXRecordDecl *RD,
llvm::GlobalVariable *VTable,
const VTableLayout &VTLayout) {
if (!getCodeGenOpts().LTOUnit)
return;
CharUnits PointerWidth =
Context.toCharUnitsFromBits(Context.getTargetInfo().getPointerWidth(0));
typedef std::pair<const CXXRecordDecl *, unsigned> AddressPoint;
std::vector<AddressPoint> AddressPoints;
for (auto &&AP : VTLayout.getAddressPoints())
AddressPoints.push_back(std::make_pair(
AP.first.getBase(), VTLayout.getVTableOffset(AP.second.VTableIndex) +
AP.second.AddressPointIndex));
// Sort the address points for determinism.
llvm::sort(AddressPoints, [this](const AddressPoint &AP1,
const AddressPoint &AP2) {
if (&AP1 == &AP2)
return false;
std::string S1;
llvm::raw_string_ostream O1(S1);
getCXXABI().getMangleContext().mangleTypeName(
QualType(AP1.first->getTypeForDecl(), 0), O1);
O1.flush();
std::string S2;
llvm::raw_string_ostream O2(S2);
getCXXABI().getMangleContext().mangleTypeName(
QualType(AP2.first->getTypeForDecl(), 0), O2);
O2.flush();
if (S1 < S2)
return true;
if (S1 != S2)
return false;
return AP1.second < AP2.second;
});
ArrayRef<VTableComponent> Comps = VTLayout.vtable_components();
for (auto AP : AddressPoints) {
// Create type metadata for the address point.
AddVTableTypeMetadata(VTable, PointerWidth * AP.second, AP.first);
// The class associated with each address point could also potentially be
// used for indirect calls via a member function pointer, so we need to
// annotate the address of each function pointer with the appropriate member
// function pointer type.
for (unsigned I = 0; I != Comps.size(); ++I) {
if (Comps[I].getKind() != VTableComponent::CK_FunctionPointer)
continue;
llvm::Metadata *MD = CreateMetadataIdentifierForVirtualMemPtrType(
Context.getMemberPointerType(
Comps[I].getFunctionDecl()->getType(),
Context.getRecordType(AP.first).getTypePtr()));
VTable->addTypeMetadata((PointerWidth * I).getQuantity(), MD);
}
}
Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
2019-10-17 17:58:57 +08:00
[WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables Summary: First patch to support Safe Whole Program Devirtualization Enablement, see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html Always emit !vcall_visibility metadata under -fwhole-program-vtables, and not just for -fvirtual-function-elimination. The vcall visibility metadata will (in a subsequent patch) be used to communicate to WPD which vtables are safe to devirtualize, and we will optionally convert the metadata to hidden visibility at link time. Subsequent follow on patches will help enable this by adding vcall_visibility metadata to the ThinLTO summaries, and always emit type test intrinsics under -fwhole-program-vtables (and not just for vtables with hidden visibility). In order to do this safely with VFE, since for VFE all vtable loads must be type checked loads which will no longer be the case, this patch adds a new "Virtual Function Elim" module flag to communicate to GlobalDCE whether to perform VFE using the vcall_visibility metadata. One additional advantage of using the vcall_visibility metadata to drive more WPD at LTO link time is that we can use the same mechanism to enable more aggressive VFE at LTO link time as well. The link time option proposed in the RFC will convert vcall_visibility metadata to hidden (aka linkage unit visibility), which combined with -fvirtual-function-elimination will allow it to be done more aggressively at LTO link time under the same conditions. Reviewers: pcc, ostannard, evgeny777, steven_wu Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71907
2019-12-27 00:32:42 +08:00
if (getCodeGenOpts().VirtualFunctionElimination ||
getCodeGenOpts().WholeProgramVTables) {
Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
2019-10-17 17:58:57 +08:00
llvm::GlobalObject::VCallVisibility TypeVis = GetVCallVisibilityLevel(RD);
if (TypeVis != llvm::GlobalObject::VCallVisibilityPublic)
[WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables Summary: First patch to support Safe Whole Program Devirtualization Enablement, see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html Always emit !vcall_visibility metadata under -fwhole-program-vtables, and not just for -fvirtual-function-elimination. The vcall visibility metadata will (in a subsequent patch) be used to communicate to WPD which vtables are safe to devirtualize, and we will optionally convert the metadata to hidden visibility at link time. Subsequent follow on patches will help enable this by adding vcall_visibility metadata to the ThinLTO summaries, and always emit type test intrinsics under -fwhole-program-vtables (and not just for vtables with hidden visibility). In order to do this safely with VFE, since for VFE all vtable loads must be type checked loads which will no longer be the case, this patch adds a new "Virtual Function Elim" module flag to communicate to GlobalDCE whether to perform VFE using the vcall_visibility metadata. One additional advantage of using the vcall_visibility metadata to drive more WPD at LTO link time is that we can use the same mechanism to enable more aggressive VFE at LTO link time as well. The link time option proposed in the RFC will convert vcall_visibility metadata to hidden (aka linkage unit visibility), which combined with -fvirtual-function-elimination will allow it to be done more aggressively at LTO link time under the same conditions. Reviewers: pcc, ostannard, evgeny777, steven_wu Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71907
2019-12-27 00:32:42 +08:00
VTable->setVCallVisibilityMetadata(TypeVis);
Reland: Dead Virtual Function Elimination Remove dead virtual functions from vtables with replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up correctly. Original commit message: Currently, it is hard for the compiler to remove unused C++ virtual functions, because they are all referenced from vtables, which are referenced by constructors. This means that if the constructor is called from any live code, then we keep every virtual function in the final link, even if there are no call sites which can use it. This patch allows unused virtual functions to be removed during LTO (and regular compilation in limited circumstances) by using type metadata to match virtual function call sites to the vtable slots they might load from. This information can then be used in the global dead code elimination pass instead of the references from vtables to virtual functions, to more accurately determine which functions are reachable. To make this transformation safe, I have changed clang's code-generation to always load virtual function pointers using the llvm.type.checked.load intrinsic, instead of regular load instructions. I originally tried writing this using clang's existing code-generation, which uses the llvm.type.test and llvm.assume intrinsics after doing a normal load. However, it is possible for optimisations to obscure the relationship between the GEP, load and llvm.type.test, causing GlobalDCE to fail to find virtual function call sites. The existing linkage and visibility types don't accurately describe the scope in which a virtual call could be made which uses a given vtable. This is wider than the visibility of the type itself, because a virtual function call could be made using a more-visible base class. I've added a new !vcall_visibility metadata type to represent this, described in TypeMetadata.rst. The internalization pass and libLTO have been updated to change this metadata when linking is performed. This doesn't currently work with ThinLTO, because it needs to see every call to llvm.type.checked.load in the linkage unit. It might be possible to extend this optimisation to be able to use the ThinLTO summary, as was done for devirtualization, but until then that combination is rejected in the clang driver. To test this, I've written a fuzzer which generates random C++ programs with complex class inheritance graphs, and virtual functions called through object and function pointers of different types. The programs are spread across multiple translation units and DSOs to test the different visibility restrictions. I've also tried doing bootstrap builds of LLVM to test this. This isn't ideal, because only classes in anonymous namespaces can be optimised with -fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not work correctly with -fvisibility=hidden. However, there are only 12 test failures when building with -fvisibility=hidden (and an unmodified compiler), and this change does not cause any new failures for either value of -fvisibility. On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size reduction of ~6%, over a baseline compiled with "-O2 -flto -fvisibility=hidden -fwhole-program-vtables". The best cases are reductions of ~14% in 450.soplex and 483.xalancbmk, and there are no code size increases. I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which show a geomean size reduction of ~3%, again with no size increases. I had hoped that this would have no effect on performance, which would allow it to awlays be enabled (when using -fwhole-program-vtables). However, the changes in clang to use the llvm.type.checked.load intrinsic are causing ~1% performance regression in the C++ parts of SPEC2006. It should be possible to recover some of this perf loss by teaching optimisations about the llvm.type.checked.load intrinsic, which would make it worth turning this on by default (though it's still dependent on -fwhole-program-vtables). Differential revision: https://reviews.llvm.org/D63932 llvm-svn: 375094
2019-10-17 17:58:57 +08:00
}
}