llvm-project/llvm/lib/Target/BPF/BTFDebug.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

362 lines
11 KiB
C
Raw Normal View History

//===- BTFDebug.h -----------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// \file
/// This file contains support for writing BTF debug info.
///
//===----------------------------------------------------------------------===//
#ifndef LLVM_LIB_TARGET_BPF_BTFDEBUG_H
#define LLVM_LIB_TARGET_BPF_BTFDEBUG_H
#include "llvm/ADT/StringMap.h"
#include "llvm/CodeGen/DebugHandlerBase.h"
#include <set>
#include <unordered_map>
#include "BTF.h"
namespace llvm {
class AsmPrinter;
class BTFDebug;
class DIType;
class MCStreamer;
class MCSymbol;
class MachineFunction;
/// The base class for BTF type generation.
class BTFTypeBase {
protected:
uint8_t Kind;
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
bool IsCompleted;
uint32_t Id;
struct BTF::CommonType BTFType;
public:
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
BTFTypeBase() : IsCompleted(false) {}
virtual ~BTFTypeBase() = default;
void setId(uint32_t Id) { this->Id = Id; }
uint32_t getId() { return Id; }
uint32_t roundupToBytes(uint32_t NumBits) { return (NumBits + 7) >> 3; }
/// Get the size of this BTF type entry.
virtual uint32_t getSize() { return BTF::CommonTypeSize; }
/// Complete BTF type generation after all related DebugInfo types
/// have been visited so their BTF type id's are available
/// for cross referece.
virtual void completeType(BTFDebug &BDebug) {}
/// Emit types for this BTF type entry.
virtual void emitType(MCStreamer &OS);
};
/// Handle several derived types include pointer, const,
/// volatile, typedef and restrict.
class BTFTypeDerived : public BTFTypeBase {
const DIDerivedType *DTy;
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
bool NeedsFixup;
public:
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
BTFTypeDerived(const DIDerivedType *Ty, unsigned Tag, bool NeedsFixup);
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
void setPointeeType(uint32_t PointeeType);
};
/// Handle struct or union forward declaration.
class BTFTypeFwd : public BTFTypeBase {
StringRef Name;
public:
BTFTypeFwd(StringRef Name, bool IsUnion);
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle int type.
class BTFTypeInt : public BTFTypeBase {
StringRef Name;
uint32_t IntVal; ///< Encoding, offset, bits
public:
BTFTypeInt(uint32_t Encoding, uint32_t SizeInBits, uint32_t OffsetInBits,
StringRef TypeName);
uint32_t getSize() { return BTFTypeBase::getSize() + sizeof(uint32_t); }
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle enumerate type.
class BTFTypeEnum : public BTFTypeBase {
const DICompositeType *ETy;
std::vector<struct BTF::BTFEnum> EnumValues;
public:
BTFTypeEnum(const DICompositeType *ETy, uint32_t NumValues);
uint32_t getSize() {
return BTFTypeBase::getSize() + EnumValues.size() * BTF::BTFEnumSize;
}
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle array type.
class BTFTypeArray : public BTFTypeBase {
struct BTF::BTFArray ArrayInfo;
public:
BTFTypeArray(uint32_t ElemTypeId, uint32_t NumElems);
uint32_t getSize() { return BTFTypeBase::getSize() + BTF::BTFArraySize; }
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle struct/union type.
class BTFTypeStruct : public BTFTypeBase {
const DICompositeType *STy;
bool HasBitField;
std::vector<struct BTF::BTFMember> Members;
public:
BTFTypeStruct(const DICompositeType *STy, bool IsStruct, bool HasBitField,
uint32_t NumMembers);
uint32_t getSize() {
return BTFTypeBase::getSize() + Members.size() * BTF::BTFMemberSize;
}
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
std::string getName();
};
/// Handle function pointer.
class BTFTypeFuncProto : public BTFTypeBase {
const DISubroutineType *STy;
std::unordered_map<uint32_t, StringRef> FuncArgNames;
std::vector<struct BTF::BTFParam> Parameters;
public:
BTFTypeFuncProto(const DISubroutineType *STy, uint32_t NumParams,
const std::unordered_map<uint32_t, StringRef> &FuncArgNames);
uint32_t getSize() {
return BTFTypeBase::getSize() + Parameters.size() * BTF::BTFParamSize;
}
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle subprogram
class BTFTypeFunc : public BTFTypeBase {
StringRef Name;
public:
BTFTypeFunc(StringRef FuncName, uint32_t ProtoTypeId, uint32_t Scope);
uint32_t getSize() { return BTFTypeBase::getSize(); }
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
/// Handle variable instances
class BTFKindVar : public BTFTypeBase {
StringRef Name;
uint32_t Info;
public:
BTFKindVar(StringRef VarName, uint32_t TypeId, uint32_t VarInfo);
uint32_t getSize() { return BTFTypeBase::getSize() + 4; }
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// Handle data sections
class BTFKindDataSec : public BTFTypeBase {
AsmPrinter *Asm;
std::string Name;
std::vector<std::tuple<uint32_t, const MCSymbol *, uint32_t>> Vars;
public:
BTFKindDataSec(AsmPrinter *AsmPrt, std::string SecName);
uint32_t getSize() {
return BTFTypeBase::getSize() + BTF::BTFDataSecVarSize * Vars.size();
}
void addVar(uint32_t Id, const MCSymbol *Sym, uint32_t Size) {
Vars.push_back(std::make_tuple(Id, Sym, Size));
}
std::string getName() { return Name; }
void completeType(BTFDebug &BDebug);
void emitType(MCStreamer &OS);
};
/// String table.
class BTFStringTable {
/// String table size in bytes.
uint32_t Size;
/// A mapping from string table offset to the index
/// of the Table. It is used to avoid putting
/// duplicated strings in the table.
[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
2019-10-09 02:23:17 +08:00
std::map<uint32_t, uint32_t> OffsetToIdMap;
/// A vector of strings to represent the string table.
std::vector<std::string> Table;
public:
BTFStringTable() : Size(0) {}
uint32_t getSize() { return Size; }
std::vector<std::string> &getTable() { return Table; }
/// Add a string to the string table and returns its offset
/// in the table.
uint32_t addString(StringRef S);
};
/// Represent one func and its type id.
struct BTFFuncInfo {
const MCSymbol *Label; ///< Func MCSymbol
uint32_t TypeId; ///< Type id referring to .BTF type section
};
/// Represent one line info.
struct BTFLineInfo {
MCSymbol *Label; ///< MCSymbol identifying insn for the lineinfo
uint32_t FileNameOff; ///< file name offset in the .BTF string table
uint32_t LineOff; ///< line offset in the .BTF string table
uint32_t LineNum; ///< the line number
uint32_t ColumnNum; ///< the column number
};
[BPF] Enable relocation location for load/store/shifts Previous btf field relocation is always at assignment like r1 = 4 which is converted from an ld_imm64 instruction. This patch did an optimization such that relocation instruction might be load/store/shift. Specically, the following insns may also have relocation, except BPF_MOV: LDB, LDH, LDW, LDD, STB, STH, STW, STD, LDB32, LDH32, LDW32, STB32, STH32, STW32, SLL, SRL, SRA To accomplish this, a few BPF target specific codegen only instructions are invented. They are generated at backend BPF SimplifyPatchable phase, which is at early llc phase when SSA form is available. The new codegen only instructions will be converted to real proper instructions at the codegen and BTF emission stage. Note that, as revealed by a few tests, this optimization might be actual generating more relocations: Scenario 1: if (...) { ... __builtin_preserve_field_info(arg->b2, 0) ... } else { ... __builtin_preserve_field_info(arg->b2, 0) ... } Compiler could do CSE to only have one relocation. But if both of the above is translated into codegen internal instructions, the compiler will not be able to do that. Scenario 2: offset = ... __builtin_preserve_field_info(arg->b2, 0) ... ... ... offset ... ... offset ... ... offset ... For whatever reason, the compiler might be temporarily do copy propagation of the righthand of "offset" assignment like ... __builtin_preserve_field_info(arg->b2, 0) ... ... __builtin_preserve_field_info(arg->b2, 0) ... and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not be able to get deduplicated. I do not expect we have big instruction count difference. It may actually reduce instruction count since now relocation is in deeper insn dependency chain. For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced instruction count from 29 to 26. Differential Revision: https://reviews.llvm.org/D71790
2019-12-20 07:21:53 +08:00
/// Represent one field relocation.
[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
2019-10-09 02:23:17 +08:00
struct BTFFieldReloc {
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
const MCSymbol *Label; ///< MCSymbol identifying insn for the reloc
uint32_t TypeID; ///< Type ID
uint32_t OffsetNameOff; ///< The string to traverse types
[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
2019-10-09 02:23:17 +08:00
uint32_t RelocKind; ///< What to patch the instruction
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
};
/// Collect and emit BTF information.
class BTFDebug : public DebugHandlerBase {
MCStreamer &OS;
bool SkipInstruction;
bool LineInfoGenerated;
uint32_t SecNameOff;
uint32_t ArrayIndexTypeId;
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
bool MapDefNotCollected;
BTFStringTable StringTable;
std::vector<std::unique_ptr<BTFTypeBase>> TypeEntries;
std::unordered_map<const DIType *, uint32_t> DIToIdMap;
std::map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable;
std::map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable;
[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
2019-10-09 02:23:17 +08:00
std::map<uint32_t, std::vector<BTFFieldReloc>> FieldRelocTable;
StringMap<std::vector<std::string>> FileContent;
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
std::map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries;
std::vector<BTFTypeStruct *> StructTypes;
[BPF] do compile-once run-everywhere relocation for bitfields A bpf specific clang intrinsic is introduced: u32 __builtin_preserve_field_info(member_access, info_kind) Depending on info_kind, different information will be returned to the program. A relocation is also recorded for this builtin so that bpf loader can patch the instruction on the target host. This clang intrinsic is used to get certain information to facilitate struct/union member relocations. The offset relocation is extended by 4 bytes to include relocation kind. Currently supported relocation kinds are enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; for __builtin_preserve_field_info. The old access offset relocation is covered by FIELD_BYTE_OFFSET = 0. An example: struct s { int a; int b1:9; int b2:4; }; enum { FIELD_BYTE_OFFSET = 0, FIELD_BYTE_SIZE, FIELD_EXISTENCE, FIELD_SIGNEDNESS, FIELD_LSHIFT_U64, FIELD_RSHIFT_U64, }; void bpf_probe_read(void *, unsigned, const void *); int field_read(struct s *arg) { unsigned long long ull = 0; unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET); unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE); #ifdef USE_PROBE_READ bpf_probe_read(&ull, size, (const void *)arg + offset); unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ lshift = lshift + (size << 3) - 64; #endif #else switch(size) { case 1: ull = *(unsigned char *)((void *)arg + offset); break; case 2: ull = *(unsigned short *)((void *)arg + offset); break; case 4: ull = *(unsigned int *)((void *)arg + offset); break; case 8: ull = *(unsigned long long *)((void *)arg + offset); break; } unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64); #endif ull <<= lshift; if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS)) return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64); } There is a minor overhead for bpf_probe_read() on big endian. The code and relocation generated for field_read where bpf_probe_read() is used to access argument data on little endian mode: r3 = r1 r1 = 0 r1 = 4 <=== relocation (FIELD_BYTE_OFFSET) r3 += r1 r1 = r10 r1 += -8 r2 = 4 <=== relocation (FIELD_BYTE_SIZE) call bpf_probe_read r2 = 51 <=== relocation (FIELD_LSHIFT_U64) r1 = *(u64 *)(r10 - 8) r1 <<= r2 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) r0 = r1 r0 >>= r2 r3 = 1 <=== relocation (FIELD_SIGNEDNESS) if r3 == 0 goto LBB0_2 r1 s>>= r2 r0 = r1 LBB0_2: exit Compare to the above code between relocations FIELD_LSHIFT_U64 and FIELD_LSHIFT_U64, the code with big endian mode has four more instructions. r1 = 41 <=== relocation (FIELD_LSHIFT_U64) r6 += r1 r6 += -64 r6 <<= 32 r6 >>= 32 r1 = *(u64 *)(r10 - 8) r1 <<= r6 r2 = 60 <=== relocation (FIELD_RSHIFT_U64) The code and relocation generated when using direct load. r2 = 0 r3 = 4 r4 = 4 if r4 s> 3 goto LBB0_3 if r4 == 1 goto LBB0_5 if r4 == 2 goto LBB0_6 goto LBB0_9 LBB0_6: # %sw.bb1 r1 += r3 r2 = *(u16 *)(r1 + 0) goto LBB0_9 LBB0_3: # %entry if r4 == 4 goto LBB0_7 if r4 == 8 goto LBB0_8 goto LBB0_9 LBB0_8: # %sw.bb9 r1 += r3 r2 = *(u64 *)(r1 + 0) goto LBB0_9 LBB0_5: # %sw.bb r1 += r3 r2 = *(u8 *)(r1 + 0) goto LBB0_9 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 r2 <<= r1 r1 = 60 r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit Considering verifier is able to do limited constant propogation following branches. The following is the code actually traversed. r2 = 0 r3 = 4 <=== relocation r4 = 4 <=== relocation if r4 s> 3 goto LBB0_3 LBB0_3: # %entry if r4 == 4 goto LBB0_7 LBB0_7: # %sw.bb5 r1 += r3 r2 = *(u32 *)(r1 + 0) LBB0_9: # %sw.epilog r1 = 51 <=== relocation r2 <<= r1 r1 = 60 <=== relocation r0 = r2 r0 >>= r1 r3 = 1 if r3 == 0 goto LBB0_11 r2 s>>= r1 r0 = r2 LBB0_11: # %sw.epilog exit For native load case, the load size is calculated to be the same as the size of load width LLVM otherwise used to load the value which is then used to extract the bitfield value. Differential Revision: https://reviews.llvm.org/D67980 llvm-svn: 374099
2019-10-09 02:23:17 +08:00
std::map<std::string, uint32_t> PatchImms;
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
std::map<StringRef, std::pair<bool, std::vector<BTFTypeDerived *>>>
FixupDerivedTypes;
std::set<const Function *>ProtoFunctions;
/// Add types to TypeEntries.
/// @{
/// Add types to TypeEntries and DIToIdMap.
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
uint32_t addType(std::unique_ptr<BTFTypeBase> TypeEntry, const DIType *Ty);
/// Add types to TypeEntries only and return type id.
uint32_t addType(std::unique_ptr<BTFTypeBase> TypeEntry);
/// @}
/// IR type visiting functions.
/// @{
void visitTypeEntry(const DIType *Ty);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
void visitTypeEntry(const DIType *Ty, uint32_t &TypeId, bool CheckPointer,
bool SeenPointer);
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
void visitBasicType(const DIBasicType *BTy, uint32_t &TypeId);
void visitSubroutineType(
const DISubroutineType *STy, bool ForSubprog,
const std::unordered_map<uint32_t, StringRef> &FuncArgNames,
uint32_t &TypeId);
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
void visitFwdDeclType(const DICompositeType *CTy, bool IsUnion,
uint32_t &TypeId);
void visitCompositeType(const DICompositeType *CTy, uint32_t &TypeId);
void visitStructType(const DICompositeType *STy, bool IsStruct,
uint32_t &TypeId);
void visitArrayType(const DICompositeType *ATy, uint32_t &TypeId);
void visitEnumType(const DICompositeType *ETy, uint32_t &TypeId);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
void visitDerivedType(const DIDerivedType *DTy, uint32_t &TypeId,
bool CheckPointer, bool SeenPointer);
void visitMapDefType(const DIType *Ty, uint32_t &TypeId);
/// @}
/// Get the file content for the subprogram. Certain lines of the file
/// later may be put into string table and referenced by line info.
std::string populateFileContent(const DISubprogram *SP);
/// Construct a line info.
void constructLineInfo(const DISubprogram *SP, MCSymbol *Label, uint32_t Line,
uint32_t Column);
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
/// Generate types and variables for globals.
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
void processGlobals(bool ProcessingMapDef);
/// Generate types for function prototypes.
void processFuncPrototypes(const Function *);
[BPF] Enable relocation location for load/store/shifts Previous btf field relocation is always at assignment like r1 = 4 which is converted from an ld_imm64 instruction. This patch did an optimization such that relocation instruction might be load/store/shift. Specically, the following insns may also have relocation, except BPF_MOV: LDB, LDH, LDW, LDD, STB, STH, STW, STD, LDB32, LDH32, LDW32, STB32, STH32, STW32, SLL, SRL, SRA To accomplish this, a few BPF target specific codegen only instructions are invented. They are generated at backend BPF SimplifyPatchable phase, which is at early llc phase when SSA form is available. The new codegen only instructions will be converted to real proper instructions at the codegen and BTF emission stage. Note that, as revealed by a few tests, this optimization might be actual generating more relocations: Scenario 1: if (...) { ... __builtin_preserve_field_info(arg->b2, 0) ... } else { ... __builtin_preserve_field_info(arg->b2, 0) ... } Compiler could do CSE to only have one relocation. But if both of the above is translated into codegen internal instructions, the compiler will not be able to do that. Scenario 2: offset = ... __builtin_preserve_field_info(arg->b2, 0) ... ... ... offset ... ... offset ... ... offset ... For whatever reason, the compiler might be temporarily do copy propagation of the righthand of "offset" assignment like ... __builtin_preserve_field_info(arg->b2, 0) ... ... __builtin_preserve_field_info(arg->b2, 0) ... and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not be able to get deduplicated. I do not expect we have big instruction count difference. It may actually reduce instruction count since now relocation is in deeper insn dependency chain. For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced instruction count from 29 to 26. Differential Revision: https://reviews.llvm.org/D71790
2019-12-20 07:21:53 +08:00
/// Generate one field relocation record.
void generateFieldReloc(const MCSymbol *ORSym, DIType *RootTy,
StringRef AccessPattern);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
/// Populating unprocessed struct type.
unsigned populateStructType(const DIType *Ty);
[BPF] Enable relocation location for load/store/shifts Previous btf field relocation is always at assignment like r1 = 4 which is converted from an ld_imm64 instruction. This patch did an optimization such that relocation instruction might be load/store/shift. Specically, the following insns may also have relocation, except BPF_MOV: LDB, LDH, LDW, LDD, STB, STH, STW, STD, LDB32, LDH32, LDW32, STB32, STH32, STW32, SLL, SRL, SRA To accomplish this, a few BPF target specific codegen only instructions are invented. They are generated at backend BPF SimplifyPatchable phase, which is at early llc phase when SSA form is available. The new codegen only instructions will be converted to real proper instructions at the codegen and BTF emission stage. Note that, as revealed by a few tests, this optimization might be actual generating more relocations: Scenario 1: if (...) { ... __builtin_preserve_field_info(arg->b2, 0) ... } else { ... __builtin_preserve_field_info(arg->b2, 0) ... } Compiler could do CSE to only have one relocation. But if both of the above is translated into codegen internal instructions, the compiler will not be able to do that. Scenario 2: offset = ... __builtin_preserve_field_info(arg->b2, 0) ... ... ... offset ... ... offset ... ... offset ... For whatever reason, the compiler might be temporarily do copy propagation of the righthand of "offset" assignment like ... __builtin_preserve_field_info(arg->b2, 0) ... ... __builtin_preserve_field_info(arg->b2, 0) ... and CSE will be able to deduplicate later. But if these intrinsics are converted to BPF pseudo instructions, they will not be able to get deduplicated. I do not expect we have big instruction count difference. It may actually reduce instruction count since now relocation is in deeper insn dependency chain. For example, for test offset-reloc-fieldinfo-2.ll, this patch generates 7 instead of 6 relocations for non-alu32 mode, but it actually reduced instruction count from 29 to 26. Differential Revision: https://reviews.llvm.org/D71790
2019-12-20 07:21:53 +08:00
/// Process relocation instructions.
void processReloc(const MachineOperand &MO);
[BPF] Add BTF Var and DataSec Support Two new kinds, BTF_KIND_VAR and BTF_KIND_DATASEC, are added. BTF_KIND_VAR has the following specification: btf_type.name: var name btf_type.info: type kind btf_type.type: var type // btf_type is followed by one u32 u32: varinfo (currently, only 0 - static, 1 - global allocated in elf sections) Not all globals are supported in this patch. The following globals are supported: . static variables with or without section attributes . global variables with section attributes The inclusion of globals with section attributes is for future potential extraction of key/value type id's from map definition. BTF_KIND_DATASEC has the following specification: btf_type.name: section name associated with variable or one of .data/.bss/.readonly btf_type.info: type kind and vlen for # of variables btf_type.size: 0 #vlen number of the following: u32: id of corresponding BTF_KIND_VAR u32: in-session offset of the var u32: the size of memory var occupied At the time of debug info emission, the data section size is unknown, so the btf_type.size = 0 for BTF_KIND_DATASEC. The loader can patch it during loading time. The in-session offseet of the var is only available for static variables. For global variables, the loader neeeds to assign the global variable symbol value in symbol table to in-section offset. The size of memory is used to specify the amount of the memory a variable occupies. Typically, it equals to the type size, but for certain structures, e.g., struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; The static variable s2 has size of 20. Note that for BTF_KIND_DATASEC name, the section name does not contain object name. The compiler does have input module name. For example, two cases below: . clang -target bpf -O2 -g -c test.c The compiler knows the input file (module) is test.c and can generate sec name like test.data/test.bss etc. . clang -target bpf -O2 -g -emit-llvm -c test.c -o - | llc -march=bpf -filetype=obj -o test.o The llc compiler has the input file as stdin, and would generate something like stdin.data/stdin.bss etc. which does not really make sense. For any user specificed section name, e.g., static volatile int a __attribute__((section("id1"))); static volatile const int b __attribute__((section("id2"))); The DataSec with name "id1" and "id2" does not contain information whether the section is readonly or not. The loader needs to check the corresponding elf section flags for such information. A simple example: -bash-4.4$ cat t.c int g1; int g2 = 3; const int g3 = 4; static volatile int s1; struct tt { int a; int b; char c[]; }; static volatile struct tt s2 = {3, 4, "abcdefghi"}; static volatile const int s3 = 4; int m __attribute__((section("maps"), used)) = 4; int test() { return g1 + g2 + g3 + s1 + s2.a + s3 + m; } -bash-4.4$ clang -target bpf -O2 -g -S t.c Checking t.s, 4 BTF_KIND_VAR's are generated (s1, s2, s3 and m). 4 BTF_KIND_DATASEC's are generated with names ".data", ".bss", ".rodata" and "maps". Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D59441 llvm-svn: 356326
2019-03-16 23:36:31 +08:00
/// Emit common header of .BTF and .BTF.ext sections.
void emitCommonHeader();
/// Emit the .BTF section.
void emitBTFSection();
/// Emit the .BTF.ext section.
void emitBTFExtSection();
protected:
/// Gather pre-function debug information.
void beginFunctionImpl(const MachineFunction *MF) override;
/// Post process after all instructions in this function are processed.
void endFunctionImpl(const MachineFunction *MF) override;
public:
BTFDebug(AsmPrinter *AP);
[BPF] Support for compile once and run everywhere Introduction ============ This patch added intial support for bpf program compile once and run everywhere (CO-RE). The main motivation is for bpf program which depends on kernel headers which may vary between different kernel versions. The initial discussion can be found at https://lwn.net/Articles/773198/. Currently, bpf program accesses kernel internal data structure through bpf_probe_read() helper. The idea is to capture the kernel data structure to be accessed through bpf_probe_read() and relocate them on different kernel versions. On each host, right before bpf program load, the bpfloader will look at the types of the native linux through vmlinux BTF, calculates proper access offset and patch the instruction. To accommodate this, three intrinsic functions preserve_{array,union,struct}_access_index are introduced which in clang will preserve the base pointer, struct/union/array access_index and struct/union debuginfo type information. Later, bpf IR pass can reconstruct the whole gep access chains without looking at gep itself. This patch did the following: . An IR pass is added to convert preserve_*_access_index to global variable who name encodes the getelementptr access pattern. The global variable has metadata attached to describe the corresponding struct/union debuginfo type. . An SimplifyPatchable MachineInstruction pass is added to remove unnecessary loads. . The BTF output pass is enhanced to generate relocation records located in .BTF.ext section. Typical CO-RE also needs support of global variables which can be assigned to different values to different hosts. For example, kernel version can be used to guard different versions of codes. This patch added the support for patchable externals as well. Example ======= The following is an example. struct pt_regs { long arg1; long arg2; }; struct sk_buff { int i; struct net_device *dev; }; #define _(x) (__builtin_preserve_access_index(x)) static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr) = (void *) 4; extern __attribute__((section(".BPF.patchable_externs"))) unsigned __kernel_version; int bpf_prog(struct pt_regs *ctx) { struct net_device *dev = 0; // ctx->arg* does not need bpf_probe_read if (__kernel_version >= 41608) bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg1)->dev)); else bpf_probe_read(&dev, sizeof(dev), _(&((struct sk_buff *)ctx->arg2)->dev)); return dev != 0; } In the above, we want to translate the third argument of bpf_probe_read() as relocations. -bash-4.4$ clang -target bpf -O2 -g -S trace.c The compiler will generate two new subsections in .BTF.ext, OffsetReloc and ExternReloc. OffsetReloc is to record the structure member offset operations, and ExternalReloc is to record the external globals where only u8, u16, u32 and u64 are supported. BPFOffsetReloc Size struct SecLOffsetReloc for ELF section #1 A number of struct BPFOffsetReloc for ELF section #1 struct SecOffsetReloc for ELF section #2 A number of struct BPFOffsetReloc for ELF section #2 ... BPFExternReloc Size struct SecExternReloc for ELF section #1 A number of struct BPFExternReloc for ELF section #1 struct SecExternReloc for ELF section #2 A number of struct BPFExternReloc for ELF section #2 struct BPFOffsetReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t TypeID; ///< TypeID for the relocation uint32_t OffsetNameOff; ///< The string to traverse types }; struct BPFExternReloc { uint32_t InsnOffset; ///< Byte offset in this section uint32_t ExternNameOff; ///< The string for external variable }; Note that only externs with attribute section ".BPF.patchable_externs" are considered for Extern Reloc which will be patched by bpf loader right before the load. For the above test case, two offset records and one extern record will be generated: OffsetReloc records: .long .Ltmp12 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String .long .Ltmp18 # Insn Offset .long 7 # TypeId .long 242 # Type Decode String ExternReloc record: .long .Ltmp5 # Insn Offset .long 165 # External Variable In string table: .ascii "0:1" # string offset=242 .ascii "__kernel_version" # string offset=165 The default member offset can be calculated as the 2nd member offset (0 representing the 1st member) of struct "sk_buff". The asm code: .Ltmp5: .Ltmp6: r2 = 0 r3 = 41608 .Ltmp7: .Ltmp8: .loc 1 18 9 is_stmt 0 # t.c:18:9 .Ltmp9: if r3 > r2 goto LBB0_2 .Ltmp10: .Ltmp11: .loc 1 0 9 # t.c:0:9 .Ltmp12: r2 = 8 .Ltmp13: .loc 1 19 66 is_stmt 1 # t.c:19:66 .Ltmp14: .Ltmp15: r3 = *(u64 *)(r1 + 0) goto LBB0_3 .Ltmp16: .Ltmp17: LBB0_2: .loc 1 0 66 is_stmt 0 # t.c:0:66 .Ltmp18: r2 = 8 .loc 1 21 66 is_stmt 1 # t.c:21:66 .Ltmp19: r3 = *(u64 *)(r1 + 8) .Ltmp20: .Ltmp21: LBB0_3: .loc 1 0 66 is_stmt 0 # t.c:0:66 r3 += r2 r1 = r10 .Ltmp22: .Ltmp23: .Ltmp24: r1 += -8 r2 = 8 call 4 For instruction .Ltmp12 and .Ltmp18, "r2 = 8", the number 8 is the structure offset based on the current BTF. Loader needs to adjust it if it changes on the host. For instruction .Ltmp5, "r2 = 0", the external variable got a default value 0, loader needs to supply an appropriate value for the particular host. Compiling to generate object code and disassemble: 0000000000000000 bpf_prog: 0: b7 02 00 00 00 00 00 00 r2 = 0 1: 7b 2a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r2 2: b7 02 00 00 00 00 00 00 r2 = 0 3: b7 03 00 00 88 a2 00 00 r3 = 41608 4: 2d 23 03 00 00 00 00 00 if r3 > r2 goto +3 <LBB0_2> 5: b7 02 00 00 08 00 00 00 r2 = 8 6: 79 13 00 00 00 00 00 00 r3 = *(u64 *)(r1 + 0) 7: 05 00 02 00 00 00 00 00 goto +2 <LBB0_3> 0000000000000040 LBB0_2: 8: b7 02 00 00 08 00 00 00 r2 = 8 9: 79 13 08 00 00 00 00 00 r3 = *(u64 *)(r1 + 8) 0000000000000050 LBB0_3: 10: 0f 23 00 00 00 00 00 00 r3 += r2 11: bf a1 00 00 00 00 00 00 r1 = r10 12: 07 01 00 00 f8 ff ff ff r1 += -8 13: b7 02 00 00 08 00 00 00 r2 = 8 14: 85 00 00 00 04 00 00 00 call 4 Instructions #2, #5 and #8 need relocation resoutions from the loader. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D61524 llvm-svn: 365503
2019-07-09 23:28:41 +08:00
///
bool InstLower(const MachineInstr *MI, MCInst &OutMI);
/// Get the special array index type id.
uint32_t getArrayIndexTypeId() {
assert(ArrayIndexTypeId);
return ArrayIndexTypeId;
}
/// Add string to the string table.
size_t addString(StringRef S) { return StringTable.addString(S); }
/// Get the type id for a particular DIType.
uint32_t getTypeId(const DIType *Ty) {
assert(Ty && "Invalid null Type");
assert(DIToIdMap.find(Ty) != DIToIdMap.end() &&
"DIType not added in the BDIToIdMap");
return DIToIdMap[Ty];
}
void setSymbolSize(const MCSymbol *Symbol, uint64_t Size) override {}
/// Process beginning of an instruction.
void beginInstruction(const MachineInstr *MI) override;
/// Complete all the types and emit the BTF sections.
void endModule() override;
};
} // end namespace llvm
#endif