Go to file

Aries 2c6aaa910c Add endprg, vid.v instruction definition		2023-01-17 14:39:57 +08:00
.github	[NFC] Fix exception in version-check.py script	2022-09-15 13:34:29 +02:00
bolt	[DebugInfo] llvm::Optional => std::optional	2022-12-05 00:09:22 +00:00
clang	WORKAROUND: Fix libclc building error	2023-01-16 16:05:08 +08:00
clang-tools-extra	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 06:31:11 +00:00
cmake	[cmake] Add missing CMakePushCheckState include to FindLibEdit.cmake	2022-11-07 18:20:19 +01:00
compiler-rt	Add CSRs and kernel metadata buffer offset constant definition ventus.h	2023-01-06 09:32:07 +08:00
cross-project-tests	[dexter-tests] Add attribute optnone to main function	2022-10-26 20:57:49 +00:00
flang	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 06:31:11 +00:00
libc	[libc][Obvious] Update error bounds for uint_test.QuickMulHiTests.	2022-12-02 18:13:35 -05:00
libclc	Fix prepare-builtins DSO linking error	2023-01-16 10:48:53 +08:00
libcxx	[libc++][NFC] Fix typo in comment	2022-12-02 12:20:27 -08:00
libcxxabi	[libc++abi][LIT][AIX] Use Vector instructions available on Power7 in vec_reg_restore.pass.cpp	2022-11-29 14:08:03 -05:00
libunwind	[CMake] Use LLVM_TARGET_TRIPLE in runtimes	2022-11-29 04:08:24 +00:00
lld	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 06:31:11 +00:00
lldb	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 06:31:11 +00:00
llvm	Add endprg, vid.v instruction definition	2023-01-17 14:39:57 +08:00
llvm-libgcc	[cmake] Slight fix ups to make robust to the full range of GNUInstallDirs	2022-07-26 14:48:49 +00:00
mlir	[mlir] Use std::nullopt instead of None in comments (NFC)	2022-12-04 19:58:32 -08:00
openmp	[OpenMP][libomptarget] Add hasQueue() function in NextGen plugin's AsyncInfoWrapperTy	2022-12-04 13:24:40 +01:00
polly	Remove unused #include "llvm/ADT/Optional.h"	2022-12-05 06:31:11 +00:00
pstl	Revert "[cmake] Use `CMAKE_INSTALL_LIBDIR` too"	2022-08-18 22:46:32 -04:00
runtimes	[runtimes] Fix runtimes-test-depends	2022-11-30 16:55:51 -08:00
third-party	[llvm] [cmake] Set EXCLUDE_FROM_ALL on gtest and TestingSupport	2022-11-24 17:52:22 +01:00
utils	[bazel] Add a missing dependency after `786cbb09ed`	2022-12-02 16:56:05 +01:00
.arcconfig	…
.arclint	…
.clang-format	…
.clang-tidy	Add -misc-const-correctness to .clang-tidy	2022-08-08 13:00:52 -07:00
.git-blame-ignore-revs	Add __config formatting to .git-blame-ignore-revs	2022-06-14 09:52:49 -04:00
.gitignore	gitignore install folder	2022-12-07 09:07:45 +08:00
.mailmap	.mailmap: add entry for myself	2022-12-03 09:52:57 +01:00
CONTRIBUTING.md	docs: update some bug tracker references (NFC)	2022-01-10 15:59:08 -08:00
LICENSE.TXT	[docs] Add LICENSE.txt to the root of the mono-repo	2022-08-24 09:35:00 +02:00
README-LLVM.md	Update README	2022-12-29 09:56:50 +08:00
README.md	Update README	2023-01-17 10:35:11 +08:00
SECURITY.md	…

README.md

This is the Ventus GPGPU port of LLVM Compiler Infrastructure

Ventus GPGPU is based on RISCV RV32IMAZfinxZve32f ISA with fully redefined concept of V-extension.

For more architecture detail, please refer to Ventus GPGPU Arch

Getting Started

Build the toolchain

Assume you have already installed essential build tools such as cmake, clang, ninja etc.

git clone https://github.com/THU-DSP-LAB/llvm-project.git
cd llvm-project
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Debug -DLLVM_CCACHE_BUILD=ON -DLLVM_ENABLE_PROJECTS="clang;lld;libclc" -DLLVM_TARGETS_TO_BUILD="AMDGPU;X86;RISCV" -DLLVM_TARGET_ARCH="riscv32" -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_ENABLE_TERMINFO=ON -DCMAKE_INSTALL_PREFIX=../install -G Ninja ../llvm
ninja
ninja install

Build pocl

# Add ventus-llvm built llvm-config to PATH
export PATH=<path_to_ventus_llvm_install_bin>:$PATH
git clone https://github.com/THU-DSP-LAB/pocl.git
cd pocl
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DENABLE_HOST_CPU_DEVICES=OFF -DENABLE_VENTUS=ON -DENABLE_ICD=ON -DDEFAULT_ENABLE_ICD=ON -DENABLE_TESTS=OFF -DSTATIC_LLVM=OFF -G Ninja ../
ninja

# You can use following command to building pocl with system installed llvm.
cmake -DCMAKE_INSTALL_PREFIX=../install -DENABLE_ICD=ON -DENABLE_TESTS=OFF -DSTATIC_LLVM=OFF -DLLC_HOST_CPU=x86-64 -G Ninja ../

Build pocl as libOpenCL.so(icd loader+icd driver) instead of libpocl.so(icd driver), -DENABLE_ICD=OFF must be specified to cmake.

NOTE: the install folder of ventus-pocl should be merged with the install folder of ventus-llvm in order to correctly locate shared libraries, header files etc.

NOTE: -DPOCL_DEBUG_MESSAGES=ON is default on, you can set env variable POCL_DEBUG to enable debugging output(see pocl_debug.c for details).

Build libclc(llvm version instead of pocl one) (Work in progress)

# Add ventus-llvm built llvm-config to PATH
export PATH=<path_to_ventus_llvm_install_bin>:$PATH
cd llvm-project
mkdir build-libclc && cd build-libclc
cmake ../libclc -DCMAKE_LLAsm_COMPILER_WORKS=ON -DCMAKE_CLC_COMPILER_WORKS=ON -DCMAKE_CLC_COMPILER_FORCED=ON  -DCMAKE_LLAsm_FLAGS="-target riscv32 -mcpu=ventus-gpgpu" -DLIBCLC_TARGETS_TO_BUILD="riscv32--" -DCMAKE_CXX_FLAGS="-I <llvm-project-root>/llvm/include -std=c++17" -G Ninja
ninja
# Manually copy kernel builtins to the location where pocl driver can locate
cp riscv32--.bc <pocl_install_dir>/share/pocl/kernel-riscv32.bc

Build icd loader

You can use apt install ocl-icd-libopencl1 to install ocl icd loader libOpenCL.so, but we want to build our own icd loader with debug information, so here it is.

git clone https://github.com/OCL-dev/ocl-icd.git
cd ocl-icd
./bootstrap
./configure
make
ls .libs # libOpenCL.so is located in .libs

Then copy libOpenCL.so* from ocl-icd/.libs to ventus-llvm/install/lib folder(where LLVM shared libraries located)

Run export LD_LIBRARY_PATH=<path_to>/ventus-llvm/install/lib to tell OpenCL application to use your own built libOpenCL.so, also to correctly locate LLVM shared libraries.

Bridge icd loader `libOpenCL.so` and pocl ocl device driver `libpocl.so`(pocl built with ENABLE_ICD=ON)

Run export OCL_ICD_VENDORS=<path_to>/libpocl.so to tell ocl icd loader where the icd driver is.

Finally, run export POCL_DEVICES="ventus" to tell pocl driver which device is available(should we set ventus as default device?).

You will see Ventus GPGPU device is found if your setup is correct.

$ <pocl-install-dir>/bin/poclcc -l

LIST OF DEVICES:
0:
  Vendor:   THU
    Name:   Ventus GPGPU device
 Version:   2.2 HSTR: THU-ventus-gpgpu

Also, you can try to set POCL_DEBUG=all and run example under <pocl-build-dir> to see the full OpenCL software stack execution pipeline. For example(Work in progress).

aries@legion:~/workspace/ventus-pocl/build-ventus$ ./examples/vecadd/vecadd
** Final POCL_DEBUG flags: FFFFFFFFFFFFFFFF
[2023-01-16 07:59:38.108659834]POCL: in fn POclCreateCommandQueue at line 99:
  |   GENERAL |  Created Command Queue 3 (0x56134aba4a10) on device 0
[2023-01-16 07:59:38.109002683]POCL: in fn void pocl_llvm_create_context(cl_context) at line 431:
  |      LLVM |  Created context 2 (0x56134aba48e0)
[2023-01-16 07:59:38.139981683132353]POCL: in fn POclCreateContext at line 228:
  |   GENERAL |  Created Context 2 (0x56134aba48e0)
[2023-01-16 07:59:38.109029362]POCL: in fn POclRetainContext at line 33:
  | REFCOUNTS |  Retain Context 2 (0x56134aba48e0), Refcount: 2
[2023-01-16 07:59:38.30173808955]POCL: in fn POclCreateCommandQueue at line 99:
  |   GENERAL |  Created Command Queue 4 (0x56134aba7230) on device 0
[2023-01-16 07:59:38.109127629]POCL: in fn POclRetainContext at line 33:
  | REFCOUNTS |  Retain Context 2 (0x56134aba48e0), Refcount: 3
[2023-01-16 07:59:38.109146625]POCL: in fn compile_and_link_program at line 720:
  |      LLVM |  building program with options 
[2023-01-16 07:59:38.94639713517827]POCL: in fn compile_and_link_program at line 757:
  |      LLVM |  building program for 1 devs with options 
[2023-01-16 07:59:38.109153330]POCL: in fn compile_and_link_program at line 761:
  |      LLVM |     BUILDING for device: ventus
[2023-01-16 07:59:38.206267587868]POCL: in fn pocl_driver_build_source at line 712:
  |      LLVM |  building from sources for device 0
[2023-01-16 07:59:38.139981683681094]POCL: in fn int pocl_llvm_build_program(cl_program, unsigned int, cl_uint, const cl_program *, const char **, int) at line 406:
  |      LLVM |  all build options: -DPOCL_DEVICE_ADDRESS_BITS=32 -D__USE_CLANG_OPENCL_C_H -xcl -Dinline= -I. -cl-kernel-arg-info  -D__ENDIAN_LITTLE__=1 -DCL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE=0 -D__OPENCL_VERSION__=200 -cl-std=CL2.0 -D__OPENCL_C_VERSION__=200 -fno-builtin -triple=riscv32 -target-cpu ventus-gpgpu 
[2023-01-16 07:59:38.337265138]POCL: in fn POclRetainContext at line 33:
  | REFCOUNTS |  Retain Context 2 (0x56134aba48e0), Refcount: 4
[2023-01-16 07:59:38.30402060096]POCL: in fn POclCreateBuffer at line 262:
  |    MEMORY |  Created Buffer 6 (0x56134aba8fb0), MEM_HOST_PTR: 0x56134abab710, device_ptrs[0]: 0x56134aee15e0, SIZE 512, FLAGS 36 
[2023-01-16 07:59:38.206495725866]POCL: in fn POclRetainContext at line 33:
  | REFCOUNTS |  Retain Context 2 (0x56134aba48e0), Refcount: 5
[2023-01-16 07:59:38.30402070991]POCL: in fn POclCreateBuffer at line 262:
  |    MEMORY |  Created Buffer 7 (0x56134ac25be0), MEM_HOST_PTR: 0x56134aedc5c0, device_ptrs[0]: 0x56134aee5920, SIZE 512, FLAGS 36 
[2023-01-16 07:59:38.206495738647]POCL: in fn POclRetainContext at line 33:
  | REFCOUNTS |  Retain Context 2 (0x56134aba48e0), Refcount: 6
[2023-01-16 07:59:38.30402084819]POCL: in fn POclCreateBuffer at line 262:
  |    MEMORY |  Created Buffer 8 (0x56134ababd20), MEM_HOST_PTR: (nil), device_ptrs[0]: 0x56134ac99af0, SIZE 512, FLAGS 1 
[2023-01-16 07:59:38.337321919]POCL: in fn POclCreateKernel at line 139:
  |   GENERAL |  Created Kernel vecadd (0x56134abafe80)
[2023-01-16 07:59:38.94639941695914]POCL: in fn POclSetKernelArg at line 113:
  |   GENERAL |  Kernel          vecadd || SetArg idx   0 ||   float* || Local 0 || Size      8 || Value 0x7ffea8ab8580 || Pointer 0x56134aba8fb0 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  B08FBA4A 13560000
[2023-01-16 07:59:38.337380166]POCL: in fn POclSetKernelArg at line 113:
  |   GENERAL |  Kernel          vecadd || SetArg idx   1 ||   float* || Local 0 || Size      8 || Value 0x7ffea8ab8588 || Pointer 0x56134ac25be0 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  E05BC24A 13560000
[2023-01-16 07:59:38.337387290]POCL: in fn POclSetKernelArg at line 113:
  |   GENERAL |  Kernel          vecadd || SetArg idx   2 ||   float* || Local 0 || Size      8 || Value 0x7ffea8ab8590 || Pointer 0x56134ababd20 || *(uint32*)Value:        0 || *(uint64*)Value:        0 ||
Hex Value:  20BDBA4A 13560000
[2023-01-16 07:59:38.337396648]POCL: in fn pocl_kernel_calc_wg_size at line 173:
  |   GENERAL |  Preparing kernel vecadd with local size 128 x 1 x 1 group sizes 1 x 1 x 1...
[2023-01-16 07:59:38.337407683]POCL: in fn POclRetainCommandQueue at line 35:
  | REFCOUNTS |  Retain Command Queue 4 (0x56134aba7230), Refcount: 2
[2023-01-16 07:59:38.30402184622]POCL: in fn pocl_create_event at line 528:
  |    EVENTS |  Created event 1 (0x56134ab4bec0) Command ndrange_kernel
[2023-01-16 07:59:38.337438693]POCL: in fn pocl_create_command_struct at line 672:
  |    EVENTS |  Created immediate command struct: CMD 0x56134abaf990 (event 1 / 0x56134ab4bec0, type: ndrange_kernel)
[2023-01-16 07:59:38.140729235867670]POCL: in fn POclRetainCommandQueue at line 35:
  | REFCOUNTS |  Retain Command Queue 3 (0x56134aba4a10), Refcount: 2
[2023-01-16 07:59:38.30402222755]POCL: in fn pocl_create_event at line 528:
  |    EVENTS |  Created event 2 (0x56134aba6270) Command migrate_mem_objects
[2023-01-16 07:59:38.337454896]POCL: in fn pocl_create_command_struct at line 649:
  |    EVENTS |  event pointer provided
[2023-01-16 07:59:38.337460553]POCL: in fn pocl_create_command_struct at line 672:
  |    EVENTS |  Created immediate command struct: CMD 0x56134abad5f0 (event 2 / 0x56134aba6270, type: migrate_mem_objects)
[2023-01-16 07:59:38.337465721]POCL: in fn pocl_command_enqueue at line 1237:
  |    EVENTS |  Pushed Event 2 to CQ 3.
[2023-01-16 07:59:38.140729235893092]POCL: in fn pocl_update_event_queued at line 2084:
  |    EVENTS |  Event queued: 2
[2023-01-16 07:59:38.140729235922914]POCL: in fn pocl_update_event_submitted at line 2104:
  |    EVENTS |  Event submitted: 2
[2023-01-16 07:59:38.2676586394723155918]POCL: in fn pocl_update_event_running_unlocked at line 2123:
  |    EVENTS |  Event running: 2
[2023-01-16 07:59:38.140729235937301]POCL: in fn pocl_update_event_finished at line 2217:
  |    EVENTS |  ventus: Command complete, event 2
[2023-01-16 07:59:38.140729235941632]POCL: in fn pocl_exec_command at line 498:
  |    TIMING |       >>>         6.286  us    Event Migrate Buffer(s)     
[2023-01-16 07:59:38.69057002800]POCL: in fn POclReleaseMemObject at line 50:
  | REFCOUNTS |  Release Memory Object 6 (0x56134aba8fb0), Refcount: 2
[2023-01-16 07:59:38.215085903784]POCL: in fn POclReleaseEvent at line 40:
  | REFCOUNTS |  Release Event 2 (0x56134aba6270), Refcount: 1
[2023-01-16 07:59:38.140729235963562]POCL: in fn pocl_create_event_sync at line 544:
  |    EVENTS |  create event sync: waiting 1 , notifier 2
[2023-01-16 07:59:38.206495976875]POCL: in fn POclReleaseEvent at line 40:
  | REFCOUNTS |  Release Event 1 (0x56134ab4bec0), Refcount: 3
[2023-01-16 07:59:38.140729235973060]POCL: in fn POclRetainKernel at line 34:
  | REFCOUNTS |  Retain Kernel vecadd (0x56134abafe80), Refcount: 2
[2023-01-16 07:59:38.337557073]POCL: in fn pocl_command_enqueue at line 1191:
  |    EVENTS |  In-order Q; adding event syncs
[2023-01-16 07:59:38.337560356]POCL: in fn pocl_command_enqueue at line 1237:
  |    EVENTS |  Pushed Event 1 to CQ 4.
[2023-01-16 07:59:38.140729235985422]POCL: in fn pocl_update_event_queued at line 2084:
  |    EVENTS |  Event queued: 1
Segmentation fault (core dumped)

Compile a OpenCL C program into Ventus GPGPU assembly

vector_add.cl:

__kernel void vectorAdd(__global float* A, __global float* B) {
  unsigned tid = get_global_id(0);
  A[tid] += B[tid];
}

Compiler OpenCL C into Ventus assembly:

clang -cl-std=CL2.0 -target riscv32 -mcpu=ventus-gpgpu -O1 -S vector_add.cl -o vector_add.s

NOTE: OpenCL host side program should be linked with icd loader -lOpenCL.

TODOs

Emit barrier instruction for all stores to local/global memory except sGPR spill.
Stacks for sGPR spilling and per-thread usage is supported by using RISCV::X2 as warp level stack, RISCV::X4 as per-thread level stack. But the 2 stack size calculation are not yet splitted out, so a lot of stack slots are wasted.
VentusRegextInsertion pass may generate incorrect register ordering for next instruction, see FIXME in that pass. To avoid breaking def-use chain, we could keep the extended instruction unmodified by removing Op.setRegIgnoreDUChain() from the pass, the elf generation pass should ignore the higher bit(>2^5) of the register encoding automatically.
Pattern match VV and VX optimization. There is only type information in the DAG pattern matching, we can't specify whether to match a DAG to a vop.vv or vop.vx MIR in a tblgen pattern, so a fix pass should be ran after codegen pass.