llvm-project

Commit Graph

Author	SHA1	Message	Date
Yaxun (Sam) Liu	cb08558caa	[HIP] Fix regressions due to fp contract change Recently the HIP toolchain made a change to use clang instead of opt/llc to do compilation (https://reviews.llvm.org/D81861). The intention is to make the HIP toolchain canonical like other toolchains. However, this change introduced an unintentional change regarding the backend fp fuse option, which caused regressions in some HIP applications. Before the change, the HIP toolchain used clang to generate bitcode, then used opt/llc to optimize the bitcode and generate ISA. As such, the amdgpu backend took the default fp fuse mode, which is 'Standard'. This mode respects the contract flag of fmul/fadd instructions and does not fuse fmul/fadd instructions without the contract flag. After the change, the HIP toolchain uses clang to generate IR, optimize it, and generate ISA in one process. Now the amdgpu backend fp fuse option is determined by the -ffp-contract option, which is 'fast' by default, and this -ffp-contract=fast language option is translated to the 'Fast' fp fuse option in the backend. As a result, the backend starts to fuse fmul/fadd instructions without the contract flag. This causes wrong results for some device library functions, e.g. tan(-1e20), which should return 0.8446, now returns -0.933. What is worse, since the backend with the 'Fast' fp fuse option does not respect the contract flag, there is no way to use the #pragma clang fp contract directive to enforce fp contract requirements. This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract and using it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in the frontend but lets the backend use the 'Standard' fp fuse option. 'fast-honor-pragmas' is useful because the 'Fast' fp fuse option in the backend does not honor the contract flag; 'Fast' is of little use to HIP applications since all code with #pragma STDC FP_CONTRACT or any IR from a source compiled with -ffp-contract=on is broken.
Differential Revision: https://reviews.llvm.org/D90174	2020-11-24 08:10:06 -05:00
Yaxun (Sam) Liu	3f4b5893ef	[AMDGPU] Add option -munsafe-fp-atomics Add an option -munsafe-fp-atomics for AMDGPU target. When enabled, clang adds function attribute "amdgpu-unsafe-fp-atomics" to any functions for amdgpu target. This allows amdgpu backend to use unsafe fp atomic instructions in these functions. Differential Revision: https://reviews.llvm.org/D91546	2020-11-16 21:52:12 -05:00
CJ Johnson	69cd776e1e	[CodeGen] Apply 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments. * Adds 'nonnull' and 'dereferenceable(N)' to 'this' pointer arguments * Gates 'nonnull' on -f(no-)delete-null-pointer-checks * Introduces this-nonnull.cpp and microsoft-abi-this-nullable.cpp tests to explicitly test the behavior of this change * Refactors hundreds of over-constrained clang tests to permit these attributes, where needed * Updates Clang12 patch notes mentioning this change Reviewed-by: rsmith, jdoerfert Differential Revision: https://reviews.llvm.org/D17993	2020-11-16 17:39:17 -08:00
Michael Liao	f375885ab8	[InferAddrSpace] Teach to handle assumed address space. - In certain cases, a generic pointer could be assumed as a pointer to the global memory space or other spaces. With a dedicated target hook to query that address space from a given value, infer-address-space pass could infer and propagate that to all its users. Differential Revision: https://reviews.llvm.org/D91121	2020-11-16 17:06:33 -05:00
Michael Liao	8920ef06a1	[hip] Remove the coercion on aggregate kernel arguments. - If an aggregate argument is indirectly accessed within kernels, direct passing results in unpromotable `alloca`, which degrade performance significantly. InferAddrSpace pass is enhanced in [D91121](https://reviews.llvm.org/D91121) to take the assumption that generic pointers loaded from the constant memory could be regarded global ones. The need for the coercion on aggregate arguments is mitigated. Differential Revision: https://reviews.llvm.org/D89980	2020-11-12 21:19:30 -05:00
Simon Pilgrim	fc80931b87	[CodeGenCUDA] Fix check prefix typo on device-stub.cu tests Noticed while fixing unused prefix warnings	2020-11-11 15:44:57 +00:00
Michael Liao	23c6d1501d	[amdgpu] Add `llvm.amdgcn.endpgm` support. - `llvm.amdgcn.endpgm` is added to enable "abort" support. Differential Revision: https://reviews.llvm.org/D90809	2020-11-05 19:06:50 -05:00
Artem Belevich	be86b6773b	[CUDA] Allow local static variables with target attributes. While CUDA documentation claims that such variables are not allowed[1], NVCC has been accepting them since CUDA-10.0[2] and some headers in CUDA-11 rely on this working. 1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#static-variables-function 2. https://godbolt.org/z/zsodzc Differential Revision: https://reviews.llvm.org/D88345	2020-11-03 10:30:38 -08:00
Yaxun (Sam) Liu	abd8cd9199	[CUDA][HIP] Fix linkage for -fgpu-rdc Currently, for explicit template function instantiation in CUDA/HIP device compilation, clang emits the instantiated kernel with external linkage and the instantiated device function with internal linkage. This is fine for -fno-gpu-rdc since there is only one TU. However, for -fgpu-rdc this causes duplicate symbols for kernels if the same instantiation happens in multiple TUs, or missing symbols if a device function calls an explicitly instantiated template function in a different TU. To make explicit template function instantiation work for -fgpu-rdc we need to follow the C++ linkage paradigm, i.e. use weak_odr linkage. Differential Revision: https://reviews.llvm.org/D90311	2020-11-03 08:07:19 -05:00
Artem Belevich	0a3ebb4d8d	Revert "[CUDA] Allow local static variables with target attributes." This reverts commit `f38a9e5117`, which triggered assertions.	2020-11-02 15:09:07 -08:00
Artem Belevich	f38a9e5117	[CUDA] Allow local static variables with target attributes. While CUDA documentation claims that such variables are not allowed[1], NVCC has been accepting them since CUDA-10.0[2] and some headers in CUDA-11 rely on this working. 1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#static-variables-function 2. https://godbolt.org/z/zsodzc Differential Revision: https://reviews.llvm.org/D88345	2020-11-02 14:37:13 -08:00
Yaxun (Sam) Liu	e384e94fbe	Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024" This reverts commit `187658b8a6` due to AMDGPU backend issues.	2020-10-15 17:25:55 -04:00
Fangrui Song	a2cc883368	[CUDA] Don't call __cudaRegisterVariable on C++17 inline variables D17779: host-side shadow variables of external declarations of device-side global variables have internal linkage and are referenced by `__cuda_register_globals`. nvcc from CUDA 11 does not allow `__device__ inline` or `__device__ constexpr` (C++17 inline variables) but clang has incorrectly supported them for a while: ``` error: A __device__ variable cannot be marked constexpr error: An inline __device__/__constant__/__managed__ variable must have internal linkage when the program is compiled in whole program mode (-rdc=false) ``` If such a variable (which has a comdat group) is discarded (a copy from another translation unit is prevailing and selected), accessing the variable from outside the section group (`__cuda_register_globals`) is a violation of the ELF specification and will be rejected by linkers: > A symbol table entry with STB_LOCAL binding that is defined relative to one of a group's sections, and that is contained in a symbol table section that is not part of the group, must be discarded if the group members are discarded. References to this symbol table entry from outside the group are not allowed. As a workaround, don't register such inline variables for now. (If we register the variables in all TUs, we will keep multiple instances of the shadow and break the C++ semantics for inline variables). We should reject such variables in Sema but our internal users need some time to migrate. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D88786	2020-10-05 12:53:59 -07:00
Yaxun (Sam) Liu	dc6a0b0ec7	[HIP] Align device binary To facilitate faster loading of device binaries and share them among processes, HIP runtime favors their alignment being 4096 bytes. HIP runtime can load unaligned device binaries, however, aligning them at 4096 bytes results in faster loading and less shared memory usage. This patch adds an option -bundle-align to clang-offload-bundler which allows bundles to be aligned at specified alignment. By default it is 1, which is NFC compared to existing format. This patch then aligns embedded fat binary and device binary inside fat binary at 4096 bytes. It has been verified this change does not cause significant overall file size increase for typical HIP applications (less than 1%). Differential Revision: https://reviews.llvm.org/D88734	2020-10-02 18:10:44 -04:00
Yaxun (Sam) Liu	187658b8a6	Recommit "[HIP] Change default --gpu-max-threads-per-block value to 1024" Recommit `04abbb3a78`	2020-09-28 22:43:17 -04:00
Stanislav Mekhanoshin	59691dc874	[AMDGPU] Make ds fp atomics overloadable Differential Revision: https://reviews.llvm.org/D87947	2020-09-23 11:39:50 -07:00
Yaxun (Sam) Liu	301e23305d	[CUDA][HIP] Fix static device var used by host code only A static device variable may be accessed in host code through cudaMemcpyFromSymbol etc. Currently clang does not emit the static device variable if it is only referenced by host code, which causes host code to fail at run time. This patch fixes that. Differential Revision: https://reviews.llvm.org/D88115	2020-09-23 08:18:19 -04:00
Michael Liao	4d4f092283	[clang][codegen] Skip adding default function attributes on intrinsics. - After loading builtin bitcode for linking, skip adding default function attributes on LLVM intrinsics as their attributes are well-defined and retrieved directly from internal definitions. Adding extra attributes on intrinsics results in inconsistent result when `-save-temps` is present. Also, that makes few optimizations conservative. Differential Revision: https://reviews.llvm.org/D87761	2020-09-16 14:10:05 -04:00
Yaxun (Sam) Liu	62dbb7e54c	Revert "[HIP] Change default --gpu-max-threads-per-block value to 1024" Temporarily revert commit `04abbb3a78` due to regressions in some HIP apps caused by backend issues revealed by this change. Will re-commit it when the backend issues are fixed.	2020-09-02 16:12:28 -04:00
Yaxun (Sam) Liu	fb04d7b4a6	[CUDA][HIP] Do not externalize implicit constant static variable Differential Revision: https://reviews.llvm.org/D85686	2020-08-10 19:02:49 -04:00
Michael Liao	c7b683c126	[PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions. - Skip generating profile data on `__global__` function in the host compilation. It's a host-side stub function only and don't have profile instrumentation generated on the real function body. The extra profile data results in the malformed instrumentation profile data. - Skip generating region mapping on functions in the wrong-side, i.e., + For the device compilation, skip host-only functions; and, + For the host compilation, skip device-only functions (including `__global__` functions.) - As the device-side profiling is not ready yet, only host-side profile code generation is checked. Differential Revision: https://reviews.llvm.org/D85276	2020-08-10 11:01:46 -04:00
Matt Arsenault	30eeb742f1	clang: Use byref for aggregate kernel arguments Add address space to indirect abi info and use it for kernels. Previously, indirect arguments assumed a stack passed object in the alloca address space using byval. A stack pointer is unsuitable for kernel arguments, which are passed in a separate, constant buffer with a different address space. Start using the new byref for aggregate kernel arguments. Previously these were emitted as raw struct arguments, and turned into loads in the backend. These will lower identically, although with byref you now have the option of applying an explicit alignment. In the future, a reasonable implementation would use byref for all kernel arguments (this would be a practical problem at the moment due to losing things like noalias on pointer arguments). This is mostly to avoid fighting the optimizer's treatment of aggregate load/store. SROA and instcombine both turn aggregate loads and stores into a long sequence of element loads and stores, rather than the optimizable memcpy I would expect in this situation. Now an explicit memcpy will be introduced up-front which is better understood and helps eliminate the alloca in more situations. This skips using byref in the case where HIP kernel pointer arguments in structs are promoted to global pointers. At minimum an additional patch is needed to allow coercion with indirect arguments. This also skips using it for OpenCL due to the current workaround used to support kernels calling kernels. Distinct function bodies would need to be generated up front instead of emitting an illegal call.	2020-08-06 15:52:26 -04:00
Yaxun (Sam) Liu	45f2a56856	[CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc nvcc supports accessing file-scope static device variables in host code by host APIs like cudaMemcpyToSymbol etc. CUDA/HIP let users access device variables in host code by shadow variables. In host compilation, clang emits a shadow variable for each device variable, and calls __RegisterVariable to register it in the init function. The address of the shadow variable and the device side mangled name of the device variable are passed to __RegisterVariable. The runtime looks up the symbol by name in the device binary to find the address of the device variable. The problem with static device variables is that they have internal linkage, therefore their name may be changed by the linker if there are multiple symbols with the same name. Also they end up as local symbols in the elf file, whereas the runtime only looks up the global symbols. Another reason for giving the static device variables external linkage is that they may be initialized externally by host code and their final value may be accessed by host code after kernel execution, therefore they actually have external linkage. Giving them internal linkage will cause incorrect optimizations on them. To support accessing static device vars in host code for -fno-gpu-rdc mode, change the internal linkage to external linkage. The name does not need to change since there is only one TU for -fno-gpu-rdc mode. Also the externalization is done only if the device static var is referenced by host code. Differential Revision: https://reviews.llvm.org/D80858	2020-08-05 07:57:38 -04:00
Yaxun (Sam) Liu	1eaad01046	[CUDA][HIP] Let lambda be host device by default This patch makes lambdas host device by default and adds diagnostics for capturing a host variable by reference in a device lambda. Differential Revision: https://reviews.llvm.org/D78655	2020-07-08 13:10:26 -04:00
Michael Liao	471c806a45	[hip] Refine `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu` - Require target x86 being enabled as well.	2020-06-25 23:57:08 -04:00
Michael Liao	0723b1891f	[hip] Re-enable `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu` - Require amdgpu target being enabled.	2020-06-25 22:29:27 -04:00
Michael Liao	d3f437d351	[hip] Disable test temporarily due to failures on build servers.	2020-06-25 22:04:20 -04:00
Michael Liao	dccfaacf93	[InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`. Summary: - `ptrtoint` and `inttoptr` are defined as no-op casts if the integer value as the same size as the pointer value. The pair of `ptrtoint`/`inttoptr` is in fact a no-op cast sequence between different address spaces. Teach `infer-address-spaces` to handle them like a `bitcast`. Reviewers: arsenm, chandlerc Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D81938	2020-06-25 20:46:56 -04:00
Yaxun (Sam) Liu	049d860707	[CUDA][HIP] Fix constexpr variables for C++17 constexpr variables are compile time constants and implicitly const, therefore they are safe to emit on both device and host side. Besides, in many cases they are intended for both device and host, therefore it makes sense to emit them on both device and host sides if necessary. In most cases constexpr variables are used as rvalue and the variables themselves do not need to be emitted. However if their address is taken, then they need to be emitted. For C++14, clang is able to handle that since clang emits them with available_externally linkage together with the initializer. However for C++17, the constexpr static data member of a class or template class become inline variables implicitly. Therefore they become definitions with linkonce_odr or weak_odr linkages. As such, they can not have available_externally linkage. This patch fixes that by adding implicit constant attribute to file scope constexpr variables and constexpr static data members in device compilation. Differential Revision: https://reviews.llvm.org/D79237	2020-06-03 21:56:52 -04:00
Yaxun (Sam) Liu	04abbb3a78	[HIP] Change default --gpu-max-threads-per-block value to 1024 Differential Revision: https://reviews.llvm.org/D76795	2020-06-03 11:09:22 -04:00
John McCall	8a8d703be0	Fix how cc1 command line options are mapped into FP options. Canonicalize on storing FP options in LangOptions instead of redundantly in CodeGenOptions. Incorporate -ffast-math directly into the values of those LangOptions rather than considering it separately when building FPOptions. Build IR attributes from those options rather than a mix of sources. We should really simplify the driver/cc1 interaction here and have the driver pass down options that cc1 directly honors. That can happen in a follow-up, though. Patch by Michele Scandale! https://reviews.llvm.org/D80315	2020-06-01 22:00:30 -04:00
Yaxun (Sam) Liu	361e4f14e3	Fix debug info for NoDebug attr NoDebug attr does not totally eliminate debug info about a function when inlining is enabled. This is inconsistent with when inlining is disabled. This patch fixes that. Differential Revision: https://reviews.llvm.org/D79967	2020-05-21 09:02:56 -04:00
Eli Friedman	62f3ef2b53	[CGCall] Annotate references with "align" attribute. If we're going to assume references are dereferenceable, we should also assume they're aligned: otherwise, we can't actually dereference them. See also D80072. Differential Revision: https://reviews.llvm.org/D80166	2020-05-19 20:21:30 -07:00
Yaxun (Sam) Liu	1b7bf1bd75	[HIP] Do not emit debug info for stub function The stub function is generated by compiler and its instructions have nothing to do with the kernel source code. Currently clang generates debug info for the stub function, which causes confusion for the HIP debugger. For example, when users set break point on a line of a kernel, the debugger should break on that line when the kernel is executed and reaches that line, but instead the debugger breaks in the stub function. This patch disables debug info for stub function for HIP. Differential Revision: https://reviews.llvm.org/D79866	2020-05-13 17:55:40 -04:00
Michael Liao	9142c0b46b	[clang][codegen] Hoist parameter attribute setting in function prolog. Summary: - If the coerced type is still a pointer, it should be set with proper parameter attributes, such as `noalias`, `nonnull`, and etc. Hoist that (pointer) parameter attribute setting so that the coerced pointer parameter could be marked properly. Depends on D79394 Reviewers: rjmccall, kerbowa, yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D79395	2020-05-05 15:31:51 -04:00
Michael Liao	612720db87	[hip] Remove test using `hip_pinned_shadow` attribute. NFC.	2020-04-27 16:44:59 -04:00
Yaxun (Sam) Liu	2c31aa2de1	Speed up deferred diagnostic emitter Move function emitDeferredDiags from Sema to DeferredDiagsEmitter since it is only used by DeferredDiagsEmitter. Also skip visited functions to avoid exponential compile time. Differential Revision: https://reviews.llvm.org/D77028	2020-04-06 13:07:43 -04:00
Michael Liao	b952d799ca	[cuda][hip] Fix `RegisterVar` function prototype. Summary: - `RegisterVar` has `void` return type and `size_t` in its variable size parameter in HIP or CUDA 9.0+. Reviewers: tra, yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77398	2020-04-03 12:57:09 -04:00
Matt Arsenault	ce2258c1cd	clang/AMDGPU: Stop setting old denormal subtarget features	2020-04-02 17:17:12 -04:00
Yaxun (Sam) Liu	369e26ca9e	[AMDGPU] Add __builtin_amdgcn_workgroup_size_x/y/z The main purpose of introducing these builtins is to add a range metadata [1, 1025) on the work group size loaded from dispatch ptr, which cannot be done by source code. Differential Revision: https://reviews.llvm.org/D76772	2020-03-28 01:03:20 -04:00
Michael Liao	5be9b8cbe2	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Re-commit after fix Sema checks on partial template specialization. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 17:18:49 -04:00
Artem Belevich	fe8063e1a0	Revert "[cuda][hip] Add CUDA builtin surface/texture reference support." This reverts commit `6a9ad5f3f4`. The patch breaks CUDA compilation. Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 10:01:38 -07:00
Michael Liao	6a9ad5f3f4	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Even though the bindless surface/texture interfaces are promoted, there is still code using surface/texture references. For example, [PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the compilation issue for code using `tex2D` with texture references. For better compatibility, this patch proposes the support of surface/texture references. - Due to the absent documentation and magic headers, it's believed that `nvcc` does use builtins for texture support. From the limited NVVM documentation[^nvvm] and NVPTX backend texture/surface related tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with `device_builtin_surface_type`/`device_builtin_texture_type`, with the corresponding handle-like object types, `cudaSurfaceObject_t` or `cudaTextureObject_t`, in the device-side compilation. On the host side, those global handle variables are registered and will be established and updated later when the corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are most like device global variables but represented in different types on the host and device sides. - In this patch, the following changes are proposed to support that behavior: + Refine `device_builtin_surface_type` and `device_builtin_texture_type` attributes to be applied on `Type` decl only to check whether a variable is of the surface/texture reference type. + Add hooks in code generation to replace those reference types with the corresponding object types as well as all accesses to them. In particular, `nvvm.texsurf.handle.internal` should be used to load object handles from global reference variables[^texsurf] as well as metadata annotations. + Generate host-side registration with proper template argument parsing.
--- [^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf [^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll [^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf). [^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-26 14:44:52 -04:00
Yaxun (Sam) Liu	60963fa630	[HIP] Let clang recognize .hip extension Differential Revision: https://reviews.llvm.org/D76039	2020-03-17 11:22:55 -04:00
Yaxun (Sam) Liu	0ffb12ca67	[HIP] Mark kernels with uniform-work-group-size=true Differential Revision: https://reviews.llvm.org/D76076	2020-03-13 06:56:56 -04:00
Yaxun (Sam) Liu	22c457a869	[HIP] Fix device stub name HIP emits a device stub function for each kernel in host code. The HIP debugger requires the device stub function to have a different unmangled name from the kernel. Currently the name of the device stub function is the mangled name with a postfix .stub. However, this does not work with the HIP debugger since the unmangled name is the same as the kernel. This patch adds the prefix __device__stub__ to the unmangled name of the device stub before mangling, therefore the device stub function has a valid mangled name which is different from the device kernel name. The device side kernel name is kept unchanged. Kernels with extern "C" also get the prefix added to the corresponding device stub function. Differential Revision: https://reviews.llvm.org/D68578	2020-03-09 16:40:05 -04:00
Matt Arsenault	a4e71f01c0	Assume ieee behavior without denormal-fp-math attribute	2020-03-07 12:10:56 -05:00
Matt Arsenault	00b2a9df45	Reapply "clang: Treat ieee mode as the default for denormal-fp-math" This reverts commit `737394c490`. The fp-model test was failing on platforms that enable denormal flushing based on -ffast-math. This needs to reset to IEEE, not the default in these cases. Change-Id: Ibbad32f66d0d0b89b9c1173a3a96fb1a570ddd89	2020-03-06 11:46:55 -08:00
Jeremy Morse	737394c490	Revert "clang: Treat ieee mode as the default for denormal-fp-math" This reverts commit `c64ca93053`. This patch tripped a few build bots: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/24703/ http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/13465/ http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15994/ Reverting to clear the bots.	2020-03-05 10:55:24 +00:00
Matt Arsenault	c64ca93053	clang: Treat ieee mode as the default for denormal-fp-math The IR hasn't switched the default yet, so explicitly add the ieee attributes. I'm still not really sure how the target default denormal mode should interact with -fno-unsafe-math-optimizations. The target may have selected the default mode to be non-IEEE based on the flags or based on its true behavior, but we don't know which is the case. Since the only users of a non-IEEE mode without a flag still support IEEE mode, just reset to IEEE.	2020-03-04 23:34:02 -05:00
hsmahesha	cac068600e	[HIP] Make sure, unused hip-pinned-shadow global var is kept within device code Summary: hip-pinned-shadow global var should remain in the final code object irrespective of whether it is used or not within the code. Add it to used list, so that it will not get eliminated when it is unused. Reviewers: yaxunl, tra, hliao Reviewed By: yaxunl Subscribers: hliao, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D75402	2020-03-04 10:54:26 +05:30
Yaxun (Sam) Liu	a57d9652a0	Make __builtin_amdgcn_dispatch_ptr dereferenceable and align at 4 Differential Revision: https://reviews.llvm.org/D75028	2020-02-25 13:58:20 -05:00
Yaxun (Sam) Liu	fb44b9db95	[OpenCL][CUDA][HIP][SYCL] Add norecurse norecurse function attr indicates the function is not called recursively directly or indirectly. Add norecurse to OpenCL functions, SYCL functions in device compilation and CUDA/HIP kernels. Although there is LLVM pass adding norecurse to functions, it only works for whole-program compilation. Also FE adding norecurse can make that pass run faster since functions with norecurse do not need to be checked again. Differential Revision: https://reviews.llvm.org/D73651	2020-02-16 20:41:00 -05:00
Matt Arsenault	a3c814d234	Separately track input and output denormal mode AMDGPU and x86 at least both have separate controls for whether denormal results are flushed on output, and for whether denormals are implicitly treated as 0 as an input. The current DAGCombiner use only really cares about the input treatment of denormals.	2020-02-04 12:59:21 -05:00
Matt Arsenault	a4451d88ee	Consolidate internal denormal flushing controls Currently there are 4 different mechanisms for controlling denormal flushing behavior, and about as many equivalent frontend controls. - AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features - NVPTX uses the nvptx-f32ftz attribute - ARM directly uses the denormal-fp-math attribute - Other targets indirectly use denormal-fp-math in one DAGCombine - cl-denorms-are-zero has a corresponding denorms-are-zero attribute AMDGPU wants a distinct control for f32 flushing from f16/f64, and as far as I can tell the same is true for NVPTX (based on the attribute name). Work on consolidating these into the denormal-fp-math attribute, and a new type specific denormal-fp-math-f32 variant. Only ARM seems to support the two different flush modes, so this is overkill for the other use cases. Ideally we would error on the unsupported positive-zero mode on other targets from somewhere. Move the logic for selecting the flush mode into the compiler driver, instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32 are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as a user flag. -cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and -fno-cuda-flush-denormals-to-zero will be mapped to -fp-denormal-math-f32=ieee or preserve-sign rather than the old attributes. Stop emitting the denorms-are-zero attribute for the OpenCL flag. It has no in-tree users. The meaning would also be target dependent, such as the AMDGPU choice to treat this as only meaning allow flushing of f32 and not f16 or f64. The naming is also potentially confusing, since DAZ in other contexts refers to instructions implicitly treating input denormals as zero, not necessarily flushing output denormals to zero. This also does not attempt to change the behavior for the current attribute. The LangRef now states that the default is ieee behavior, but this is inaccurate for the current implementation. 
The clang handling is slightly hacky to avoid touching the existing denormal-fp-math uses. Fixing this will be left for a future patch. AMDGPU is still using the subtarget feature to control the denormal mode, but the new attribute are now emitted. A future change will switch this and remove the subtarget features.	2020-01-17 20:09:53 -05:00
Yaxun (Sam) Liu	9f2d8b5c0c	[HIP] Add option --gpu-max-threads-per-block=n Add this option to change the default launch bounds. Differential Revision: https://reviews.llvm.org/D71221	2020-01-07 11:18:00 -05:00
Matt Arsenault	e531750c6c	clang: Add -fconvergent-functions flag The CUDA builtin library is apparently compiled in C++ mode, so the assumption of convergent needs to be made in a typically non-SPMD language. The functions in the library should still be assumed convergent. Currently they are not, which is potentially incorrect and this happens to work after the library is linked.	2019-11-19 23:20:15 +05:30
Michael Liao	0a220de9e9	[HIP] Fix visibility for 'extern' device variables. Summary: - Fix a bug which misses the change for a variable to be set with target-specific attributes. Reviewers: yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D63020	2019-11-05 14:19:32 -05:00
Michael Liao	15140e4bac	[hip] Enable pointer argument lowering through coercing type. Reviewers: tra, rjmccall, yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D69826	2019-11-05 13:05:05 -05:00
Yaxun (Sam) Liu	4264e7bbfd	[CUDA][HIP] Disable emitting llvm.linker.options in device compilation The linker options (e.g. pragma detect_mismatch) are intended for host compilation only, therefore disable it for device compilation. Differential Revision: https://reviews.llvm.org/D57829	2019-11-04 23:21:39 -05:00
Yaxun (Sam) Liu	68f5ca4e19	[HIP] Add option -fgpu-allow-device-init Add this option to allow device side class type global variables with non-trivial ctor/dtor. Device side init/fini functions will be emitted, which will be executed by HIP runtime when the fat binary is loaded/unloaded. This feature is to facilitate implementation of device side sanitizer which requires global vars with non-trivial ctors. By default this option is disabled. Differential Revision: https://reviews.llvm.org/D69268	2019-10-22 16:06:20 -04:00
Michael Liao	243ebfba17	[hip][cuda] Fix the extended lambda name mangling issue. Summary: - HIP/CUDA host side needs to use device kernel symbol name to match the device side binaries. Without a consistent naming between host- and device-side compilations, it's risky that wrong device binaries are executed. Consistent naming is usually not an issue until unnamed types are used, especially the lambda. In this patch, the consistent name mangling is addressed for the extended lambdas, i.e. the lambdas annotated with `__device__`. - In [Itanium C++ ABI][1], the mangling of the lambda is generally unspecified unless, in certain cases, ODR rule is required to ensure consistent naming across TUs. The extended lambda is such a case as its name may be part of a device kernel function, e.g., the extended lambda is used as a template argument, etc. Thus, we need to force ODR for extended lambdas as they are referenced in both device- and host-side TUs. Furthermore, if an extended lambda is nested in other (extended or not) lambdas, those lambdas are required to follow ODR naming as well. This patch revises the current lambda mangle numbering to force ODR from an extended lambda to all its parent lambdas. - On the other side, the aforementioned ODR naming should not change those lambdas' original linkages, i.e., we cannot replace the original `internal` with `linkonce_odr`; otherwise, we may violate ODR in general. This patch introduces a new field `HasKnownInternalLinkage` in lambda data to decouple the current linkage calculation based on mangling number assigned. [1]: https://itanium-cxx-abi.github.io/cxx-abi/abi.html Reviewers: tra, rsmith, yaxunl, martong, shafik Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D68818 llvm-svn: 375309	2019-10-19 00:15:19 +00:00
Yaxun Liu	229c78d3a5	[CUDA][HIP] Fix host/device check with -fopenmp CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation to take advantage of multi-threaded computation. CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine if a function is sure to be emitted. To check host/device correctly for CUDA/HIP when -fopenmp is enabled, a unified logic is needed to determine whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic. Differential Revision: https://reviews.llvm.org/D67837 llvm-svn: 374263	2019-10-09 23:54:10 +00:00
Yaxun Liu	1282889347	[HIP] Support new kernel launching API Differential Revision: https://reviews.llvm.org/D67947 llvm-svn: 372773	2019-09-24 19:16:40 +00:00
Yaxun Liu	1bea97c971	[AMDGPU] Set default flat work group size to (1,256) for HIP Differential Revision: https://reviews.llvm.org/D67048 llvm-svn: 370808	2019-09-03 18:50:24 +00:00
Tim Northover	a009a60a91	IR: print value numbers for unnamed function arguments For consistency with normal instructions and clarity when reading IR, it's best to print the %0, %1, ... names of function arguments in definitions. Also modifies the parser to accept IR in that form for obvious reasons. llvm-svn: 367755	2019-08-03 14:28:34 +00:00
Vyacheslav Zakharin	de811d1f51	[clang] Preserve names of addrspacecast'ed values. Differential Revision: https://reviews.llvm.org/D63846 llvm-svn: 365666	2019-07-10 17:10:05 +00:00
Christudasan Devadasan	18ba9d6077	[AMDGPU] Increased the number of implicit argument bytes for both OpenCL and HIP (CLANG). To enable a new implicit kernel argument, increased the number of argument bytes from 48 to 56. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D63756 llvm-svn: 365643	2019-07-10 15:10:08 +00:00
Yaxun Liu	c3dfe9082b	[HIP] Support attribute hip_pinned_shadow This patch introduces support of hip_pinned_shadow variable for HIP. A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow. It has external linkage on device side and has no initializer. It has internal linkage on host side and has initializer or static constructor. It can be accessed in both device code and host code. This allows HIP runtime to implement support of HIP texture reference. Differential Revision: https://reviews.llvm.org/D62738 llvm-svn: 364381	2019-06-26 03:47:37 +00:00
Yaxun Liu	cabce71845	[AMDGPU] Enable the implicit arguments for HIP (CLANG) Enable 48-bytes of implicit arguments for HIP as well. Earlier it was enabled for OpenCL. This code is specific to AMDGPU target. Differential Revision: https://reviews.llvm.org/D62244 llvm-svn: 363414	2019-06-14 15:54:47 +00:00
Tim Northover	c46827c7ed	LLVM IR: Generate new-style byval-with-Type from Clang LLVM IR recently added a Type parameter to the byval Attribute, so that when pointers become opaque and no longer have an element type the information will still be present in IR. For now the Type parameter is optional (which is why Clang didn't need this change at the time), but it will become mandatory soon. llvm-svn: 362652	2019-06-05 21:12:14 +00:00
Michael Liao	4b7a713acc	[CUDA][HIP] Skip setting `externally_initialized` for static device variables. Summary: - By declaring device variables as `static`, we assume they won't be addressable from the host side. Thus, no `externally_initialized` is required. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D62603 llvm-svn: 361994	2019-05-29 17:23:27 +00:00
Yaxun Liu	dc805a4906	Fix failure of lit test dependent-libs.cu llvm-svn: 361905	2019-05-29 01:34:44 +00:00
Yaxun Liu	02afe4e077	[CUDA][HIP] Emit dependent libs for host only Recently D60274 was introduced to allow lld to handle dependent libs. However, current usages of dependent libs (e.g. pragma comment(lib, *) in Windows header files) are intended for host only. Emitting the metadata in device IR causes link error in device path. Until there is a way to differentiate dependent libs for device or host, metadata for dependent libs should be emitted for host only. This patch enforces that. Differential Revision: https://reviews.llvm.org/D62483 llvm-svn: 361880	2019-05-28 21:18:59 +00:00
Michael Liao	3820506960	[HIP] Fix visibility of `__constant__` variables. Summary: - `__constant__` variables should not be `hidden` as the linker may turn them into `LOCAL` symbols. Reviewers: yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61194 llvm-svn: 359344	2019-04-26 19:31:48 +00:00
Aaron Enye Shi	8129521318	[HIP-Clang] Fat binary should not be produced for non GPU code 2 Also for CUDA, we need to disable producing these fat binary functions when there is no GPU code. Reviewers: yaxunl, tra Differential Revision: https://reviews.llvm.org/D60141 llvm-svn: 357526	2019-04-02 20:49:41 +00:00
Aaron Enye Shi	13d8e92940	[HIP-Clang] Fat binary should not be produced for non GPU code Skip producing the fat binary functions for HIP when no device code is present. Reviewers: yaxunl Differential Review: https://reviews.llvm.org/D60141 llvm-svn: 357520	2019-04-02 20:10:18 +00:00
Michael Liao	982cbb6232	[CUDA][HIP][DebugInfo] Skip reference device function Summary: - A device function could be used as a non-type template parameter in a global/host function template. However, we should not try to retrieve that device function and reference it in the host-side debug info as it's only valid at device side. Subscribers: aprantl, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58992 llvm-svn: 355551	2019-03-06 21:16:27 +00:00
Yaxun Liu	e739ac0e25	[HIP] change kernel stub name Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354948	2019-02-27 02:02:52 +00:00
Yaxun Liu	00ebc0cb92	revert r354615: [HIP] change kernel stub name It caused regressions. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354651	2019-02-22 04:20:12 +00:00
Yaxun Liu	8d7cf0e2d4	[HIP] change kernel stub name Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354615	2019-02-21 20:12:16 +00:00
Yaxun Liu	c18e9ecd4f	[CUDA][HIP] Use device side kernel and variable names when registering them __hipRegisterFunction and __hipRegisterVar need to accept device side kernel and variable names so that HIP runtime can associate kernel stub functions in host code with kernel symbols in fat binaries, and associate shadow variables in host code with device variables in fat binaries. Currently, clang assumes kernel functions and device variables have the same name as the kernel stub functions and shadow variables. However, when host is compiled in windows with MSVC C++ ABI and device is compiled with Itanium C++ ABI (e.g. AMDGPU), kernels and device symbols in fat binary are mangled differently than host. This patch gets the device side kernel and variable name by mangling them in the mangle context of aux target. Differential Revision: https://reviews.llvm.org/D58163 llvm-svn: 354004	2019-02-14 02:00:09 +00:00
Alexey Bataev	1a9e05d7da	[DEBUG_INFO][NVPTX] Generate correct data about variable address class. Summary: Added ability to generate correct debug info data about the variable address class. Currently, for all the locals and globals the default values are used, ADDR_local_space(6) for locals and ADDR_global_space(5) for globals. The values are taken from the table in https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf. We need to emit correct data for address classes of, at least, shared and constant globals. Currently, all these variables are treated by the cuda-gdb debugger as variables in the global address space and, thus, require manual data type casting. Reviewers: echristo, probinson Subscribers: jholewinski, aprantl, cfe-commits Differential Revision: https://reviews.llvm.org/D57162 llvm-svn: 353204	2019-02-05 19:45:57 +00:00
Yaxun Liu	277e064bf5	Do not copy long double and 128-bit fp format from aux target for AMDGPU rC352620 caused regressions because it copied floating point format from aux target. Floating point format decides whether extended long double is supported. It is x86_fp80 on x86 but IEEE double on amdgcn. Document usage of long double type in HIP programming guide https://github.com/ROCm-Developer-Tools/HIP/pull/890 Differential Revision: https://reviews.llvm.org/D57527 llvm-svn: 352801	2019-01-31 21:57:51 +00:00
Artem Belevich	c62214da3d	[CUDA] add support for the new kernel launch API in CUDA-9.2+. Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799	2019-01-31 21:34:03 +00:00
Artem Belevich	9953577cb2	[CUDA] Treat extern global variable shadows same as regular extern vars. This fixes compiler crash when we attempted to compile this code: extern __device__ int data; __device__ int data = 1; Differential Revision: https://reviews.llvm.org/D56033 llvm-svn: 349981	2018-12-22 01:11:09 +00:00
Artem Belevich	7b05666a19	[CUDA] Make all host-side shadows of device-side variables undef. The host-side code can't (and should not) access the values that may only exist on the device side. E.g. address of a __device__ function does not exist on the host side as we don't generate the code for it there. Differential Revision: https://reviews.llvm.org/D55663 llvm-svn: 349087	2018-12-13 21:43:04 +00:00
Sean Fertile	d900dd0c23	Revert "[CodeGenCXX] Treat 'this' as noalias in constructors" This reverts commit https://reviews.llvm.org/rL344150 which causes MachineOutliner related failures on the ppc64le multistage buildbot. llvm-svn: 344526	2018-10-15 15:43:00 +00:00
Anton Bikineev	cc7e74753a	[CodeGenCXX] Treat 'this' as noalias in constructors This is currently a clang extension and a resolution of the defect report in the C++ Standard. Differential Revision: https://reviews.llvm.org/D46441 llvm-svn: 344150	2018-10-10 16:14:51 +00:00
Yaxun Liu	9767089d00	[HIP] Support early finalization of device code for -fno-gpu-rdc This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature are: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611	2018-10-02 17:48:54 +00:00
Artem Belevich	78929efb4d	[CUDA] Ignore uncallable functions when we check for usual deallocators. Previously clang considered function variants from both sides of compilation and that resulted in picking up wrong deallocation function. Differential Revision: https://reviews.llvm.org/D51808 llvm-svn: 342749	2018-09-21 17:29:33 +00:00
Matt Arsenault	a13746b7eb	Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode The same semantics work for OpenCL, and probably any offload language. Keep the old name around as an alias. llvm-svn: 340193	2018-08-20 18:16:48 +00:00
Yaxun Liu	94ff57f5b1	[HIP] Make __hip_gpubin_handle hidden to avoid being merged across different shared libraries Different shared libraries contain different fat binaries, each of which is stored in a global variable __hip_gpubin_handle. Since different compilation units share the same fat binary, this variable has linkonce linkage. However, it should not be merged across different shared libraries. This patch sets the visibility of the global variable to be hidden, which will make it invisible in the shared library, therefore preventing it from being merged. Differential Revision: https://reviews.llvm.org/D50596 llvm-svn: 340056	2018-08-17 17:47:31 +00:00
Matt Arsenault	c65f966d76	Try to make builtin address space declarations not useless The way address space declarations for builtins currently work is nearly useless. The code assumes the address spaces used for builtins is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language specific address space. The terminology used is highly confusing since it has nothing to do with the address space selected by the target to use for a language address space. This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use this. The AMDGPU builtins certainly do not behave as intended (i.e. all of the builtins returning pointers can never compile because the numbered address space never matches the expected named address space). The NVPTX builtins are missing tests for some, and the others seem to rely on an implicit addrspacecast. Change the used address space for builtins based on a target hook to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitrarily different). This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces". llvm-svn: 338707	2018-08-02 12:14:28 +00:00
Yaxun Liu	a4005e13f7	[CUDA][HIP] Allow function-scope static const variable CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__ function, only __shared__ variables or variables without any device memory qualifiers may be declared with static storage class. It is unclear how a function-scope non-const static variable without device memory qualifier is implemented, therefore only static const variable without device memory qualifier is allowed, which can be emitted as a global variable in constant address space. Currently clang only allows function-scope static variable with __shared__ qualifier. This patch also allows function-scope static const variable without device memory qualifier and emits it as a global variable in constant address space. Differential Revision: https://reviews.llvm.org/D49931 llvm-svn: 338188	2018-07-28 03:05:25 +00:00
Yaxun Liu	e1bfbc589f	[HIP] Support -fcuda-flush-denormals-to-zero for amdgcn Differential Revision: https://reviews.llvm.org/D48287 llvm-svn: 337639	2018-07-21 02:02:22 +00:00
Yaxun Liu	f99752b66b	[HIP] Register/unregister device fat binary only once HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which registers the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds a check of the value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *__hip_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also makes a similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631	2018-07-20 22:45:24 +00:00
Joel E. Denny	72c2783012	[FileCheck] Add -allow-deprecated-dag-overlap to failing clang tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47172 llvm-svn: 336844	2018-07-11 20:26:20 +00:00
Artem Belevich	c66d254ded	[CUDA] Use atexit() to call module destructor. This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free. Differential Revision: https://reviews.llvm.org/D48613 llvm-svn: 335763	2018-06-27 18:32:51 +00:00
Yaxun Liu	aa24601f98	[CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however currently they are only allowed on OpenCL kernel functions. This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__ functions. Differential Revision: https://reviews.llvm.org/D47958 llvm-svn: 334561	2018-06-12 23:58:59 +00:00
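
The register-once guard described in the f99752b66b message above ("[HIP] Register/unregister device fat binary only once") can be illustrated with a minimal host-only C++ sketch. It is not the code clang emits: `fakeRegisterFatBinary` and `fakeUnregisterFatBinary` are hypothetical stand-ins for `__hipRegisterFatBinary` and `__hipUnregisterFatBinary`, whose real signatures differ, and a plain global stands in for the linkonce `__hip_gpubin_handle`.

```cpp
#include <cassert>

// Hypothetical stand-ins for the HIP runtime entry points (assumption:
// the real functions take and return different types).
static int g_register_calls = 0;
static int g_unregister_calls = 0;
static int g_fake_fatbin = 0;

void *fakeRegisterFatBinary() {
  ++g_register_calls;
  return &g_fake_fatbin;
}

void fakeUnregisterFatBinary(void *) { ++g_unregister_calls; }

// One handle shared by all compilation units; in the real IR this is the
// linkonce global __hip_gpubin_handle.
void *gpubin_handle = nullptr;

// Every compilation unit emits a ctor like this one. Only the first call
// finds the handle empty and registers the fat binary.
void module_ctor() {
  if (gpubin_handle == nullptr)
    gpubin_handle = fakeRegisterFatBinary();
  // Kernels and variables of this compilation unit would be registered here.
}

// The matching dtor unregisters once and clears the handle so that later
// dtors from other compilation units become no-ops.
void module_dtor() {
  if (gpubin_handle != nullptr) {
    fakeUnregisterFatBinary(gpubin_handle);
    gpubin_handle = nullptr;
  }
}
```

Calling `module_ctor()` once per simulated compilation unit leaves `g_register_calls` at 1 however many units there are, and the same holds for `module_dtor()` and `g_unregister_calls`.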

246 Commits