Summary:
Before this change, X86_32ABIInfo::classifyArgument would be called
twice on vector arguments to vectorcall functions. This function has
side effects to track GPR register usage, and this would lead to
incorrect GPR usage in some cases. The specific case I noticed is from
running out of XMM registers with mixed FP and vector arguments and no
aggregates of any kind. Consider this prototype:
void __vectorcall vectorcall_indirect_vec(
double xmm0, double xmm1, double xmm2, double xmm3, double xmm4,
__m128 xmm5,
__m128 ecx,
int edx,
__m128 mem);
classifyArgument has no effects when called on a plain FP type, but when
called on a vector type, it modifies FreeRegs to model GPR consumption.
However, this should not happen during the vector call first pass.
I refactored the code to unify vectorcall HVA logic with regcall HVA
logic. The conventions pass HVAs in registers differently (expanded vs.
not expanded), but if they do not fit in registers, they both pass them
indirectly by address.
Reviewers: erichkeane, craig.topper
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72110
Summary:
Before this patch adding a new /D flag when compiling a source file that consumed a PCH with clang-cl would issue a diagnostic and then fail. With the patch, the diagnostic is still issued but the definition is accepted. This matches the msvc behavior. The fuzzy-pch-msvc.c is a clone of the existing fuzzy-pch.c tests with some msvc specific rework.
msvc diagnostic:
warning C4605: '/DBAR=int' specified on current command line, but was not specified when precompiled header was built
Output of the CHECK-BAR test prior to the code change:
<built-in>(1,9): warning: definition of macro 'BAR' does not match definition in precompiled header [-Wclang-cl-pch]
#define BAR int
^
D:\repos\llvm\llvm-project\clang\test\PCH\fuzzy-pch-msvc.c(12,1): error: unknown type name 'BAR'
BAR bar = 17;
^
D:\repos\llvm\llvm-project\clang\test\PCH\fuzzy-pch-msvc.c(23,4): error: BAR was not defined
# error BAR was not defined
^
1 warning and 2 errors generated.
Reviewers: rnk, thakis, hans, zturner
Subscribers: mikerice, aganea, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72405
Pass small FP values in GPRs or stack memory according the the normal
convention. This is what gcc -mno-sse does on Win64.
I adjusted the conditions under which we emit an error to check if the
argument or return value would be passed in an XMM register when SSE is
disabled. This has a side effect of no longer emitting an error for FP
arguments marked 'inreg' when targetting x86 with SSE disabled. Our
calling convention logic was already assigning it to FP0/FP1, and then
we emitted this error. That seems unnecessary, we can ignore 'inreg' and
compile it without SSE.
Reviewers: jyknight, aemerson
Differential Revision: https://reviews.llvm.org/D70465
This allows us to generate better code for selecting the fixup
to load.
Previously when the sign was set we had to load offset 0. And
when it was clear we had to load offset 4. This required a testl,
setns, zero extend, and finally a mul by 4. By switching the offsets
we can just shift the sign bit into the lsb and multiply it by 4.
On Fuchsia, pthread API is emulated on top of C11 thread API. Using C11
thread API directly is more efficient.
While this implementation is only used by Fuchsia at the moment, it's
not Fuchsia specific, and could be used by other platforms that use C11
threads rather than pthreads in the future.
Differential Revision: https://reviews.llvm.org/D64378
The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing /decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op.
Differential Revision: https://reviews.llvm.org/D72525
Summary:
- `dead-mi-elimination` assumes MIR in the SSA form and cannot be
arranged after phi elimination or DeSSA. It's enhanced to handle the
dead register definition by skipping use check on it. Once a register
def is `dead`, all its uses, if any, should be `undef`.
- Re-arrange the DIE in RA phase for AMDGPU by placing it directly after
`detect-dead-lanes`.
- Many relevant tests are refined due to different register assignment.
Reviewers: rampitec, qcolombet, sunfish
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72709
This commit defines a new SPIR-V dialect attribute for specifying
a SPIR-V target environment. It is a dictionary attribute containing
the SPIR-V version, supported extension list, and allowed capability
list. A SPIRVConversionTarget subclass is created to take in the
target environment and sets proper dynmaically legal ops by querying
the op availability interface of SPIR-V ops to make sure they are
available in the specified target environment. All existing conversions
targeting SPIR-V is changed to use this SPIRVConversionTarget. It
probes whether the input IR has a `spv.target_env` attribute,
otherwise, it uses the default target environment: SPIR-V 1.0 with
Shader capability and no extra extensions.
Differential Revision: https://reviews.llvm.org/D72256
For IR input files, we currently use LLVM diagnostic handler even the
compilation is from clang. As a result, we are not able to use -Rpass
to get the transformation reports. Some warnings are not handled
properly either: We found many mysterious warnings in our ThinLTO backend
compilations in SamplePGO and CSPGO. An example of the warning:
"warning: net/proto2/public/metadata_lite.h:51:21: 0.02% (1 / 4999)"
This turns out to be a warning by Wmisexpect, which is supposed to be
filtered out by default. But since the filter is in clang's
diagnostic hander, we emit these incomplete warnings from LLVM's
diagnostic handler.
This patch uses clang diagnostic handler for IR input files. We create
a fake backendconsumer just to install the diagnostic handler.
With this change, we will have proper handling of all the warnings and we can
use -Rpass* options in IR input files compilation.
Also note that with is patch, LLVM's diagnostic options, like
"-mllvm -pass-remarks=*", are no longer be able to get optimization remarks.
Differential Revision: https://reviews.llvm.org/D72523
Summary:
This allows for users to cache printer state, which can be costly to recompute. Each of the IR print methods gain a new overload taking this new state class.
Depends On D72293
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D72294
Summary:
This was previously disabled as FunctionType TypeAttrs could not be roundtripped in the IR. This has been fixed, so we can now generically print FuncOp.
Depends On D72429
Reviewed By: jpienaar, mehdi_amini
Differential Revision: https://reviews.llvm.org/D72642
Allow to build PCH's (with -building-pch-with-obj and the extra .o file)
with -fmodules-codegen -fmodules-debuginfo to allow emitting shared code
into the extra .o file, similarly to how it works with modules. A bit of
a misnomer, but the underlying functionality is the same. This saves up
to 20% of build time here.
Differential Revision: https://reviews.llvm.org/D69778
If a header contains 'extern template', then the template should be provided
somewhere by an explicit instantiation, so it is not necessary to generate
a copy. Worse, this can lead to an unresolved symbol, because the codegen's
object file will not actually contain functions from such a template
because of the GVA_AvailableExternally, but the object file for the explicit
instantiation will not contain them either because it will be blocked
by the information provided by the module.
Differential Revision: https://reviews.llvm.org/D69779
Summary:
This diff fixes issues with the semantics of linalg.generic on tensors that appeared when converting directly from HLO to linalg.generic.
The changes are self-contained within MLIR and can be captured and tested independently of XLA.
The linalg.generic and indexed_generic are updated to:
To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a linalg.generic op accepts
mixing input and output ranked tensor values with input and output memrefs.
```
%1 = linalg.generic #trait_attribute %A, %B {other-attributes} :
tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```
In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the linalg.indexed_generic op produces (i.e.
allocates and fills) its return values.
Tensor values must be legalized by a buffer allocation pass before most
transformations can be applied. Such legalization moves tensor return values
into output buffer operands and updates the region argument accordingly.
Transformations that create control-flow around linalg.indexed_generic
operations are not expected to mix with tensors because SSA values do not
escape naturally. Still, transformations and rewrites that take advantage of
tensor SSA values are expected to be useful and will be added in the near
future.
Subscribers: bmahjour, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72555
There are no special virtual function handlers (like __cxa_pure_virtual)
defined for NVPTX target, so just emit such functions as null pointers
to prevent issues with linking and unresolved references.
Summary: bfloat16 doesn't have a valid APFloat format, so we have to use double semantics when storing it. This change makes sure that hexadecimal values can be round-tripped properly given this fact.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D72667
These intrinsics expand to a variable number of instructions so just like in
ISelLowering.cpp we use custom code to deal with them.
Committing Tim's original patch.
Differential Revision: https://reviews.llvm.org/D65656
This code is untested in tree because the "APFloat::semanticsPrecision(sem) >= SrcVT.getSizeInBits() - 1" check is false for most combinations for int and fp types except maybe i32 and f64. For that you would need i32 to be an illegal type, but f64 to be legal and have custom handling for legalizing the split sint_to_fp. The precision check itself was added in 2010 to fix a double rounding issue in the algorithm that would occur if the sint_to_fp was not able to do the conversion without rounding.
Differential Revision: https://reviews.llvm.org/D72728
The tests aren't concerned at all by the actual sanitizer - only by blacklist being reported as a dependency.
We're unfortunately limited by platform support for any particular sanitizer but we can at least use one that is widely supported.
Post-commit review:
https://reviews.llvm.org/D72729
When multiple guard intrinsics are merged into one, currently the
result of eraseInstFromFunction() is returned -- however, this
should only be done if the current instruction is being removed.
In this case we're removing a different instruction and should
instead report that the current one has been modified by returning it.
For this test case, this reduces the number of instcombine iterations
from 5 to 2 (the minimum possible).
Differential Revision: https://reviews.llvm.org/D72558
Summary:
This patch adds an option to limit debug info by only emitting complete class
type information when its constructor is emitted. This applies to classes
that have nontrivial user defined constructors.
I implemented the option by adding another level to `DebugInfoKind`, and
a flag `-flimit-debug-info-constructor`.
Total object file size on Windows, compiling with RelWithDebInfo:
before: 4,257,448 kb
after: 2,104,963 kb
And on Linux
before: 9,225,140 kb
after: 4,387,464 kb
According to the Windows clang.pdb files, here is a list of types that are no
longer complete with this option enabled: https://reviews.llvm.org/P8182
Reviewers: rnk, dblaikie
Subscribers: aprantl, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72427
The primary motivation for this is to add another dimension to the
Swift LLDB test matrix, but this seems generally useful.
Differential Revision: https://reviews.llvm.org/D72662
This also makes this function consistent with the rest of the
libc++ provided fallbacks.
The locale support in msvcrt.dll is very limited anyway; it can
only be configured processwide, not per thread, and it only seems
to support the locales "C" and "" (the user set locale), so it's
hard to make any meaningful automatic test for it. But manually tested,
this change does make time formatting locale code in libc++ output
times in the user requested format, when using locale "".
Differential Revision: https://reviews.llvm.org/D69554
TSan spuriously reports for any OpenMP application a race on the initialization
of a runtime internal mutex:
```
Atomic read of size 1 at 0x7b6800005940 by thread T4:
#0 pthread_mutex_lock <null> (a.out+0x43f39e)
#1 __kmp_resume_64 <null> (libomp.so.5+0x84db4)
Previous write of size 1 at 0x7b6800005940 by thread T7:
#0 pthread_mutex_init <null> (a.out+0x424793)
#1 __kmp_suspend_initialize_thread <null> (libomp.so.5+0x8422e)
```
According to @AndreyChurbanov this is a false positive report, as the control
flow of the runtime guarantees the ordering of the mutex initialization and
the lock:
https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime-library/topic/530363
To suppress this report, I suggest the use of
TSAN_OPTIONS='ignore_uninstrumented_modules=1'.
With this patch, a runtime warning is provided in case an OpenMP application
is built with Tsan and executed without this Tsan-option.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D70412
This ports the MergeFunctions pass to the NewPM. This was rather
straightforward, as no analyses are used.
Additionally MergeFunctions needs to be conditionally enabled in
the PassBuilder, but I left that part out of this patch.
Differential Revision: https://reviews.llvm.org/D72537
This fixes the issue encountered in D71164. Instead of using a
range-based for, manually iterate over the users and advance the
iterator beforehand, so we do not skip any users due to iterator
invalidation.
Differential Revision: https://reviews.llvm.org/D72657
Summary:
[nfc][libomptarget] Refactor nxptx/target_impl.cu
Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file.
Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this
file was an oversight, happy to resolve in either direction.
Removed some comments that look outdated.
Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implict
dependency on interface.h.
Reviewers: ABataev, grokos, jdoerfert
Reviewed By: jdoerfert
Subscribers: jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72719
Summary:
[nfc][libomptarget] Refactor amdgcn target_impl
Removes references to internal libraries from the header
Standardises on C++ mangling for all the target_impl functions
Update comment block
clang-format
Move some functions into a new target_impl.hip source file
This lays the groundwork for implementing the remaining unresolved
symbols in the target_impl.hip source.
Reviewers: jdoerfert, grokos, ABataev, ronlieb
Reviewed By: jdoerfert
Subscribers: jvesely, mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D72712
Summary:
Mem op clustering adds a weak edge in the DAG between two loads or
stores that should be clustered, but the direction of this edge is
pretty arbitrary (it depends on the sort order of MemOpInfo, which
represents the operands of a load or store). This often means that two
loads or stores will get reordered even if they would naturally have
been scheduled together anyway, which leads to test case churn and goes
against the scheduler's "do no harm" philosophy.
The fix makes sure that the direction of the edge always matches the
original code order of the instructions.
Reviewers: atrick, MatzeB, arsenm, rampitec, t.p.northover
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72706