Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
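A hedged usage sketch of the directive whose parsing this adds (the loop body and map clauses are illustrative only):
```
// Offload the loop to the target device and run it in parallel.
void saxpy(int n, float a, float *x, float *y) {
  #pragma omp target parallel loop map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```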
Differential Revision: https://reviews.llvm.org/D122359
GCC supports power-of-2-sized structures for the arguments. Clang supports fewer of these cases than GCC, and always crashes on the unsupported ones.
This patch adds Sema checks that emit diagnostics for those cases, fixing the crashes.
Reviewed By: jyu2
Differential Revision: https://reviews.llvm.org/D107141
Motivation:
```
int test(int x, int y) {
int r = 0;
[[clang::always_inline]] r += foo(x, y); // force compiler to inline this function here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves the problem with a call-site attribute. The "noinline" statement attribute already landed in D119061, and some LLVM inliner fixes have landed as well, so the call-site attribute now takes precedence over the function attribute.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D120717
Clang is crashing on the following statement
char var[9];
__asm__ ("" : "=r" (var) : "0" (var));
This is similar to existing test: crbug_999160_regtest
The issue happens when EmitAsmStmt tries to convert the input to match the
output type's length. That conversion is not guaranteed to succeed, and if
the statement itself is invalid, as with the array type in the example, we
should emit a regular error message here instead of hitting an assert().
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D120596
Motivation:
```
int foo(int x, int y) { // any compiler will happily inline this function
return x / y;
}
int test(int x, int y) {
int r = 0;
[[clang::noinline]] r += foo(x, y); // for some reason we don't want any inlining here
return r;
}
```
In 2018, @kuhar proposed "Introduce per-callsite inline intrinsics" in https://reviews.llvm.org/D51200 to solve this motivation case (and many others).
This patch solves the problem with a call-site attribute. The implementation is "smaller" than the approach using new intrinsics, and thanks to https://reviews.llvm.org/D79121 (Add nomerge statement attribute to clang) we already have basic infrastructure for handling attributes on statements containing call expressions.
GCC developers are also more inclined toward an attribute-based solution, as builtins are problematic for them (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104187), but they have no patch proposal yet, so we have a free hand here.
If this approach makes sense, the next steps would be to support call-site attributes for always_inline / flatten.
Reviewed By: aaron.ballman, kuhar
Differential Revision: https://reviews.llvm.org/D119061
I noticed that the following case would compile in Clang but not GCC:
void *x(void) {
void *p = &&foo;
asm goto ("# %0\n\t# %l1":"+r"(p):::foo);
foo:;
return p;
}
Changing the output template above from %l1 to %l2 would compile in GCC but
not Clang.
This demonstrates that when using tied outputs (say via the "+r" output
constraint), the hidden inputs occur or are numbered BEFORE the labels,
at least with GCC.
In fact, GCC does denote this in its documentation:
https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels
> Output operand with constraint modifier ‘+’ is counted as two operands
> because it is considered as one output and one input operand.
For the sake of compatibility, I think it's worthwhile to just make this
change.
It's better to use symbolic names for compatibility (especially between
released versions of Clang that support asm goto with outputs), i.e. %l1
from the above would become %l[foo]. The GCC docs also make this
recommendation.
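For illustration, here is the example above rewritten with a symbolic label reference (a sketch, not a test from this patch):
```
void *x(void) {
  void *p = &&foo;
  // %l[foo] names the goto label directly, so it stays correct regardless
  // of how the tied output is counted.
  asm goto ("# %0\n\t# %l[foo]" : "+r"(p) : : : foo);
foo:;
  return p;
}
```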
Also, I cleaned up some cruft in GCCAsmStmt::getNamedOperand. AFAICT,
NumPlusOperands was no longer used, though I couldn't find which commit
didn't clean that up correctly.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98096
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103640
Link: https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Extended-Asm.html#Goto-Labels
Reviewed By: void
Differential Revision: https://reviews.llvm.org/D115471
This implements the clang side of D116531. The elementtype
attribute is added for all indirect constraints (*) and tests are
updated accordingly.
Differential Revision: https://reviews.llvm.org/D116666
In 32-bit mode, attaching TBAA metadata to the store following the call
to inline assembler results in describing the wrong type by making a
fake lvalue (i.e., whatever the inline assembler happens to leave in
EAX:EDX). Even if the inline assembler somehow described the correct type,
setting TBAA information on the return type of a call to inline assembler
is likely not correct, since TBAA rules need not apply to inline assembler.
Differential Revision: https://reviews.llvm.org/D115320
With this,
void f() { __asm__("mov eax, ebx"); }
now compiles with clang with -masm=intel.
This matches gcc.
The flag is not accepted in clang-cl mode. It has no effect on
MSVC-style `__asm {}` blocks, which are unconditionally in intel
mode both before and after this change.
One difference to gcc is that in clang, inline asm strings are
"local" while they're "global" in gcc. Building the following with
-masm=intel works with clang, but not with gcc where the ".att_syntax"
from the 2nd __asm__() is in effect until file end (or until a
".intel_syntax" somewhere later in the file):
__asm__("mov eax, ebx");
__asm__(".att_syntax\nmovl %ebx, %eax");
__asm__("mov eax, ebx");
This also updates clang's intrinsic headers to work both in
-masm=att (the default) and -masm=intel modes.
The official solution for this according to "Multiple assembler dialects in asm
templates" in gcc docs->Extensions->Inline Assembly->Extended Asm
is to write every inline asm snippet twice:
bt{l %[Offset],%[Base] | %[Base],%[Offset]}
This works in LLVM after D113932 and D113894, so use that.
(Just putting `.att_syntax` at the start of the snippet works in some but not
all cases: when LLVM interpolates in parameters like `%0`, it uses AT&T or
Intel syntax according to the inline asm snippet's flavor, so the `.att_syntax`
within the snippet comes too late: the interpolated-in parameter is already
in Intel style, and then won't parse under the switched `.att_syntax`.)
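For illustration (a sketch, not code copied from clang's intrinsic headers), a snippet written with the dual-dialect syntax assembles under both flags:
```
// The text before '|' is used under -masm=att, the text after it under
// -masm=intel, so one inline asm string serves both dialects.
static inline int add_one(int x) {
  __asm__("add{l $1, %0| %0, 1}" : "+r"(x));
  return x;
}
```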
It might be nice to invent a `#pragma clang asm_dialect push "att"` /
`#pragma clang asm_dialect pop` to be able to force asm style per snippet,
so that the inline asm string doesn't contain the same code in two variants,
but let's leave that for a follow-up.
Fixes PR21401 and PR20241.
Differential Revision: https://reviews.llvm.org/D113707
This patch removes the assumption propagation that was added in D110655
primarily to get assumption information on opaque call sites for
optimizations. The analysis done in D111445 allows us to do this more
intelligently in the back-end.
Depends on D111445
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111463
Modify the IfStmt node to support constant evaluated expressions.
Add a new ExpressionEvaluationContext::ImmediateFunctionContext to
keep track of immediate function contexts.
This proved easier, better, and probably more efficient than walking the AST
backward, as it allows diagnosing nested if consteval statements.
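For reference, a minimal sketch of the construct this models (C++23 if consteval):
```
constexpr int f(int x) {
  if consteval {
    return x + 1;   // taken during constant evaluation (immediate context)
  } else {
    return x + 2;   // taken at run time
  }
}
```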
This patch adds OpenMP assumption attributes to call sites in applicable
regions. Currently this applies the caller's assumption attributes to
any calls contained within it. So, if a call occurs inside an OpenMP
assumes region to a function outside that region, we will assume that
call respects the assumptions. This is primarily useful for inline
assembly calls used heavily in the OpenMP GPU device runtime, which
allows us to then make judgements about what the ASM will do.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110655
This patch supports OpenMP 5.0 metadirective features.
It is implemented with OpenMP 5.1 features, such as dynamic user conditions, in mind.
A new function, getBestWhenMatchForContext, is defined in llvm/Frontend/OpenMP/OMPContext.h.
Currently this function returns the index of the when clause with the highest score among the ones applicable in the context.
This function is also declared with an array parameter which an OpenMP 5.1 implementation can use to select all the valid when clauses that must be resolved at runtime. Currently this array is set to null by default and its implementation is left for the future.
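A hedged example of the feature (the trait selectors and the function are illustrative only):
```
void axpy(int n, float a, float *x, float *y) {
  // The when clause matching the compilation context is selected;
  // otherwise the default variant is used.
  #pragma omp metadirective \
      when(device={arch(nvptx)}: target teams distribute parallel for) \
      default(parallel for)
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```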
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91944
This fixes a bug in clang where, when clang sees a switch with a
fallthrough to a default like this:
static void funcA(void) {}
static void funcB(void) {}
int main(int argc, char **argv) {
  switch (argc) {
  case 0:
    funcA();
    break;
  case 10:
  default:
    funcB();
    break;
  }
}
it does not add a proper debug location for the fall-through case, such as
case 10: above.
Patch by Shubham Rastogi!
Differential Revision: https://reviews.llvm.org/D109940
In LLVM IR terms the ACLE type 'data512_t' is essentially an aggregate
type { [8 x i64] }. When emitting code for inline assembly operands,
clang tries to scalarize aggregate types to an integer of the equivalent
length; otherwise it passes them by reference. This patch adds a target
hook to tell whether a given inline assembly operand is scalarizable
so that clang can emit code to pass/return it by-value.
Differential Revision: https://reviews.llvm.org/D94098
This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.
!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.
Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.
The compiler will tolerate both i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)
Original version of the patch by Mikhail Maltsev.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D105491
This change is intended as initial setup. The plan is to add
more semantic checks later. I plan to update the documentation
as more semantic checks are added (instead of documenting the
details up front). Most of the code closely mirrors that for
the Swift calling convention. Three places are marked as
[FIXME: swiftasynccc]; those will be addressed once the
corresponding convention is introduced in LLVM.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D95561
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
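A small usage sketch of the directive (the partial factor is illustrative):
```
void add_one(int n, double *a) {
  // Request partial unrolling by a factor of 4; where possible this lowers
  // to llvm.loop.unroll.* metadata rather than an AST-level transformation.
  #pragma omp unroll partial(4)
  for (int i = 0; i < n; ++i)
    a[i] += 1.0;
}
```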
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99459
This patch is the Part-1 (FE Clang) implementation of HW Exception handling.
This new feature adds the support of Hardware Exception for Microsoft Windows
SEH (Structured Exception Handling).
This is the first step of this project; only the X86_64 target is enabled in this patch.
Compiler options:
For clang-cl.exe, the option is -EHa, the same as MSVC.
For clang.exe, the extra option is -fasync-exceptions,
plus -triple x86_64-windows -fexceptions and -fcxx-exceptions as usual.
NOTE:: Without the -EHa or -fasync-exceptions, this patch is a NO-DIFF change.
The rules for C code:
For C code, one way (the MSVC approach) to achieve SEH -EHa semantics is to
follow three rules:
* First, no exception can move in or out of a _try region, i.e., no
potentially faulty instruction can be moved across a _try boundary.
* Second, the order of exceptions for instructions 'directly' under a _try
must be preserved (not applied to those in callees).
* Finally, global states (local/global/heap variables) that can be read
outside of the _try region must be updated in memory (not just in a
register) before the subsequent exception occurs.
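A minimal sketch (assuming MSVC-style SEH syntax, e.g. clang-cl with -EHa) of the kind of C code these rules keep correct:
```
int read_or_default(int *p) {
  int v = -1;
  __try {
    v = *p;   // potentially faulting load; must stay inside the __try
  } __except (1 /* EXCEPTION_EXECUTE_HANDLER */) {
    v = 0;    // hardware exception (e.g. access violation) handled here
  }
  return v;
}
```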
The impact to C++ code:
Although SEH is a feature for C code, -EHa does have a profound effect on the
C++ side. When a C++ function (in the same compilation unit, built with -EHa)
is called by a SEH C function, a hardware exception that occurs in the C++
code can also be handled properly by an upstream SEH _try-handler or a C++
catch(...). As such, when that happens in the middle of an object's life
scope, the dtor must be invoked the same way as for a C++ synchronous
exception during the unwinding process.
Design:
A natural way to achieve the rules above in LLVM today is to allow an EH edge
to be added on memory/computation instructions (the previous iload/istore
idea) so that the exception path is modeled precisely in the flow graph.
However, tracking every single memory instruction and potentially faulty
instruction can create many invokes, complicate the flow graph, and possibly
result in a negative performance impact on downstream optimization and code
generation. Making all optimizations aware of the new semantics would also be
a substantial effort.
This design does not intend to model the exception path at the instruction
level. Instead, the proposed design tracks and reports EH state at the BLOCK
level to reduce the complexity of the flow graph and minimize the performance
impact on C++ code under the -EHa option.
One key element of this design is the ability to compute the state number at
the block level. Our algorithm is based on the following rationale:
A _try scope is always a SEME (Single Entry Multiple Exits) region, as jumping
into a _try is not allowed. The single entry must start with a seh_try_begin()
invoke with a correct state number that is the initial state of the SEME.
Through control flow, the state number is propagated into all blocks. Side
exits marked by seh_try_end() will unwind to the parent state based on the
existing SEHUnwindMap[].
Note that side exits can ONLY jump into parent scopes (lower state numbers).
Thus, when a block inherits various states from its predecessors, the lowest
state wins over the others. If some exits flow to unreachable, propagation on
those paths terminates without affecting the remaining blocks.
For C++ code, an object lifetime region is usually a SEME region, like a SEH
_try. However, there is one rare exception: jumping into a lifetime scope that
has a dtor but no ctor is warned about, but allowed:
Warning: jump bypasses variable with a non-trivial destructor
In that case, the region is actually a MEME (Multiple Entry Multiple Exits)
region. Our solution is to inject an eha_scope_begin() invoke in the side-entry
block to ensure a correct state.
Implementation:
Part-1: Clang implementation described below.
Two intrinsics are created to track C++ object scopes: eha_scope_begin() and
eha_scope_end(). _scope_begin() is added immediately after ctor() is called
and the EHStack is pushed, so it must be an invoke, not a call. With that it
is also guaranteed that an EH cleanup pad is created regardless of whether a
call exists in this scope. _scope_end is added before dtor(). These two
intrinsics make the computation of block state possible in the downstream
code-gen pass, even in the presence of ctor/dtor inlining.
Two intrinsics, seh_try_begin() and seh_try_end(), are added for C code to
mark the _try boundary and to prevent exceptions from being moved across it.
All memory instructions inside a _try are treated as 'volatile' to ensure the
2nd and 3rd rules for C code above. This is a little suboptimal, but it's
acceptable as the amount of code directly under a _try is very small.
Part-2 (will be in Part-2 patch): LLVM implementation described below.
For both C++ and C code, the state of each block is computed at the same place
in the back end (the WinEHPrepare pass) where all other EH tables/maps are
calculated. In addition to _scope_begin and _scope_end, the computation of
block state also relies on the existing state-tracking code (UnwindMap and
InvokeStateMap).
For both C++ and C code, the state of each block with a potential trap
instruction is marked and reported in the DAG instruction selection pass, the
same place where the state for -EHsc (synchronous exceptions) is handled.
If the first instruction in a reported block scope can trap, a nop is injected
before this instruction. This nop is needed to accommodate LLVM's Windows EH
implementation, in which the address in the IPToState table is offset by +1
(the purpose of that is to ensure the return address of a call is in the
same scope as the call address).
The handler for catch(...) under -EHa must handle HW exceptions, so its
'adjective' flag is reset (it cannot be IsStdDotDot (0x40), which only catches
C++ exceptions).
Suppress the push/popTerminate() scope (from noexcept/nothrow) so that HW
exceptions can be passed through.
Original llvm-dev [RFC] discussions can be found in these two threads:
https://lists.llvm.org/pipermail/llvm-dev/2020-March/140541.html
https://lists.llvm.org/pipermail/llvm-dev/2020-April/141338.html
Differential Revision: https://reviews.llvm.org/D80344/new/
I've taken the following steps to add unwinding support from inline assembly:
1) Add a new `unwind` "attribute" (like `sideeffect`) to the asm syntax:
```
invoke void asm sideeffect unwind "call thrower", "~{dirflag},~{fpsr},~{flags}"()
to label %exit unwind label %uexit
```
2) Add Bitcode writing/reading support + LLVM-IR parsing.
3) Emit EHLabels around inline assembly lowering (SelectionDAGBuilder + GlobalISel) when `InlineAsm::canThrow` is enabled.
4) Tweak InstCombineCalls/InlineFunction pass to not mark inline assembly "calls" as nounwind.
5) Add clang support by introducing a new clobber: "unwind", which lowers to `canThrow` being enabled (see the sketch after this list).
6) Don't allow unwinding callbr.
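A hedged sketch of the clang-side usage (assuming the same hypothetical thrower as in the IR snippet above); listing "unwind" as a clobber marks the inline assembly as potentially unwinding:
```
struct Guard { ~Guard(); };
void f() {
  Guard g;
  // Without the "unwind" clobber an exception escaping the asm would bypass
  // ~Guard(); with it, the asm is lowered as potentially throwing.
  asm volatile("call thrower" ::: "unwind");
}
```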
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D95745
Currently Clang does not add mustprogress to infinite loops with a
known constant condition, matching C11 behavior. The forward progress
guarantee in C++11 and later should allow us to add mustprogress to any
loop (http://eel.is/c++draft/intro.progress#1).
This allows us to simplify the code dealing with adding mustprogress a
bit.
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D96418
This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.
The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.
Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.
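A short sketch of the attribute on a return statement (the functions are illustrative):
```
int target_fn(int x);

int wrapper(int x) {
  // Sema verifies the documented constraints (e.g. matching return types);
  // the call is then guaranteed to be emitted as a tail call.
  [[clang::musttail]] return target_fn(x + 1);
}
```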
Future work would be to emit a warning if a user tries to pass
a pointer or reference to a local variable through a musttail call.
Reviewed By: rsmith
Differential Revision: https://reviews.llvm.org/D99517
08196e0b2e exposed LowerExpectIntrinsic's
internal implementation detail in the form of
LikelyBranchWeight/UnlikelyBranchWeight options to the outside.
While this isn't incorrect from the results viewpoint,
this is suboptimal from the layering viewpoint,
and causes confusion - should transforms also use those weights,
or should they use something else, D98898?
So go back to status quo by making LikelyBranchWeight/UnlikelyBranchWeight
internal again, and fixing all the code that used it directly,
which currently is only clang codegen, thankfully,
to emit proper @llvm.expect intrinsics instead.
The condition variable is in scope in the loop increment, so we need to
emit the jump destination from within the scope of the condition
variable.
For GCC compatibility (and compatibility with real-world 'FOR_EACH'
macros), 'continue' is permitted in a statement expression within the
condition of a for loop, though, so there are two cases here:
* If the for loop has no condition variable, we can emit the jump
destination before emitting the condition.
* If the for loop has a condition variable, we must defer emitting the
jump destination until after emitting the variable. We diagnose a
'continue' appearing in the initializer of the condition variable,
because it would jump past the initializer into the scope of that
variable.
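A hedged sketch of the GCC-compatible pattern (the statement expression in the condition is the interesting part; names are illustrative):
```
int sum_nonzero(const int *a, int n) {
  int sum = 0;
  // 'continue' inside the statement expression jumps to the loop increment,
  // skipping zero elements -- the FOR_EACH-style usage being supported.
  for (int i = 0; ({ if (i < n && a[i] == 0) continue; i < n; }); ++i)
    sum += a[i];
  return sum;
}
```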
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D98816
Added basic parsing/sema/serialization support for the interop directive, including support for the 'init' clause.
Differential Revision: https://reviews.llvm.org/D98558
Initial support for clang to generate loops using the OpenMPIRBuilder. This initial support is intentionally limited to the following (a small example of a supported loop follows the list):
* Only the worksharing-loop directive.
* Recognizes only the nowait clause.
* No loop nests with more than one loop.
* Untested with templates, exceptions.
* Semantic checking left to the existing infrastructure.
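A minimal sketch of a loop within the supported subset (single worksharing-loop, optional nowait, no nesting):
```
void scale(int n, float a, float *x) {
  #pragma omp parallel
  {
    // A single canonical loop associated with the worksharing-loop
    // directive, using only the nowait clause.
    #pragma omp for nowait
    for (int i = 0; i < n; ++i)
      x[i] *= a;
  }
}
```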
This patch introduces a new AST node, OMPCanonicalLoop, which becomes the parent of any loop that has to adhere to the restrictions specified by the OpenMP standard. These restrictions allow OMPCanonicalLoop to provide the following additional information that depends on base-language semantics:
* The distance function: How many loop iterations there will be before entering the loop nest.
* The loop variable function: Conversion from a logical iteration number to the loop variable.
These allow the OpenMPIRBuilder to act solely using logical iteration numbers without needing to be concerned with iterator semantics between calling the distance function and determining what the value of the loop variable ought to be. Any OpenMP logic should be done by the OpenMPIRBuilder so that it can be reused by the MLIR OpenMP dialect and thus by flang.
The distance and loop-variable functions are implemented using lambdas (or more exactly, CapturedStmts, because the lambda implementation is more intertwined with the parser). It is up to the OpenMPIRBuilder how they are called, which depends on what is done with the loop. By default, these are emitted as outlined functions, but we might think about emitting them inline as the OpenMPRuntime does.
For compatibility with the current OpenMP implementation, even though not necessary for the OpenMPIRBuilder, OMPCanonicalLoop can still be nested within OMPLoopDirectives' CapturedStmt. Although OMPCanonicalLoops are not currently generated when the OpenMPIRBuilder is not enabled, they can just be skipped when not using the OpenMPIRBuilder, in case we don't want to make the AST dependent on the EnableOMPBuilder setting.
Loop nests with more than one loop require support by the OpenMPIRBuilder (D93268). A simple implementation of non-rectangular loop nests would add another lambda function that returns whether a loop iteration of the rectangular overapproximation is also within its non-rectangular subset.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D94973
The tile directive is in OpenMP's Technical Report 8 and foreseeably will be part of the upcoming OpenMP 5.1 standard.
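A brief usage sketch of the construct (the tile sizes are illustrative):
```
void mm_update(int n, float *c, const float *a, const float *b) {
  // The two associated loops are tiled into 32x32 blocks; the de-sugared
  // loop nest is what enclosing loop-associated directives would consume.
  #pragma omp tile sizes(32, 32)
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      c[i * n + j] += a[i] * b[j];
}
```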
This implementation is based on an AST transformation providing a de-sugared loop nest. This makes it simple to forward the de-sugared transformation to loop associated directives taking the tiled loops. In contrast to other loop associated directives, the OMPTileDirective does not use CapturedStmts. Letting loop associated directives consume loops from different capture contexts would be difficult.
A significant amount of code generation logic is taking place in the Sema class. Eventually, I would prefer if these would move into the CodeGen component such that we could make use of the OpenMPIRBuilder, together with flang. Only expressions converting between the language's iteration variable and the logical iteration space need to take place in the semantic analyzer: getting the number of iterations (e.g. the overload resolution of `std::distance`) and converting the logical iteration number to the iteration variable (e.g. overload resolution of `iteration + .omp.iv`). In clang, only CXXForRangeStmt is also represented by its de-sugared components. However, OpenMP loops are not defined as syntactic sugar. Starting with an AST-based approach allows us to gradually move generated AST statements into CodeGen, instead of all at once.
I would also like to refactor `checkOpenMPLoop` into its functionalities in a follow-up. In this patch it is used twice. Once for checking proper nesting and emitting diagnostics, and additionally for deriving the logical iteration space per-loop (instead of for the loop nest).
Differential Revision: https://reviews.llvm.org/D76342
This is an enhancement to LLVM Source-Based Code Coverage in clang to track how
many times individual branch-generating conditions are taken (evaluate to TRUE)
and not taken (evaluate to FALSE). Individual conditions may comprise larger
boolean expressions using boolean logical operators. This functionality is
very similar to what is supported by GCOV except that it is very closely
anchored to the ASTs.
Differential Revision: https://reviews.llvm.org/D84467
Matrix types in memory are represented as arrays, but accessed through
vector pointers, with the alignment specified on the access operation.
For inline assembly, update pointer arguments to use vector pointers.
Otherwise there will be a mismatch if the matrix is also an
input argument, which is represented as a vector.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D91631
D86841 had an error where for statements with no conditional were
required to make progress. This is not true; this patch removes that
line and adds regression tests.
Differential Revision: https://reviews.llvm.org/D91075
Since C++11, the C++ standard has a forward progress guarantee
[intro.progress], so all such functions must have the `mustprogress`
requirement. In addition, from C11 and onwards, loops without a non-zero
constant conditional or no conditional are also required to make
progress (C11 6.8.5p6). This patch implements these attribute deductions
so they can be used by the optimization passes.
Differential Revision: https://reviews.llvm.org/D86841