In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered as escaped.
This is a bit too conservative. Specifically, in non-optimized build mode, it's often to have patterns of code that first store an alloca somewhere and then load it right away.
These used should be handled without conservatively marking them escaped.
This patch tracks how the memory location where an alloca pointer is stored into is being used. As long as we only try to load from that location and nothing else, we can still
consider the original alloca not escaping and keep it on the stack instead of putting it on the frame.
Differential Revision: https://reviews.llvm.org/D91305
This patch is added to remove the unreachable MBBs reference in the jump table.
Differential Revisien: https://reviews.llvm.org/D90498
Reviewed by: amyk, bsaleil
The current code allows strided layouts, but the number of elements allocated is ambiguous. It could be either the number of elements in the shape (the current implementation), or the amount of elements required to not index out-of-bounds with the given maps (which would require evaluating the layout map).
If we require the canonical layouts, the two will be the same.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D91523
This option doesn't enable any unique feature/code-patch. Also, it is
neither tested nor documented.
Differential Revision: https://reviews.llvm.org/D91537
This adds a simple definition of a "workshare loop" operation for
the OpenMP MLIR dialect, excluding the "reduction" and "allocate"
clauses and without a custom parser and pretty printer.
The schedule clause also does not yet accept the modifiers that are
permitted in OpenMP 5.0.
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Reviewed By: ftynse, clementval
Differential Revision: https://reviews.llvm.org/D86071
This defines a 'fastcc' for the VE target and implements vreg-to-vreg
copy for parameter passing. The 'fastcc' extends the standard CC for
SX-Aurora with register passing of vector-typed parameters and return
values.
Reviewed By: kaz7
Differential Revision: https://reviews.llvm.org/D90842
In order to support attribute((constructor)) and attribute((destructor)),
which is used by various LLVM non-C++ runtime components, AIX will include
crti[_64].o and -bcdtors for C language link invocations by default.
Differential Revision: https://reviews.llvm.org/D91361
LLDB is currently always activating C++ when parsing expressions as LLDB itself
is using C++ features when creating the final AST that will be codegen'd
(specifically, references to variables, namespaces and using declarations are
used).
This is causing problems for users that have variables in non-C++ programs (e.g.
plain C or Objective-C) that have names which are keywords in C++. Expressions
referencing those variables fail to parse as LLDB's Clang parser thinks those
identifiers are C++ keywords and not identifiers that may belong to a
declaration.
We can't just disable C++ in the expression parser for those situations as
replacing the functionality of the injected C++ code isn't trivial. So this
patch is just disabling most keywords that are exclusive to C++ in LLDB's Clang
parser when we are in a non-C++ expression. There are a few keywords we can't
disable for now:
* `using` as that's currently used in some situations to inject variables into the expression function.
* `__null` as that's used by LLDB to define `NULL`/`Nil`/`nil`.
Getting rid of these last two keywords is possible but is a large enough change
that this will be handled in follow up patches.
Note that this only changes the keyword status of those tokens but this patch
does not remove any C++ functionality from the expression parser. The type
system still follows C++ rules and so does the rest of the expression parser.
There is another small change that gives the hardcoded macro definitions in LLDB
a higher precedence than the macros imported from the Objective-C modules. The
reason for this is that the Objective-C modules in LLDB are actually parsed in
Objective-C++ mode and they end up providing the C++ definitions of certain
system macros (like `NULL` being defined as `nullptr`). So we have to move the
LLDB definition forward and surround the definition from the module with an
`#ifdef` to make sure that we use the correct LLDB definition that doesn't
reference C++ keywords. Or to give an example, this is how the expression source
code changes:
Before:
```
#define NULL (nullptr) // injected module definition
#ifndef NULL
#define NULL (__null) // hardcoded LLDB definition
#endif
```
After:
```
#ifndef NULL
#define NULL (__null) // hardcoded LLDB definition
#endif
#ifndef NULL
#define NULL (nullptr) // injected module definition
#endif
```
Fixes rdar://10356912
Reviewed By: shafik
Differential Revision: https://reviews.llvm.org/D82770
This patch adds a new pass to add !annotation metadata for entries in
@llvm.global.anotations, which is generated using
__attribute__((annotate("_name"))) on functions in Clang.
This has been discussed on llvm-dev as part of
RFC: Combining Annotation Metadata and Remarks
http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D91195
The logic of vector on boolean was missed. This patch adds the logic and test on
it.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D91403
Adapt the declarations of `svpattern` and `svprfop` to the most recent
one defined in section "5. Enum declarations" of the SVE ACLE
specifications [1].
The signature of the intrinsics using these enums have been changed
accordingly.
A test has been added to make sure that `svpattern` and `svprfop` are
not typedefs.
[1] https://developer.arm.com/documentation/100987/latest, version
00bet6
Reviewed By: joechrisellis
Differential Revision: https://reviews.llvm.org/D91333
Original commit message: "
Move the test compiler setup in a common place. NFCI
This patch reduces the copy paste in the unittest/CodeGen folder by moving the
common compiler setup phase in a header file.
Differential revision: https://reviews.llvm.org/D91061
"
This patch includes a fix for the memory leaks pointed out by @vitalybuka
This patch fixes the function isWideningInstruction for scalable vectors.
Now the cost model can check the widening pattern for SVE.
Differential Revision: https://reviews.llvm.org/D91260
In `GetGlobalSizeFromDescriptor` we use `dladdr` to get info on the the
current address. `dladdr` returns 0 if it failed.
During testing on Linux this returned 0 to indicate failure, and
populated the `info` structure with a NULL pointer which was
dereferenced later.
This patch checks for `dladdr` returning 0, and in that case returns 0
from `GetGlobalSizeFromDescriptor` to indicate failure of identifying
the address.
This occurs when `GetModuleNameAndOffsetForPC` succeeds for some address
not in a dynamically loaded library. One example is when the found
"module" is '[stack]' having come from parsing /proc/self/maps.
Differential Revision: https://reviews.llvm.org/D91344
Create a helper GetOffsetRegSetData() method to get pointer
to the regset data accounting for the necessary offset. Establish
the offsets in the constructor and store them in the structure. This
avoids having to add new Get*Offset() methods and combines some common
code.
Differential Revision: https://reviews.llvm.org/D91411
Eliminate the remaining swith-case code for register getters,
and migrate YMM registers to regset-oriented model. Since these
registers are recombined from XMM and YMM_Hi128 XSAVE blocks, while LLDB
gdb-server protocol transmits YMM registers whole, the offset-based
model will not work here. Nevertheless, some improvement was possible.
Replace generic 'XSaveRegSet' along with sub-sets for XSAVE components
with 'YMMRegSet' (and more regsets in the future as further components
are implemented). Create a helper GetYMMSplitReg() method that obtains
pointers to the appropriate XMM and YMM_Hi128 blocks to reduce code
duplication.
Differential Revision: https://reviews.llvm.org/D91293
Use offset-based method to access x86 debug registers. This also
involves adding a test for the correctness of these offsets, and making
GetDR() method of NativeRegisterContextWatchpoint_x86 public to avoid
duplicate code.
Differential Revision: https://reviews.llvm.org/D91268
Use offset-based method to access base x87 FPU registers, using offsets
relative to the position of 'struct FPR', as determined by the location
of first register in it (fctrl). Change m_fpr to use a fixed-size array
matching FXSAVE size (512 bytes). Add unit tests for verifying
RegisterInfo offsets and sizes against the FXSAVE layout.
Differential Revision: https://reviews.llvm.org/D91248
Read and write registers from m_gpr using offsets from RegisterInfo
rather than explicit switch-case. This eliminates a lot of redundant
code, and avoids mistakes such as type mismatches seen recently (wrt
segment registers). The same logic will be extended to other register
sets in the future.
Make m_gpr an uint8_t std::array to ease accesses. Ideally, we could
avoid including <machine/reg.h> entirely in the future and instead
get the correct GPR size from Utility/RegisterContextFreeBSD_* somehow.
While at it, modify register set logic to use an explicit enum with
llvm::Optional<>, making the code cleaner and at the same time enabling
compiler warnings for unhandled sets.
Since now we're fully relying on 'struct GPR' defined
in Utility/RegisterContextFreeBSD_* being entirely in sync with
the system structure, add unit tests to verify the field offsets
and sizes.
Differential Revision: https://reviews.llvm.org/D91216
scf.parallel is currently not a good fit for tiling on tensors.
Instead provide a path to parallelism directly through scf.for.
For now, this transformation ignores the distribution scheme and always does a block-cyclic mapping (where block is the tile size).
Differential revision: https://reviews.llvm.org/D90475
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.
As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).
There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.
This patch is based on a patch by Francis Visoiu Mistrih.
Reviewed By: thegameg, paquette
Differential Revision: https://reviews.llvm.org/D91417
Widen the IV to the widest available and legal integer type, which makes this
transformations always safe so that we can skip overflow checks.
Motivation is to let this pass trigger on 64-bit targets too, and this is the
last patch in a serie to achieve this: D90402 moves pass LoopFlatten to just
before IndVarSimplify so that IVs are not already widened, D90421 factors out
widening from IndVarSimplify into Utils/SimplifyIndVar so that we can also use
it in LoopFlatten.
Differential Revision: https://reviews.llvm.org/D90640
This patch adds the SchedMachineModel for Cortex-M7. It
also adds test cases for the scheduling information.
Details of the pipeline and descriptions are in comments
in file ARMScheduleM7.td included in this patch.
Differential Revision: https://reviews.llvm.org/D91355
This introduces the new `ARCHER_OPTIONS` flag `ignore_serial=0|1` to disable
analysis and logging of memory accesses in the sequential part of the OpenMP
application.
In the sequential part of an OpenMP program no data race is possible, unless
there is non-OpenMP concurrency (such as pthreads, MPI, ...). For the latter
reason, this is not active by default.
Besides reducing the runtime overhead for the sequential part of the program,
this reduces the memory overhead for sequential initialization. In combination
with `flush_shadow=1` this can allow analysis of applications, which run close
to the limit of available memory, but only access smaller parts of shared
memory during each OpenMP parallel region.
A problem for this approach is that Archer only gets active, when the OpenMP
runtime gets initialized, which might be after serial initialization of the
application. In such case, it helps to call for example `omp_get_max_threads()`
at the beginning of main.
Differential Revision: https://reviews.llvm.org/D90473
Similar to the X86 and AMDGPU targets, this uses a macro to cut down on
repetitive and error-prone code when converting RISCVISD node names to
strings in getTargetNodeName.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D91414