- Explain the use of the _MM_SHUFFLE and _MM_SHUFFLE2 macros
- Update some doxygen parameter descriptions to match the implementations
- Add "see also" doxygen tags to some intrinsics
- Minor clang-format changes
Reviewers: RKSimon
Differential Revision: https://reviews.llvm.org/D124469
canonicalizeClampLike canonicalizes the ule/ugt comparisons to ult/uge,
respectively. However, it does not update the variable holding the
comparison predicate type after doing this. Later code fails to handle
the non-canonical predicate type (specifically, the swap of
ThresholdLowIncl and ThresholdHighExcl when Pred0 has been canonicalized
from ugt to uge). This leads to the miscompile reported in PR53252. Fix
this by updating the comparison predicate after canonicalizing.
Fixes#53252
Differential Revision: https://reviews.llvm.org/D119690
The callback is expected to create a branch to the ContinuationBB (sometimes called FiniBB in some lambdas) argument when finishing. This creates problems:
1. The InsertPoint used for CodeGenIP does not need to be the end of a block. If it is not, a naive callback will insert a branch instruction into the middle of the block.
2. The BasicBlock the CodeGenIP is pointing to may or may not have a terminator. There is an conflict where to branch to if the block already has a terminator.
3. Some API functions work only with block having a terminator. Some workarounds have been used to insert a temporary terminator that is removed again.
4. Some callbacks are sensitive to whether the BasicBlock has a terminator or not. This creates a callback ordering problem where different callback may have different behaviour depending on whether a previous callback created a terminator or not. The problem also exists for FinalizeCallbackTy where some callbacks do create branch to another "continue" block, but unlike BodyGenCallbackTy does not receive the target as argument. This is not addressed in this patch.
With this patch, the callback receives an CodeGenIP into a BasicBlock where to insert instructions. If it has to insert control flow, it can split the block at that position as needed but otherwise no separate ContinuationBB is needed. In particular, a callback can be empty without breaking the emitted IR. If the caller needs the control flow to branch to a specific target, it can insert the branch instruction itself and pass an InsertPoint before the terminator to the callback.
Certain frontends such as Clang may expect the current IRBuilder position to be at the end of a basic block. In this case its callbacks must split the block at CodeGenIP before setting the IRBuilder position such that the instructions after CodeGenIP are moved to another basic block and before returning create a new branch instruction to the split block.
Some utility functions such as `splitBB` are supporting correct splitting of BasicBlocks, independent of whether they have a terminator or not, returning/setting the InsertPoint of an IRBuilder to the end of split predecessor block, and optionally omitting creating a branch to the split successor block to be added later.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D118409
Add these bits to the MUBUF and FLAT LDS DMA instructions:
- LGKM_CNT - these operate on LDS;
- VALU - SPG 3.9.8: This instruction acts as both a MUBUF and
VALU instruction;
Codegen currently does not produce any of this, so the change is NFC.
Differential Revision: https://reviews.llvm.org/D124472
The functionality was added in 7726ad0, catching up on the tests now.
This test mostly runs clang and checks one thing each time, which isn't
very efficient. I added some -DAG suffixes and combined some checks so
adding PS5 checking didn't actually increase the number of clang runs.
Symbol on-demand feature is never tested on Windows so it is not a surprise
that we are getting Buildbot failure from Windows:
https://lab.llvm.org/buildbot/#/builders/83/builds/18228
This patch disables symbol on-demand feature on Windows. I will find a Windows
machine to test and re-enable symbol on-demand feature as follow-up.
Differential Revision: https://reviews.llvm.org/D124471
The test was added in D121299 and it fails on Windows:
error: CHECK-NODIR: expected string
not found in input
; CHECK-NODIR: .file {{[0-9]+}} "/tmp/dbginfo/a/a.cpp"
^
<stdin>:1:1: note: scanning from here
//
^
<stdin>:25:2: note: possible intended match here
.file 1 "/tmp/dbginfo/a\\a.cpp"
Default behavior for .file directory was changed in D105856, but
ptxas (CUDA 11.5 release) refuses to parse it:
$ llc -march=nvptx64 llvm/test/DebugInfo/NVPTX/debug-file-loc.ll
$ ptxas debug-file-loc.s
ptxas debug-file-loc.s, line 42; fatal : Parsing error near
'"foo.h"': syntax error
Added a new field to MCAsmInfo to control default value of
UseDwarfDirectory. This value is used if -dwarf-directory command line
option is not specified.
Differential Revision: https://reviews.llvm.org/D121299
We can always replace the undef elements in a vector constant
with regular constants to get rid of the freeze:
https://alive2.llvm.org/ce/z/nfRb4F
The select diffs show that we might do better by adjusting the
logic for a frozen select condition. We may also want to refine
the vector constant replacement to consider forming a splat.
Differential Revision: https://reviews.llvm.org/D123962
Before this patch `Args` was used to pass a broadcat's arguments by SLP.
This patch changes this. `Args` is now used for passing the operands of
the shuffle.
Differential Revision: https://reviews.llvm.org/D124202
This continues the push away from hard-coded knowledge about functions
towards attributes. We'll use this to annotate free(), realloc() and
cousins and obviate the hard-coded list of free functions.
Differential Revision: https://reviews.llvm.org/D123083
Semantics now needs to preserve the parse trees from module files,
in case they contain parameterized derived type definitions with
component initializers that may require re-analysis during PDT
instantiation. Save them in the SemanticsContext.
Differential Revision: https://reviews.llvm.org/D124467
This diff introduces a new symbol on-demand which skips
loading a module's debug info unless explicitly asked on
demand. This provides significant performance improvement
for application with dynamic linking mode which has large
number of modules.
The feature can be turned on with:
"settings set symbols.load-on-demand true"
The feature works by creating a new SymbolFileOnDemand class for
each module which wraps the actual SymbolFIle subclass as member
variable. By default, most virtual methods on SymbolFileOnDemand are
skipped so that it looks like there is no debug info for that module.
But once the module's debug info is explicitly requested to
be enabled (in the conditions mentioned below) SymbolFileOnDemand
will allow all methods to pass through and forward to the actual SymbolFile
which would hydrate module's debug info on-demand.
In an internal benchmark, we are seeing more than 95% improvement
for a 3000 modules application.
Currently we are providing several ways to on demand hydrate
a module's debug info:
* Source line breakpoint: matching in supported files
* Stack trace: resolving symbol context for an address
* Symbolic breakpoint: symbol table match guided promotion
* Global variable: symbol table match guided promotion
In all above situations the module's debug info will be on-demand
parsed and indexed.
Some follow-ups for this feature:
* Add a command that allows users to load debug info explicitly while using a
new or existing command when this feature is enabled
* Add settings for "never load any of these executables in Symbols On Demand"
that takes a list of globs
* Add settings for "always load the the debug info for executables in Symbols
On Demand" that takes a list of globs
* Add a new column in "image list" that shows up by default when Symbols On
Demand is enable to show the status for each shlib like "not enabled for
this", "debug info off" and "debug info on" (with a single character to
short string, not the ones I just typed)
Differential Revision: https://reviews.llvm.org/D121631
This test is no longer necessary as it is a subset of:
llvm/test/Analysis/CostModel/AArch64/shuffle-load.ll
Differential Revision: https://reviews.llvm.org/D124456
Just clone all the virtual registers instead of looking for def
operands. This preserves the register values used, simplifying the
rest of the code. This avoids needing to expose the register map to
target code.
Previously the specific values used for fixed frame indexes was in
reverse order in the cloned function from the original, and a map was
used to adjust all frame indexes to the potentially new values. Insert
the fixed objects in reverse to avoid this. This simplifies other
code, since now we don't need to track down all frame indexes. This
will allow targets that store frame indexes in MachineFunctionInfo to
simply copy the values.
Note this isn't directly observable in the test since the resulting
MIR print/parse can shuffle the IDs around (in particular the final
serialization implicitly strips out dead objects).
This change borrows the ideas from `computeExpanded/CollapsedLayoutMap`
and computes the dynamic strides at runtime for the memref descriptors.
Differential Revision: https://reviews.llvm.org/D124001
A recent change assumed that the native C++ "long double" maps to
a Fortran data type; but this turns out to not be true for ppc64le,
which uses "double-double" for "long double".
This is a quick patch to get the ppc64le flang build bot back up.
A better fix that either uses HostTypeExists<> or replaces "long double"
with "ieee128_t" (or some other solution) is expected to follow soon.
Differential Revision: https://reviews.llvm.org/D124423
A struct like { float a; int :0; } should per the SystemZ ABI be passed in a
GPR, but to match a bug in GCC it has been passed in an FPR (see 759449c).
GCC has now corrected the C++ ABI for this case, and this patch for clang
follows suit.
Reviewed By: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D122388
Switch statements that cover all cases should not have a `default`
case. When a switch covers all cases and includes a `default` case,
clang emits a diagnostic. Omitting the `default` case allows the
compiler to instead emit a diagnostic on unhandled enum values.
This change removes default cases from all the places that they
shouldn't be, and adds a missing enum case for one switch statement
that wasn't covering all values.
Summary:
A previous patch updated the path searching in the linker wrapper. I
made an error and caused `lld`, which is necessary to link AMDGPU
images, to not be found on some systems. This patch fixes this by
correctly searching that linker-wrapper's binary path first again.