KMOVWkr produces a VK16 register, so there's no reason to copy it to VK16 again.
Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
Remove `SetObjectModificationTime`, which is not currently used and assigns to the wrong member.
Differential Revision: https://reviews.llvm.org/D86493
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.
Differential Revision: https://reviews.llvm.org/D86549
There are two ways .llvmbc can be produced:
* clang -c -fembed-bitcode=all (which also produces .llvmcmd)
* LTO backend: ld.lld -mllvm -lto-embed-bitcode or -plugin-opt=-lto-embed-bitcode
.llvmbc and .llvmcmd have the SHF_ALLOC flag, so they can be dropped by
--gc-sections.
This patch sets SectionKind::Metadata to drop the SHF_ALLOC flag. This
is conceptually correct: the two sections are not part of the process
image, so SHF_ALLOC is not appropriate.
`test/LTO/X86/embed-bitcode.ll`: changed `llvm-objcopy -O binary --only-section` to
`llvm-objcopy --dump-section`. `-O binary` does not dump non-SHF_ALLOC sections.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D86374
In this patch, we pack all small first-private arguments and allocate and transfer them all at once, to reduce the number of data transfers, which are very expensive.
Let's take the test case as an example.
```
int main() {
  int data1[3] = {1}, data2[3] = {2}, data3[3] = {3};
  int sum[16] = {0};
#pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3)
  for (int i = 0; i < 16; ++i) {
    for (int j = 0; j < 3; ++j) {
      sum[i] += data1[j];
      sum[i] += data2[j];
      sum[i] += data3[j];
    }
  }
}
```
Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. Previously, `libomptarget` called data allocation and data transfer three times, each time allocating and transferring 12 bytes. With this patch, it calls allocation and transfer only once. The size is `(12+4)*3=48`, where 12 is the size of each array and 4 is the padding that keeps each address 8-byte aligned. It is implemented in this way:
1. First, collect all information for the *first*-private arguments. _Private_ arguments are not included because they don't need to be mapped to the target device; they only need a data allocation, which, with the memory manager patch, can be very cheap, especially for small sizes. For each qualified argument, push a placeholder pointer (`nullptr`) into the `vector` of kernel arguments; we will update these entries later.
2. Once we have all the information, create a buffer that can accommodate all the arguments plus their padding, and copy each argument into the buffer at its aligned offset.
3. Allocate target memory of the same size as the host buffer, transfer the host buffer to the target device, and finally update all placeholder pointers in the argument `vector`.
The reason we only consider small arguments is that the data transfer is asynchronous: for a large argument, we can keep doing work on the host while, hopefully, the data is being transferred. "Small" means the argument size is less than a predefined threshold, currently 1024 bytes. I'm not sure whether that is a good value; it is an open question. Another open question is whether we need to make it configurable via an environment variable.
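The packing scheme roughly corresponds to the sketch below. This is an illustrative sketch only: `PackedArg`, `deviceAlloc`, and `deviceSubmit` are hypothetical stand-ins, not the actual `libomptarget` interfaces.
```
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative sketch: PackedArg, deviceAlloc and deviceSubmit are
// hypothetical stand-ins, not the actual libomptarget interfaces.
struct PackedArg {
  const void *HostPtr; // host copy of the first-private data
  size_t Size;         // size in bytes
  size_t ArgIndex;     // slot in the kernel-argument vector to patch later
};

static size_t alignUp(size_t V, size_t A) { return (V + A - 1) & ~(A - 1); }

void packAndTransfer(std::vector<void *> &KernelArgs,
                     const std::vector<PackedArg> &SmallArgs,
                     void *(*deviceAlloc)(size_t),
                     void (*deviceSubmit)(void *, const void *, size_t)) {
  // 1. Compute the total buffer size, padding each argument to 8 bytes.
  size_t Total = 0;
  std::vector<size_t> Offsets;
  for (const PackedArg &A : SmallArgs) {
    Offsets.push_back(Total);
    Total += alignUp(A.Size, 8);
  }
  // 2. Copy every argument into one host buffer at its aligned offset.
  std::vector<char> HostBuffer(Total);
  for (size_t I = 0; I < SmallArgs.size(); ++I)
    std::memcpy(HostBuffer.data() + Offsets[I], SmallArgs[I].HostPtr,
                SmallArgs[I].Size);
  // 3. One device allocation and one transfer, then patch the placeholder
  //    (nullptr) entries in the kernel-argument vector.
  char *DevBuffer = static_cast<char *>(deviceAlloc(Total));
  deviceSubmit(DevBuffer, HostBuffer.data(), Total);
  for (size_t I = 0; I < SmallArgs.size(); ++I)
    KernelArgs[SmallArgs[I].ArgIndex] = DevBuffer + Offsets[I];
}
```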
Reviewed By: ye-luo
Differential Revision: https://reviews.llvm.org/D86307
This patch adds the z/OS target and defines macros as a stepping stone
towards enabling a native build on z/OS.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D85324
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.
Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86477
For some reason the ctor homing case was before the template
specialization case, and could have returned false too early.
I moved the code out into a separate function to avoid this.
This reverts commit 05777ab941.
We're not changing IR while running a single MemDep query, so it's
safe to cache alias analysis results using BatchAA. This adds BatchAA
usage to getSimplePointerDependencyFrom(), which is non-intrusive --
covering larger parts (like a whole processNonLocalLoad query) is
also possible, but requires threading BatchAA through a bunch of APIs.
For the ThinLTO configuration, this is a 1% geomean improvement on CTMark.
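For illustration only, the caching pattern looks roughly like the sketch below; `anyInstModifiesOrReadsLoc` is a hypothetical helper, not the actual MemDep code:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// One BatchAAResults object is created per query and reused for every check
// inside it, so repeated alias/mod-ref questions hit its cache instead of
// re-running the underlying analyses. This is only valid while the IR does
// not change, which holds for a single MemDep query.
static bool anyInstModifiesOrReadsLoc(AAResults &AA, const MemoryLocation &Loc,
                                      ArrayRef<Instruction *> Insts) {
  BatchAAResults BatchAA(AA);
  for (Instruction *I : Insts)
    if (isModOrRefSet(BatchAA.getModRefInfo(I, Loc)))
      return true;
  return false;
}
```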
Differential Revision: https://reviews.llvm.org/D85583
Provides a fast, generic way of setting a mask up to a certain
point. Potential use cases that may benefit are create_mask
and transfer_read/write operations in the vector dialect.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D86501
Introduces a new base class "InstructionView" for views that deal with
instructions, including printing them; such views now derive from it.
Other views still use the "View" base class.
Reviewers: andreadb, lebedev.ri
Differential Revision: https://reviews.llvm.org/D86390
A number of I/O syntax rules involve variables that will be written to,
and must therefore be definable. This includes internal file variables,
IOSTAT= and IOMSG= specifiers, most INQUIRE statement specifiers, a few
other specifiers, and input variables. This patch checks for
these violations, and implements several additional I/O TODO constraint
checks.
Differential Revision: https://reviews.llvm.org/D86557
The legacy LoopVectorize has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.
Follow-up to https://reviews.llvm.org/D86492.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86561
When an illegal character appears in Fortran source (after
preprocessing), catch and report it in the prescanning phase
rather than leaving it for the parser to cope with.
Differential Revision: https://reviews.llvm.org/D86553
Accept and represent "global" compiler directives that appear
before and between program units in a source file.
Differential Revision: https://reviews.llvm.org/D86555
Since a842950b62, this test started using the reproducer subsystem, but we
never initialized it in the test. The subsystem takes an argument, so we
can't currently use the usual SubsystemRAII to do this for us.
This just adds the initialize/terminate calls to get the test passing again.
TestQueues is curiously failing for me, as my queue for QOS_CLASS_UNSPECIFIED
is named "Utility" and not "User Initiated" or "Default". While debugging this,
I noticed that the test isn't actually using this API correctly, as far as I
understand. The API documentation for `dispatch_get_global_queue` says of the
parameter: "You may specify the value QOS_CLASS_USER_INTERACTIVE,
QOS_CLASS_USER_INITIATED, QOS_CLASS_UTILITY, or QOS_CLASS_BACKGROUND."
QOS_CLASS_UNSPECIFIED isn't listed as one of the supported values.
swift-corelibs-libdispatch even checks for this value and returns
DISPATCH_BAD_INPUT. The libdispatch shipped on macOS also seems to check for
QOS_CLASS_UNSPECIFIED and seems to cause a "client crash" instead, but somehow
this doesn't trigger in this test and we just get back whatever queue the call
happens to return.
This patch just removes that part of the test, as the code appears to be incorrect.
Reviewed By: jasonmolenda
Differential Revision: https://reviews.llvm.org/D86211
A dead function has its body stripped away and can cause various
analyses to panic. It also does not make sense to apply analyses to
such a function.
Reviewed By: xazax.hun, MaskRay, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D84715
If the label field is empty, and macro replacement occurs,
the rescanned text might be misclassified as a comment card
if it happens to begin with a C or a D. Insert a leading
space into these otherwise empty label fields.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47173
The legacy SLPVectorizer has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86492
Summary: The LCSSA pass (required for all loop passes) sometimes adds
additional blocks containing LCSSA variables, and checkLoopsStructure
may return false even when the loops are perfectly nested in this case.
This is because the successor of the exit block of the inner loop now
points to the LCSSA block instead of the latch block of the outer loop.
Examples are shown in the test nests-with-lcssa.ll.
To fix the issue, the successor of the exit block of the inner loop may
now point to a block in which all instructions are LCSSA phi nodes
(except the terminator), provided that the sole successor of that block
is the latch block of the outer loop.
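For illustration, the relaxed check corresponds roughly to a helper like the one below (a hypothetical sketch, not the actual LoopNestAnalysis code):
```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// The inner loop's exit block may now branch to a block that contains only
// LCSSA phi nodes (plus the terminator) and whose single successor is the
// outer loop's latch, instead of branching to the latch directly.
static bool isLCSSAForwardingBlock(const BasicBlock *BB,
                                   const BasicBlock *OuterLatch) {
  for (const Instruction &I : *BB)
    if (!isa<PHINode>(&I) && !I.isTerminator())
      return false;
  return BB->getSingleSuccessor() == OuterLatch;
}
```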
Reviewed By: Whitney, etiotto
Differential Revision: https://reviews.llvm.org/D86133
A first version of get.active.lane.mask was committed in rG7fb8a40e5220. One of
the main purposes and uses of this intrinsic is to communicate information from
the middle-end to the back-end, but its current definition and semantics make
this actually very difficult. The intrinsic was defined as:
@llvm.get.active.lane.mask(%IV, %BTC)
where %BTC is the Backedge-Taken Count (variable names are different in the
LangRef spec). This allows the loop tripcount to be communicated implicitly,
since it can be reconstructed by calculating BTC + 1. But it has been very
difficult to prove that calculating BTC + 1 is safe and doesn't overflow: this
requires complicated range and SCEV analysis, so the intrinsic isn't really
solving the problem it was meant to solve. Examples of the
overflow checks that are required in the (ARM) back-end are D79175 and D86074,
which aren't even complete/correct yet.
To solve this problem, we are revising the definitions/semantics for
get.active.lane.mask to avoid all the complicated overflow analysis. This means
that instead of communicating the BTC, we are now using the loop tripcount. Now
using LangRef's variable names, its semantics is changed from:
icmp ule (%base + i), %n
to:
icmp ult (%base + i), %n
with %n > 0 and corresponding to the loop tripcount. The intrinsic signature
remains the same.
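As a rough element-wise sketch of the new semantics (not the LangRef wording itself), lane i of the mask is active when (%base + i) is unsigned-less-than %n:
```
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch only: with the new definition, lane I is (Base + I) u< N where N is
// the loop tripcount (required to be > 0), instead of (Base + I) u<= BTC with
// the backedge-taken count. No BTC + 1 computation (and thus no overflow
// reasoning) is needed.
std::vector<bool> getActiveLaneMask(uint64_t Base, uint64_t N, unsigned VF) {
  assert(N > 0 && "tripcount operand must be greater than 0");
  std::vector<bool> Mask(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask[I] = (Base + I) < N; // unsigned compare, i.e. icmp ult
  return Mask;
}
```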
Differential Revision: https://reviews.llvm.org/D86147
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.
Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.
But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)
SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.
Differential Revision: https://reviews.llvm.org/D86460
The 1st draft of D86460 (reverted) would show miscompiles with these tests
because the undef element tracking went wrong and became visible in the
shuffle masks.
This adapts the verifier checks for the get.active.lane.mask intrinsic to its
new semantics as described in D86147. I.e., the second argument %n, which
corresponds to the loop tripcount, must be greater than 0 if it is a constant,
so check that.
Differential Revision: https://reviews.llvm.org/D86301
With the 'new' way of releasing on 32-bit, we iterate through all the
regions in between `First` and `Last`, which covers regions that do not
belong to the class size we are working with. This is effectively wasted
cycles.
With this change, we add a `SkipRegion` lambda to `releaseFreeMemoryToOS`
that will allow the release function to know when to skip a region.
For the 64-bit primary, since we are only working with 1 region, we never
skip.
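A minimal sketch of the idea (illustrative names and signatures, not the actual scudo interfaces):
```
#include <cstddef>
#include <cstdint>
#include <cstdio>

using uptr = std::uintptr_t;

// Minimal sketch of the SkipRegion idea; names and signatures are
// illustrative, not the actual scudo interfaces.
template <typename SkipRegionT>
void releaseFreeMemoryToOS(uptr First, uptr Last, SkipRegionT SkipRegion) {
  for (uptr Region = First; Region <= Last; ++Region) {
    if (SkipRegion(Region))
      continue; // Region belongs to another class size: skip the wasted work.
    // ... scan the region's free blocks and release whole free pages ...
    std::printf("releasing region %zu\n", static_cast<std::size_t>(Region));
  }
}

int main() {
  const uptr ClassOfRegion[] = {1, 2, 1, 3, 1};
  const uptr ClassId = 1;
  // 32-bit primary: regions between First and Last may belong to other
  // class sizes, so skip those.
  releaseFreeMemoryToOS(0, 4,
                        [&](uptr R) { return ClassOfRegion[R] != ClassId; });
  // 64-bit primary: a single region per class, so never skip.
  releaseFreeMemoryToOS(0, 0, [](uptr) { return false; });
  return 0;
}
```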
Reviewed By: hctim
Differential Revision: https://reviews.llvm.org/D86399
This patch makes the 'Attributes' field optional; we no longer need to
specify the 'Attributes' field explicitly.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D86537
This adapts legalization of intrinsic get.active.lane.mask to the new semantics
as described in D86147. Because the second argument is now the loop tripcount,
we legalize this intrinsic to an 'icmp ULT', instead of the 'icmp ULE' used
when the argument was the backedge-taken count.
Differential Revision: https://reviews.llvm.org/D86302
This patch adds the -Xclang option
"-fexperimental-debug-variable-locations" and same LLVM CodeGen option,
to pick which variable location tracking solution to use.
Right now all the switch does is pick which LiveDebugValues
implementation to use, the normal VarLoc one or the instruction
referencing one in rGae6f78824031. Over time, the aim is to add fragments
of support in aid of the value-tracking RFC:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html
also controlled by this command line switch. That will slowly move
variable locations to be defined by an instruction calculating a value,
and a DBG_INSTR_REF instruction referring to that value. Thus, this is
going to grow into a "use the new kind of variable locations" switch,
rather than just "use the new LiveDebugValues implementation".
Differential Revision: https://reviews.llvm.org/D83048