This is a patch that disables the poison-unsafe select -> and/or i1 folding.
It has been blocking D72396 and also has been the source of a few miscompilations
described in llvm.org/pr49688 .
D99674 conditionally blocked this folding and successfully fixed the latter one.
The former one was still blocked, and this patch addresses it.
Note that a few test functions that has `_logical` suffix are now deoptimized.
These are created by @nikic to check the impact of disabling this optimization
by copying existing original functions and replacing and/or with select.
I can see that most of these are poison-unsafe; they can be revived by introducing
freeze instruction. I left comments at fcmp + select optimizations (or-fcmp.ll, and-fcmp.ll)
because I think they are good targets for freeze fix.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D101191
Preexisting waitcnt may not update the scoreboard if the instruction
being examined needed to wait on fewer counters than what was encoded in
the old waitcnt instruction. Fixing this results in the elimination of
some redudnat waitcnt.
These changes also enable combining consecutive waitcnt into a single
S_WAITCNT or S_WAITCNT_VSCNT instruction.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D100281
It simplifies the logic by moving the predecessor (preHeader or it's predecessor) above the target (or loopExit),
instead of moving the target to after the predecessor.
Since the loopExit is no longer being moved, directions of any branches within/to it are unaffected.
While the predecessor is being moved, the backwards movement simplifies some considerations,
and the only consideration now required is that a forward WLS to the predecessor should not become backwards.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D100094
Adjust sanity check in register parsing function to allow register
name with more than 2 characters (e.g. ccr).
Differential Revision: https://reviews.llvm.org/D101733
As the context depicted by bug 49865[1], we are migrating tests under
`test/CodeGen/M68k/Encoding`, which was originally used to test
instruction encoding using MIR file as input, into `test/MC/M68k`. We
are also adding test directives for AsmParser using the same set of
inputs.
Currently we are converting the original MIR test files into assembly
code as well as translating the original LIT "RUN" statement into one
that only uses built-in LLVM tools (i.e. Get rid of `extract-section`).
However, since AsmParser has not completely finished, many of these
original test cases fail. Thus, this patch only migrate test files
that are passed by the current implementation of AsmParser (and
MCCodeEmitter). The remaining tests (under test/CodeGen/M68k/Encoding)
will be ported alone with the patch that fixes the related issues.
[1]: https://bugs.llvm.org/show_bug.cgi?id=49865
Differential Revision: https://reviews.llvm.org/D101410
- Removes the mention of fastcomp, which is deprecated.
- Some functions in Emscripten have moved from JS glue code to
compiler-rt/emscripten_setjmp.c and
compiler-rt/emscripten_exception_builtins.c. This fixes comments about
that.
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D101812
This change enables emitting CFI unwind information for debugging purpose
for targets with MCAsmInfo::ExceptionsType == ExceptionHandling::None.
Currently generating CFI unwind information is entangled with supporting
the exceptions, even when AsmPrinter explicitly recognizes that the unwind
tables are being generated as debug information.
In fact, the unwind information is not generated even if we specify
--force-dwarf-frame-section, unless exceptions are enabled. The LIT test
llvm/test/CodeGen/AMDGPU/debug_frame.ll demonstrates this behavior.
Enable this option for AMDGPU to prepare for future patches which add
complete CFI support.
Reviewed By: dblaikie, MaskRay
Differential Revision: https://reviews.llvm.org/D78778
The Halide project uses `#pragma comment(linker, "/STACK:...")` to set
the stack size high enough for our embedded compiler to run in end-user
programs on Windows.
Unfortunately, lld-link.exe breaks on this when embedded in a COFF
object, despite supporting the flag on the command line. MSVC's link.exe
supports this fine. This patch extends support for this to lld-link.exe
for better compatibility with MSVC projects.
Differential Revision: https://reviews.llvm.org/D99680
Fixing a minor bug which lead to element type of the output being
modified when folding reshapes with generic op.
Differential Revision: https://reviews.llvm.org/D101942
Widen 1 and 2 byte scalar loads to 4 bytes when sufficiently
aligned to avoid using a global load.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100430
Fix cases that can crash `SBStructuredData::operator=`.
This happened in a case where `rhs` had a null `SBStructuredDataImpl`.
Differential Revision: https://reviews.llvm.org/D101585
Currently the linker causes unnecessary errors when either the target or the config's platform is a simulator.
Differential Revision: https://reviews.llvm.org/D101855
specialization while substituting a partial template parameter pack,
don't try to extend the existing deduction.
This caused us to select the wrong partial specialization in some rare
cases. A recent change to libc++ caused this to happen in practice for
code using std::conjunction.
An address can be a uniform sum of two i64 bit values.
That regularly happens in a loop where index is an induction
variable promoted to 64 bit by the LSR. We can materialize
zero in a VGPR and still use SADDR form of the load.
Differential Revision: https://reviews.llvm.org/D101591
The custom insert of an unmerge and the callback weirdness should be
unnecessary. Since handleAssignments should now use
getRegisterTypeForCalling conv as SelectionDAG builder would, this
should now just be able to use the generic code. X86-32 relies on the
generated CCAssignFns not seeing illegal types and sharing code with
x86_64, so i64 values would incorrectly be assigned to 64-bit
registers.
Unfortunately the current call lowering code is built on top of the
legacy MVT/DAG based code. However, GlobalISel was not using it the
same way. In short, the DAG passes legalized types to the assignment
function, and GlobalISel was passing the original raw type if it was
simple.
I do believe the DAG lowering is conceptually broken since it requires
picking a type up front before knowing how/where the value will be
passed. This ends up being a problem for AArch64, which wants to pass
i1/i8/i16 values as a different size if passed on the stack or in
registers.
The argument type decision is split across 3 different places which is
hard to follow. SelectionDAG builder uses
getRegisterTypeForCallingConv to pick a legal type, tablegen gives the
illusion of controlling the type, and the target may have additional
hacks in the C++ part of the call lowering. AArch64 hacks around this
by not using the standard AnalyzeFormalArguments and special casing
i1/i8/i16 by looking at the underlying type of the original IR
argument.
I believe people have generally assumed the calling convention code is
processing the original types, and I've discovered a number of dead
paths in several targets.
x86 actually relies on the opposite behavior from AArch64, and relies
on x86_32 and x86_64 sharing calling convention code where the 64-bit
cases implicitly do not work on x86_32 due to using the pre-legalized
types.
AMDGPU targets without legal i16/f16 have always used a broken ABI
that promotes to i32/f32. GlobalISel accidentally fixed this to be the
ABI we should have, but this fixes it so we're using the worse ABI
that is compatible with the DAG. Ideally we would fix the DAG to match
the old GlobalISel behavior, but I don't wish to fight that battle.
A new native GlobalISel call lowering framework should let the target
process the incoming types directly.
CCValAssigns select a "ValVT" and "LocVT" but the meanings of these
aren't entirely clear. Different targets don't use them consistently,
even within their own call lowering code. My current belief is the
intent was "ValVT" is supposed to be the legalized value type to use
in the end, and and LocVT was supposed to be the ABI passed type
(which is also legalized).
With the default CCState::Analyze functions always passing the same
type for these arguments, these only differ when the TableGen part of
the lowering decide to promote the type from one legal type to
another. AArch64's i1/i8/i16 hack ends up inverting the meanings of
these values, so I had to add an additional hack to let the target
interpret how large the argument memory is.
Since targets don't consistently interpret ValVT and LocVT, this
doesn't produce quite equivalent code to the initial DAG
lowerings. I've opted to consistently interpret LocVT as the in-memory
size for stack passed values, and ValVT as the register type to assign
from that memory. We therefore produce extending loads directly out of
the IRTranslator, whereas the DAG would emit regular loads of smaller
values. This will also produce loads/stores that are wider than the
argument value if the allocated stack slot is larger (and there will
be undef padding bytes). If we had the optimizations to reduce
load/stores based on truncated values, this wouldn't produce a
different end result.
Since ValVT/LocVT are more consistently interpreted, we now will emit
more G_BITCASTS as requested by the CCAssignFn. For example AArch64
was directly assigning types to some physical vector registers which
according to the tablegen spec should have been casted to a vector
with a different element type.
This also moves the responsibility for inserting
G_ASSERT_SEXT/G_ASSERT_ZEXT from the target ValueHandlers into the
generic code, which is closer to how SelectionDAGBuilder works.
I had to xfail an x86 test since I don't see a quick way to fix it
right now (I filed bug 50035 for this). It's broken independently of
this change, and only triggers since now we end up with more ands
which hit the improperly handled selection pattern.
I also observed that FP arguments that need promotion (e.g. f16 passed
as f32) are broken, and use regular G_TRUNC and G_ANYEXT.
TLDR; the current call lowering infrastructure is bad and nobody has
ever understood how it chooses types.
- Move the code preventing CSE of `isConvergent` instrs into
`ProcessBlockCSE` (from `isProfitableToCSE`)
- Add comments explaining why `isConvergent` is used to prevent
CSE of non-local instrs in MachineCSE and the new test
This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101934
When auto-upgrade was replacing a call to a masked intrinsic, it would
not copy the metadata from the original call.
If an intrinsic had metadata, but did not need any updates, the metadata
would stay, but if an update was needed, the would end up being removed.
A similar effect could be observed with masked_expandload and
masked_compressstore, which at the moment are not handled by auto-upgrade:
the metadata remained untouched.
Differential Revision: https://reviews.llvm.org/D101201
These intrinsics do not correspond to their own underlying instruction, but are
a convenience for the common case of materializing a constant vector that has
the same value in each lane.
Differential Revision: https://reviews.llvm.org/D101885
Otherwise I get the following error on windows.
```
CMake Error at D:/bld/lld_1569206597988/work/build/CMakeFiles/CMakeTmp/CMakeLists.txt:2 (set):
Syntax error in cmake code at
D:/bld/lld_1569206597988/work/build/CMakeFiles/CMakeTmp/CMakeLists.txt:2
when parsing string
D:\bld\lld_1569206597988\_h_env\Library\lib\cmake\llvm
Invalid character escape '\b'.
CMake Error at D:/bld/lld_1569206597988/_build_env/Library/share/cmake-3.15/Modules/CheckSymbolExists.cmake:100 (try_compile):
Failed to configure test project build system.
Call Stack (most recent call first):
D:/bld/lld_1569206597988/_build_env/Library/share/cmake-3.15/Modules/CheckSymbolExists.cmake:57 (__CHECK_SYMBOL_EXISTS_IMPL)
D:/bld/lld_1569206597988/_h_env/Library/lib/cmake/llvm/HandleLLVMOptions.cmake:943 (check_symbol_exists)
CMakeLists.txt:56 (include)
```
Reviewed By: sbc100
Differential Revision: https://reviews.llvm.org/D68158
Implements support for undialated depthwise convolution using the existing
depthwise convolution operation. Once convolutions migrate to yaml defined
versions we can rewrite for cleaner implementation.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D101579
Operator new must align allocations for types with large alignment.
Before c++17 behavior was implementation defined and both clang and gc++
before 11 ignored alignment. Miss-aligned objects mysteriously crashed
tests on Ubuntu 14.
Alternatives are compile with -std=c++17 or -faligned-new, but they were
discarded as less portable.
Reviewed By: hctim
Differential Revision: https://reviews.llvm.org/D101874
This simply applies Howard's commit 4c80bfbd53 consistently
across all the associative and unordered container tests.
"unord.set/insert_hint_const_lvalue.pass.cpp" failed with `-D_LIBCPP_DEBUG=1`
before this patch; it was the only one that incorrectly reused
invalid iterator `e`. The others already used valid iterators
(generally `c.end()`); I'm just making them all match the same pattern
of usage: "e, then r, then c.end() for the rest."
Differential Revision: https://reviews.llvm.org/D101679
`__debug_less` ends up running the comparator up-to-twice per comparison,
because whenever `(x < y)` it goes on to verify that `!(y < x)`.
This breaks the strict "Complexity" guarantees of algorithms like
`inplace_merge`, which we test in the test suite. So, just skip the
complexity assertions in debug mode.
Differential Revision: https://reviews.llvm.org/D101677
The range of char pointers [data, data+size] is a valid closed range,
but the range [begin, end) is valid only half-open.
Differential Revision: https://reviews.llvm.org/D101676
This appears to be a bug in our string::assign: when assigning into
a longer string, from a shorter snippet of itself, we invalidate
iterators before doing the copy. We should invalidate them afterward.
Also drive-by improve the formatting of a function header.
Differential Revision: https://reviews.llvm.org/D101675
This allocator is not intended for libc++'s users to use;
it's strictly an implementation detail of `src/locale.cpp`.
So, move it to the `src/include/` directory.
Drive-by const-qualify its comparison operators.
For consistency with `__hidden_allocator` (defined in `src/thread.cpp`),
do *not* remove it from "libcxx/lib/libc++unexp.exp",
"libcxx/utils/symcheck-blacklists/linux_blacklist.txt", etc.
Differential Revision: https://reviews.llvm.org/D101293