Mimics similar change for InstCombine:
ce192ced2b / D104602
All these uses are in blocks that aren't reachable from the function's entry,
and said blocks are removed by SimplifyCFG itself,
so we can't really test this change.
This tries to bail out in InstCombine if the PHI is in a `catchswitch` BB.
A PHI cannot be combined into a non-PHI instruction if it is in a
`catchswitch` BB, because a `catchswitch` BB cannot have any non-PHI
instruction other than the `catchswitch` itself.
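A minimal sketch of the guard, assuming standard LLVM IR APIs (the exact placement inside InstCombine is not shown, and the helper name is made up):

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// In a catchswitch block the first non-PHI instruction is necessarily the
// catchswitch terminator itself, so there is no legal insertion point for
// a combined non-PHI replacement of the PHI.
static bool isInCatchSwitchBlock(const PHINode &PN) {
  return isa<CatchSwitchInst>(PN.getParent()->getFirstNonPHI());
}
```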
The given test case started crashing after D98058.
Reviewed By: lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D105309
Somewhat related to D105338.
While it is up for discussion whether or not a volatile store traps,
so far there have been no complaints that volatile load/cmpxchg/atomicrmw may also trap.
And even if SimplifyCFG currently conservatively believes that to be the case,
InstCombine does not: https://godbolt.org/z/5vhv4K5b8
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105343
In MASM, the ifdef family of directives treats its argument literally, without expanding it as a text macro. Add support for this, and also replace the special handling that was previously used for echo.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D104196
Fusion by linearization should not happen when:
- the reshape is expanding and it is a consumer, or
- the reshape is collapsing and it is a producer.
A bug introduced into this logic by some recent refactoring resulted
in a crash.
To enforce this (negative) use case, add a test that reproduces the
error and verifies the fix.
Differential Revision: https://reviews.llvm.org/D104970
This means `max_align_t` is 8 bytes, which also sets the alignment of
malloc. Since this is technically an ABI-breaking change, we have
limited it to just the Emscripten OS target. It is also relatively
low-impact breakage since it will only affect the alignment of structs
that contain `long double`s (extremely rare, I imagine).
Emscripten's malloc implementation already uses 8-byte alignment
(dlmalloc uses an alignment of 2*sizeof(void*) == 8 rather than
checking max_align_t), so it will not be affected by this change. By
bringing the ABI in line with the current malloc code, this will fix
several issues we have seen in the wild.
See: https://github.com/emscripten-core/emscripten/pull/14456
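A quick way to observe the ABI in question, a sketch assuming an Emscripten (wasm32) target where both values are expected to print 8 after this change (on typical x86-64 Linux hosts `max_align_t` is 16-byte aligned instead):

```cpp
#include <cstddef>
#include <cstdio>

int main() {
  // Alignment of the most-aligned scalar type; this is what constrains
  // the guarantee malloc must provide.
  std::printf("alignof(max_align_t) = %zu\n", alignof(std::max_align_t));
  std::printf("alignof(long double) = %zu\n", alignof(long double));
  return 0;
}
```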
Differential Revision: https://reviews.llvm.org/D104808
This adds a very basic test in `cuda_with_openmp.cu` that just checks whether the CUDA & OpenMP integrated headers compile when a CUDA file is compiled with OpenMP (CPU) enabled.
Thus this basically adds the missing test for https://reviews.llvm.org/D90415.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105322
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.
Differential Revision: https://reviews.llvm.org/D104751
The implementation has become too unwieldy and the cognitive overhead now outweighs the benefit.
Instead, compress the implementation in preparation for additional lowering paths.
Differential Revision: https://reviews.llvm.org/D105359
D97883 introduced a compile-time error in the experimental remote offloading
libomptarget plugin; this patch fixes it and resolves a number of
inconsistencies in the plugin as well:
1. Non-functional asynchronous API
2. Unnecessarily verbose debug printing
3. Misc. code cleanups
This is not intended to make any functional changes to the plugin.
Differential Revision: https://reviews.llvm.org/D105325
Add the min operation to OpDSL and introduce a min pooling operation to test the implementation. The patch is a sibling of the max operation patch https://reviews.llvm.org/D105203; the min operation is again lowered to a compare-and-select pair.
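A conceptual illustration of the lowering in scalar C++ (not OpDSL output; just the compare-and-select shape the generated code mirrors):

```cpp
// The min reduction becomes a compare followed by a select that picks
// the smaller operand.
float min_via_cmp_select(float a, float b) {
  bool lt = a < b;   // the compare
  return lt ? a : b; // the select
}
```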
Differential Revision: https://reviews.llvm.org/D105345
This change introduces libMutagen/libclang_rt.mutagen.a as a subset of libFuzzer/libclang_rt.fuzzer.a. This library contains only the fuzzing strategies used by libFuzzer to produce new test inputs from provided inputs, dictionaries, and SanitizerCoverage feedback.
Most of this change is simply moving sections of code to one side or the other of the library boundary. The only meaningful new code (a rough interface sketch follows the list below) is:
* The Mutagen.h interface and its implementation in Mutagen.cpp.
* The following methods in MutagenDispatcher.cpp:
* UseCmp
* UseMemmem
* SetCustomMutator
* SetCustomCrossOver
* LateInitialize (similar to the MutationDispatcher's original constructor)
* Mutate_AddWordFromTORC (uses callbacks instead of accessing TPC directly)
* StartMutationSequence
* MutationSequence
* DictionaryEntrySequence
* RecommendDictionary
* RecommendDictionaryEntry
* FuzzerMutate.cpp (which now just sets callbacks and handles printing)
* MutagenUnittest.cpp (which adds tests of Mutagen.h)
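Putting the listed method names together, a rough, purely illustrative sketch of the dispatcher's shape; every signature and helper type below is an assumption, not the actual Mutagen.h:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative only: method names come from the list above, but all
// signatures and types here are guesses, not the real interface.
class MutagenDispatcher {
public:
  // Two-phase init standing in for MutationDispatcher's old constructor.
  void LateInitialize();

  // Feed comparison/memmem feedback via callbacks instead of accessing
  // TPC (TracePC) directly.
  void UseCmp(const void *Arg1, const void *Arg2, size_t Size);
  void UseMemmem(const void *Data, size_t Size);

  // Callbacks replacing direct coupling to libFuzzer.
  using MutatorFn =
      std::function<size_t(uint8_t *Data, size_t Size, size_t MaxSize)>;
  void SetCustomMutator(MutatorFn M);
  void SetCustomCrossOver(MutatorFn M);

  // Sequence bookkeeping used for printing and replay.
  void StartMutationSequence();
  std::string MutationSequence() const;
  std::string DictionaryEntrySequence() const;

  // Dictionary recommendations derived from observed comparisons.
  std::vector<std::string> RecommendDictionary() const;
};
```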
A note on performance: This change was tested with 100 passes of test/fuzzer/LargeTest.cpp with 1000 runs per pass, both with and without the change. The running time distribution was qualitatively similar in both cases, and the average difference was within 30 microseconds (2.240 ms/run vs. 2.212 ms/run, respectively). Both times were much higher than observed with the fully optimized system clang (~0.38 ms/run), most likely due to the combination of CMake "dev mode" settings (e.g. CMAKE_BUILD_TYPE="Debug", LLVM_ENABLE_LTO=OFF, etc.). The difference between the two versions built similarly seems to be in the noise and suggests no meaningful performance degradation.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D102447
Add a flag so that a target can choose to use the AsmParser for parsing inline asm,
and set the flag by default for AIX.
-no-integrated-as will override this default if specified explicitly.
Reviewed By: #powerpc, shchenz
Differential Revision: https://reviews.llvm.org/D105314
Implement XCOFFMCAsmParser so that we can use MC to parse inline asm.
The directives and storage mapping classes will be added iteratively in
later patches.
Reviewed By: xgupta
Differential Revision: https://reviews.llvm.org/D105259
To make TensorExp clearer, this change refactors the e0/e1 fields into a union: e0/e1 for a binary op tensor expression, and tensor_num for a tensor-kinded tensor expression.
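A hedged sketch of the described layout; everything except the e0/e1 and tensor_num names is an assumption, not the actual MLIR sparse-tensor Merger code:

```cpp
#include <cstdint>

// Minimal stand-in for the expression kinds; the real code has many more.
enum class Kind { kTensor, kAddF, kMulF };

struct TensorExp {
  Kind kind;
  union {
    // Valid for binary-op kinds: indices of the two child expressions.
    struct { uint32_t e0, e1; } children;
    // Valid when kind == Kind::kTensor: the tensor's number.
    uint32_t tensor_num;
  };
};
```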
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D105303
In the original review D87149 it was mentioned that this approach was tried,
and it led to infinite combine loops, but I'm not seeing anything like that now,
neither in `check-llvm` nor in some codebases I tried.
This is a recommit of d9d65527c2,
which I immediately reverted because I messed something up
during a branch switch, and 597ccc92ce
accidentally ended up being pushed, which was very much not the intention.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D105339
In the original review D87149 it was mentioned that this approach was tried,
and it led to infinite combine loops, but I'm not seeing anything like that now,
neither in `check-llvm` nor in some codebases I tried.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D105339
BlockUntilIdle is supposed to return false if it fails.
If an intermediate step fails to clear the queue, we shouldn't
charge ahead and assert on the state of the queue.
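A minimal sketch of the intended contract, with hypothetical names (Deadline, drainPendingTasks, and queueIsEmpty are stand-ins, not the actual API):

```cpp
#include <cassert>

struct Deadline { /* opaque stand-in for a timeout */ };

bool drainPendingTasks(Deadline D); // returns false on failure/timeout
bool queueIsEmpty();

bool BlockUntilIdle(Deadline D) {
  // If an intermediate step fails to clear the queue, report failure
  // instead of charging ahead.
  if (!drainPendingTasks(D))
    return false;
  // Only once every step has succeeded is an empty queue an invariant.
  assert(queueIsEmpty() && "queue must be empty after a successful drain");
  return true;
}
```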
The SLM model is inconsistent about where it keeps its 'unsupported' schedule classes - better to keep them close to similar classes.
I'm not sure why some ymm classes are defined while others are unsupported (but I haven't altered them) - the only SLM-like CPU supporting any ymm instructions is KNL, and that currently uses the HSW model.