Instead of parsing the crashlog in one big loop, use methods that
correspond to the different parsing modes.
Differential revision: https://reviews.llvm.org/D90665
As discussed on D90527, we should be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.
Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.
There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.
Differential Revision: https://reviews.llvm.org/D90575
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
This patch adds support for building the compiler-rt profile library on AIX.
Reviewed by: phosek
Differential Revision: https://reviews.llvm.org/D90619
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
This patch adds some helper in the DirectiveLanguage wrapper to initialize it from
the RecordKeeper and validate the records. This simplify arguments in lots of function
since only the DirectiveLanguge is passed.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D90358
As noted in D90554, there's an opcode typo in using an easily
misused cost model API: getCmpSelInstrCost(). Beyond that, the
assumed sequence of ops is questionable, but that would be
another patch.
My guess is that the x86 test diffs show that we are probably
wrong both before and after this change, so there will be no
practical difference.
As an example, I tried this test which shows a cost of '7'
either way:
define <4 x i32> @sadd(<4 x i32> %va, <4 x i32> %vb) {
%V4I32 = call {<4 x i32>, <4 x i1>} @llvm.sadd.with.overflow.v4i32(<4 x i32> %va, <4 x i32> %vb)
%ov = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 1
%r = extractvalue {<4 x i32>, <4 x i1>} %V4I32, 0
%z = select <4 x i1> %ov, <4 x i32> <i32 42, i32 42, i32 42, i32 42>, <4 x i32> %r
ret <4 x i32> %z
}
$ llc -o - sadd.ll -mattr=avx
vpaddd %xmm1, %xmm0, %xmm2
vpcmpgtd %xmm2, %xmm0, %xmm0
vpxor %xmm0, %xmm1, %xmm0
vblendvps %xmm0, LCPI0_0(%rip), %xmm2, %xmm0a
Differential Revision: https://reviews.llvm.org/D90681
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.
Differential Revision: https://reviews.llvm.org/D90607
If __libcpp_mbsrtowcs_l outputs zero wchar_t's for week days or
month names (due to errors in the locale function setup), these are
matched all the time in __time_get_storage::__analyze, ending up in
an infinite loop, allocating more memory until killed.
Differential Revision: https://reviews.llvm.org/D69553
Previously, these had to be set manually when building each of the
projects standalone, in order to get proper symbol visibility when
combining the two libraries.
Differential Revision: https://reviews.llvm.org/D90021
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.
Replaces https://reviews.llvm.org/D74158
Differential Revision: https://reviews.llvm.org/D89613
Adds a method called pop_back_n to SmallVector.
This is more readable and less error prone than the alternatives of using
```lang=c++
Vector.resize(Vector.size() - N);
Vector.erase(Vector.end() - N, Vector.end());
for (unsigned I = 0;I<N;++I) Vector.pop_back();
```
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D90576
Made the isExpandedFromMacro matcher work on Stmt's, TypeLocs and Decls in line with the other macro expansion matchers.
Also tweaked it to take a `std::string` instead of a `StringRef`.
This prevents potential use-after-free bugs if the matcher is created with a string thats destroyed before the matcher finishes matching.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D90303
As discussed on D90527, we should be be trying to move shift handling functionality into KnownBits to avoid code duplication in SelectionDAG/GlobalISel/ValueTracking.
The refactor to use the KnownBits fixed/min/max constant helpers allows us to hit a couple of cases that we were missing before.
We still need the getValidMinimumShiftAmountConstant case as KnownBits doesn't handle per-element vector cases.
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.
This is fine for -fno-gpu-rdc since there is only one TU.
However this causes duplicate symbols for kernels for -fgpu-rdc if
the same instantiation happen in multiple TU. Or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.
To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.
Differential Revision: https://reviews.llvm.org/D90311
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mis-match between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.
Fixes PR47991.
As reported by @ronlieb, the test shows intermittent fails.
The test failed, if the dependent task was already finished, when the depending
task was to be created. We have other tests to check for the dependences pair.
Since detached tasks are supported by clang and the OpenMP runtime, Archer
must expect to receive the corresponding callbacks.
This patch adds support to interpret the synchronization semantics of
omp_fulfill_event and cleans up the handling of task switches.
This caused an explosion in ICF times during linking on Windows when libfuzzer
instrumentation is enabled. For a small binary we see ICF time go from ~0 to
~10 s. For a large binary it goes from ~1 s to forevert (I gave up after 30
minutes).
See comment on the code review.
> If we are going to write handler data (that is written as variable
> length data following after the unwind info in .xdata), we need to
> emit the handler data immediately, but for cases where no such
> info is going to be written, skip emitting it right away. (Unwind
> info for all remaining functions that hasn't gotten it emitted
> directly is emitted at the end.)
>
> This does slightly change the ordering of sections (triggering a
> bunch of updates to DebugInfo/COFF tests), but the change should be
> benign.
>
> This also matches GCC's assembly output, which doesn't output
> .seh_handlerdata unless it actually is needed.
>
> For ARM64, the unwind info can be packed into the runtime function
> entry itself (leaving no data in the .xdata section at all), but
> that can only be done if there's no follow-on data in the .xdata
> section. If emission of the unwind info is triggered via
> EmitWinEHHandlerData (or the .seh_handlerdata directive), which
> implicitly switches to the .xdata section, there's a chance of the
> caller wanting to pass further data there, so the packed format
> can't be used in that case.
>
> Differential Revision: https://reviews.llvm.org/D87448
This reverts commit 36c64af9d7.
Basic implementation for call and jmp branches with 32 bit offset. Branches to local targets produce
Branch32 edges that are resolved like a regular PCRel32 relocations. Branches to external (undefined)
targets produce Branch32ToStub edges and go through a PLT entry by default. If the target happens to
get resolved within the 32 bit range from the callsite, the edge is relaxed during post-allocation
optimization. There is a test for each of these cases.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D90331