AMDGPU was the last in tree target to use this tablegen mode. I plan to
split up the global intrinsic enum similar to the way that clang
diagnostics are split up today. I don't plan to build on this mode.
Reviewers: arsenm, echristo, efriedma
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D71318
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.
This worked correctly for regular FMA, but was implemented
incorrectly for the strict version. This was not noticed
because we did not have test coverage for this case.
This patch fixes that incorrect expansion and adds the
missing test cases.
Summary:
This patch adds a method to determine if a loop is in rotated form (the latch is
an exiting block). It also modifies the getLoopGuardBranch method to use this
new method. This method can also be used in Loopfusion. Once this patch lands I
will make the corresponding changes there.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65958
This simplifies code where no extra details are required
Also don't write out detail when it is empty.
Differential Revision: https://reviews.llvm.org/D71347
There are a few places that check specific string attributes have
particular values, and assert if they are something else. The verifier
should catch these kinds of cases.
Sometimes the return value of a comparison operator call is
`UnkownVal`. Since no assumptions can be made on `UnknownVal`,
this leeds to keeping impossible execution paths in the
exploded graph resulting in poor performance and false
positives. To overcome this we replace unknown results of
iterator comparisons by conjured symbols.
Differential Revision: https://reviews.llvm.org/D70244
On some edge cases such as Chromium compiled with full instrumentation we
have a .text section over twice the size of the maximum branch range and
the instrumented code generation containing many examples of the erratum
sequence. The combination of Thunks and many erratum sequences causes
finalizeAddressDependentContent() to not converge. We end up with:
start
- Thunk Creation (disturbs addresses after thunks, creating more patches)
- Patch Creation (disturbs addresses after patches, creating more thunks)
- goto start
In most images with few thunks and patches the mutual disturbance does not
cause convergence problems. As the .text size and number of patches go up
the risk increases.
A way to prevent the thunk creation from interfering with patch creation is
to round up the size of the thunks to a 4KiB boundary when the
erratum patch is enabled. As the erratum sequence only triggers when an
instruction sequence starts at 0xff8 or 0xffc modulo (4 KiB) by making the
thunks not affect addresses modulo (4 KiB) we prevent thunks from
interfering with the patch.
The patches themselves could be aggregated in the same way that Thunks are
within ThunkSections and we could round up the size in the same way. This
would reduce the number of patches created in a .text section size >
128 MiB but would not likely help convergence problems.
Differential Revision: https://reviews.llvm.org/D71281
fixes (remaining part of) pr44071, other part in D71242
The code to insert patch section merges them with a comparison function that
uses logic of the form:
return (isa<PatchSection>(a) && !isa<PatchSection>(b));
If the PatchSections don't implement classof this check fails if b is also
a SyntheticSection. This can result in the patches being out of range if
the SyntheticSection is big, for example a ThunkSection with lots of thunks.
Differential Revision: https://reviews.llvm.org/D71242
fixes (part of) pr44071
HasMetadata checks if our metadata map knows the given object. GetMetadata
also does this check and then does another search to actually retrieve
the value. This can all just be one lookup.
In some cases, we can rename a store operand, in order to enable pairing
of stores. For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.
First, we check if we can rename the given register:
1. The first store register must be killed at the store, which means we
do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
stored register and check all uses in between are renamable. Along
they way, we collect the minimal register classes of the uses for
overlapping (sub/super)registers.
Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be
1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.
We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program base patch diff
test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70450
A number of the --debug-* options in llvm-dwarfdump are not particularly
well tested. In some cases, the option is only tested as part of testing
another feature, or a specific part of the section that the options
dump. This change adds four new tests to address some of these holes. It
is not aiming to address every hole however.
I kept the --debug-line switch test separate to X86/brief.s because the
latter only considers the parts of the line table that are affected by
verbose printing, thus missing out things like the header and different
values for things like the Line, Column etc registers.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71276
The Isa register is a uint8_t, but at least on Windows this is
internally an unsigned char, which meant that prior to this patch it got
formatted as an ASCII character, rather than a decimal number. This
patch fixes this by casting it to a uint64_t before printing. I did it
this way instead of using a uint8_t formatter because a) it is simpler,
and b) it allows us to change the internal type of Isa in the future
without this code breaking.
I also took the opportunity to test the printing of the other standard
opcodes.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D71274
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71332
Debugging the Iterator Modeling checker or any of the iterator checkers
is difficult without being able to see the relations between the
iterator variables and their abstract positions, as well as the abstract
symbols denoting the begin and the end of the container.
This patch adds the checker-specific part of the Program State printing
to the Iterator Modeling checker.
Summary:
Add host predefined macros to compilation for SYCL device, which is
required for pre-processing host specific includes (e.g. system
headers).
Reviewers: ABataev, jdoerfert
Subscribers: ebevhan, Anastasia, cfe-commits, keryell, Naghasan, Fznamznon
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71286
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
Summary:
Attribute annotations are recorded in a special global composite variable
that points to annotation strings and the annotated objects.
As a restriction of the LLVM IR type system, those pointers are all
pointers to address space 0, so let's insert an addrspacecast when the
annotated global is in a non-0 address space.
Since this addrspacecast is only reachable from the global annotations
object, this should allow us to represent annotations on all globals
regardless of which addrspacecasts are usually legal for the target.
Reviewers: rjmccall
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71208
Summary:
The ACLE intrinsics for MVE contain a lot of pairs of functions with
`_m` and `_x` in the name, wrapping a predicated MVE instruction which
only partially overwrites its output register. They have the common
pattern that the `_m` variant takes an initial argument called
'inactive', of the same type as the return value, supplying the input
value of the output register, so that lanes disabled by the
predication will be taken from that parameter; the `_x` variant omits
that initial argument, and simply sets it to undef.
That common pattern is simple enough to wrap into a multiclass, which
should save a lot of effort in setting up all the rest of the `_x`
variants. In this commit I introduce `multiclass IntrinsicMX` in
`arm_mve_defs.td`, and convert existing generation of m/x pairs to use
it.
This allows me to remove the `PredicatedImmediateVectorShift`
multiclass (from D71065) completely, because the new multiclass makes
it so much simpler that it's not worth bothering to define it at all.
Reviewers: MarkMurrayARM, miyuki
Reviewed By: MarkMurrayARM, miyuki
Subscribers: kristof.beyls, dmgreen, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71335
After recent changes it is now seems possible to get rid of
printing '\n' before each error and warning. This makes the output
cleaner.
Differential revision: https://reviews.llvm.org/D71246
Summary:
These allow you to get and set the operator of a dag node, without
affecting its list of arguments.
`!getop` is slightly fiddly because in many contexts you need its
return value to have a static type more specific than 'any record'. It
works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so
I made `!getop` take an optional type suffix itself, so that can be
written as the shorter `!getop<BaseClass>(...)`.
Reviewers: hfinkel, nhaehnle
Reviewed By: nhaehnle
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71191
A monolithic checker class is hard to maintain. This patch splits it up
into a modeling part, the three checkers and a debug checker. The common
functions are moved into a library.
Differential Revision: https://reviews.llvm.org/D70320
Summary:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71000
Let's try this again; this has been reverted/recommited a few times. Last time
this got reverted because for this loop:
void a() {
#pragma clang loop vectorize(disable)
for (;;)
;
}
vectorisation was incorrectly enabled and the vectorize.enable metadata was set
due to a logic error. But with this fixed, we now imply vectorisation when:
1) vectorisation is enabled, which means: VectorizeWidth > 1,
2) and don't want to add it when it is disabled or enabled, otherwise we would
be incorrectly setting it or duplicating the metadata, respectively.
This should fix PR27643.
Differential Revision: https://reviews.llvm.org/D69628
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:
..
dlstp.32 lr, r2
.LBB0_1:
mov r3, r2
subs r2, #4
vldrh.u32 q2, [r1], #8
vmov q1, q0
vmla.u32 q0, q2, r0
letp lr, .LBB0_1
@ %bb.2:
vctp.32 r3
..
which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.
Differential Revision: https://reviews.llvm.org/D71007
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen, MarkMurrayARM
Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
Summary:
This adds support for DWARF5 location lists which are specified
indirectly, via an index into the debug_loclists offset table. This
includes parsing the DW_AT_loclists_base attribute which determines the
location of this offset table, and support for new form DW_FORM_loclistx
which is used in conjuction with DW_AT_location to refer to the location
lists in this way.
The code uses the llvm class to parse the offset information, and I've
also tried to structure it similarly to how the relevant llvm
functionality works.
Reviewers: JDevlieghere, aprantl, clayborg
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D71268
Summary:
If the index returns duplicated refs, it will trigger the assertion in
BuildRenameEdit (we expect the processing position is always larger the
the previous one, but it is not true if we have duplication), and also
breaks our heuristics.
This patch make the code robost enough to handle duplications, also
save some cost of redundnat llvm::sort.
Though clangd's index doesn't return duplications, our internal index
kythe will.
Reviewers: ilya-biryukov
Subscribers: MaskRay, jkorous, mgrang, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71300
Summary:
In the function `EarlyIfPredicator::shouldConvertIf()`, we call
`TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may
cause the potential assertion error for those hook which use `BranchProbability`
in `isProfitableToIfCvt()`, for example `SystemZ`.
`SystemZ` use `Probability < BranchProbability(1, 8))` in the function
`SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with
`BranchProbability::getUnknown()`, it will cause assertion error.
This patch is to fix the potential bug.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D71273
This iterator range just includes physical registers and register masks,
which are interesting when dealing with register liveness.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70562
It is discussed here https://reviews.llvm.org/D71118#inline-643172
Currently when a version is empty, llvm-readelf prints:
"000: 0 (*local*) 2 (<corrupt>)"
But GNU readelf does not treat empty section as corrupt.
There is no sense in having empty versions anyways it seems, but
this change is for consistency with GNU.
Differential revision: https://reviews.llvm.org/D71243
The -fsplit-dwarf-inlining option does not conform to DWARF5 standard.
It creates children for Skeleton compilation unit. We need default behavior
to be DWARF5 compatible. Thus set default state for -fsplit-dwarf-inlining
into "false".
Differential Revision: https://reviews.llvm.org/D71304
Summary: Null type pointers could be dereferenced in some cases.
Reviewers: kadircet, sammccall
Reviewed By: sammccall
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D71329