Summary:
Added 'RegisterKind' to the VectorListOp structure, so that this operand
type can be reused for SVE vector lists in a later patch. It also
refactors the 'tryParseVectorList' function so it can be used directly
in the ParserMethod of an operand. The parsing can now parse multiple
kinds of vectors and recover if there is no match.
This is patch [3/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45429
llvm-svn: 329900
Summary:
According RISC-V ELF psABI specification, base RV32 and RV64 ISAs only
allow 32-bit instruction alignment, but instruction allow to be aligned
to 16-bit boundaries for C-extension.
So we just align to 4 bytes and 2 bytes for C-extension is enough.
Reviewers: asb, apazos
Differential Revision: https://reviews.llvm.org/D45560
Patch by Kito Cheng.
llvm-svn: 329899
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).
This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.
It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.
Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329
llvm-svn: 329884
Operand 0 should have the same type of the result. So if the result type needs to be promoted, operand 0 needs to be promoted unconditionally.
llvm-svn: 329883
Also add double-prevoius-failure.ll which captures a test case that at one
point triggered a compiler crash, while developing calling convention support
for f64 on RV32D with soft-float ABI.
llvm-svn: 329877
fadd.d is required in order to force floating point registers to be used in
test code, as parameters are passed in integer registers in the soft float
ABI.
Much of this patch is concerned with support for passing f64 on RV32D with a
soft-float ABI. Similar to Mips, introduce pseudoinstructions to build an f64
out of a pair of i32 and to split an f64 to a pair of i32. BUILD_PAIR and
EXTRACT_ELEMENT can't be used, as a BITCAST to i64 would be necessary, but i64
is not a legal type.
llvm-svn: 329871
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
llvm-svn: 329868
The standard says that the order of evaluation of an expression
s[x] = foo()
is unspecified. In our case, we first create an empty entry in the map,
then call foo(), then store its return value to the created entry. The
problem is that foo uses the map as a cache, so if it finds that there
is an entry in the map, it stops computation. This change explicitly
sets the order, thus fixing this heisenbug.
llvm-svn: 329864
Without these functions it's hard to create a TargetMachine for
Orc JIT that creates efficient native code.
It's not sufficient to just expose LLVMGetHostCPUName(), because
for some CPUs there's fewer features actually available than
the CPU name indicates (e.g. AVX might be missing on some CPUs
identified as Skylake).
Differential Revision: https://reviews.llvm.org/D44861
llvm-svn: 329856
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
Patch by Josh Stone.
llvm-svn: 329852
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43816
llvm-svn: 329847
Most importantly, we should not replace slashes with backslashes
because that would invalidate the path.
Differential Revision: https://reviews.llvm.org/D45473
llvm-svn: 329838
Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.
I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here,
There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.
There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.
NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329837
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.
Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329834
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.
Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.
The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.
llvm-svn: 329822
Previously the MD5 option of the .file directive provided the checksum
as a quoted hex string; now it's a normal hex number with 0x prefix,
same as the .octa directive accepts.
Differential Revision: https://reviews.llvm.org/D45459
llvm-svn: 329820
Add the minimal support necessary to lower a function that returns the
sum of two i32 values.
Support argument/return lowering of i32 values through registers only.
Add tablegen for regbankselect and instructionselect.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D44304
llvm-svn: 329819
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.
handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.
Differential Revision: https://reviews.llvm.org/D45187
llvm-svn: 329815
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.
Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.
This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45428
llvm-svn: 329809
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.
This patch now always picks the offset that fits first, even if FP is
preferred.
llvm-svn: 329797
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.
More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.
Differential Revision: https://reviews.llvm.org/D45205
llvm-svn: 329788
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.
This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45427
llvm-svn: 329782
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.
llvm-svn: 329774
With -fno-plt, for example, calls to printf when getting converted to puts
still use the PLT. This patch checks for the metadata "RtLibUseGOT" and
annotates the declaration with the right attributes.
Differential Revision: https://reviews.llvm.org/D45180
llvm-svn: 329768
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.
Reviewers: mcrosier
Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45502
llvm-svn: 329761
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
if (!b) {
return 0;
}
(*g)++;
return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
Summary:
The verification rules for the intrinsics for atomic memcpy, atomic memmove,
and atomic memset are basically code clones. This change merges their verification
rules into a single block to remove duplication.
llvm-svn: 329753
Summary:
Darwin dynamic linker can handle weak symbols in ConstDataSection.
ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection
instead of normal DataSection.
rdar://problem/39298457
Reviewers: dexonsmith, kledzik
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45472
llvm-svn: 329752
This commit fixes the bot failures that were coming up before with r329716.
The fix was to move the check for "isInSection()" inside of the if condition
and emit the error there instead of waiting to get past the unreachable statement.
This should work in debug and release builds now.
llvm-svn: 329746
There was missing nullptr check before a call to getSection() in
recordRelocation. This would result in a segfault in code like the attached
test.
This adds the missing check and a test which makes sure we get the expected
error output.
llvm-svn: 329716
Summary:
We would like the UMR debugging tool[0] to be able to provide
disassembly for currently live waves based on plain memory
dumps, and we want to leverage the LLVM disassembler for this.
This mostly works, except that UMR clearly can't provide real
symbol info, so it wants to set DisInfo == nullptr.
[0] https://cgit.freedesktop.org/amd/umr/
Reviewers: arsenm, rampitec, artem.tamazov, dp
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45477
Change-Id: Ibb2c5af2e66f2e100b4702fd81308e1932bc4ee6
llvm-svn: 329715
Summary:
This type is created on-demand and used as the base type for array
ranges. Since it is "special", its construction did not go through the
createTypeDIE function and so it was never inserted into the accelerator
table, although it clearly belongs there.
I add an explicit addAccelType call to insert it into the table.
During review, we also decided to rename the type to something more
unique to avoid confusion in case the user has own "sizetype" type. The
new name for the type size __ARRAY_SIZE_TYPE__.
Reviewers: JDevlieghere, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45445
llvm-svn: 329705
Improve the alias analysis to account for cases where we
know that src/dst pairs cannot alias due to things like
TBAA. As we know they are noalias, we know no dependency
can occur. Also fixes issues around the size parameter
to AA being incorrect.
Differential Revision: https://reviews.llvm.org/D42381
llvm-svn: 329692
In the presence of variable-sized stack objects, we always picked the
base pointer when resolving frame indices if it was available.
This makes us hit an assert where we can't reach the emergency spill
slot if it's too far away from the base pointer. Since on AArch64 we
decide to place the emergency spill slot at the top of the frame, it
makes more sense to use FP to access it.
The changes here don't affect only emergency spill slots but all the
frame indices. The goal here is to try to choose between FP, BP and SP
so that we minimize the offset and avoid scavenging, or worse, asserting
when trying to access a slot allocated by the scavenger.
Previously discussed here: https://reviews.llvm.org/D40876.
Differential Revision: https://reviews.llvm.org/D45358
llvm-svn: 329691
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading
scratch descriptor from GIT.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 329690
Much like any written register in load/store instructions, the status register
is not allowed to overlap with any others. So diagnose it like we already do
with the other cases.
llvm-svn: 329687
The BroadwellModelProcResources had an entry for HWPort5, which is a Haswell
resource, and not a Broadwell processor resource. That entry was added to the
Broadwell model because variable blends were consuming it.
This was clearly a typo (the resource name should have been BWPort5), which
unfortunately was never caught before. It was not reported as an error because
HWPort5 is a resource defined by the Haswell model. It has been found when
testing some code with llvm-mca: the list of resources in the resource pressure
view was odd.
This patch fixes the issue; now variable blend instructions consume 2 cycles on
BWPort5 instead of HWPort5. This is enough to get rid of the extra (spurious)
entry in the BroadWellModelProcResources table.
llvm-svn: 329686
Summary:
Subtargets can define the libpfm counter names that can be used to
measure cycles and uops issued on ProcResUnits.
This allows making llvm-exegesis available on more targets.
Fixes PR36984.
Reviewers: gchatelet, RKSimon, andreadb, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45360
llvm-svn: 329675
This cleans up a number of operations that only claimed te use EFLAGS
due to using DF. But no instructions which we think of us setting EFLAGS
actually modify DF (other than things like popf) and so this needlessly
creates uses of EFLAGS that aren't really there.
In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD,
and the whole-flags writes (WRFLAGS and POPF) need to model this.
I've also somewhat cleaned up some of the flag management instruction
definitions to be in the correct .td file.
Adding this extra register also uncovered a failure to use the correct
datatype to hold X86 registers, and I've corrected that as necessary
here.
Differential Revision: https://reviews.llvm.org/D45154
llvm-svn: 329673
Prefer to use the 32-bit AND with immediate instead.
Primarily I'm doing this to ensure that immediates created by shrinkAndImmediate will always get absorbed into the AND. But I do believe this would be a reduction in the number of uops that need to execute. Ideally we should shrink the 'and' and the 'load' during DAG combine to re-enable the fold.
Fixes PR37063.
llvm-svn: 329667
While reading Codeview records which contain variable-length encoded integers,
such as LF_BCLASS, LF_ENUMERATE, LF_MEMBER, LF_VBCLASS or LF_IVBCLASS,
the record's size would be improperly calculated in cases where the value was
indeed of a variable length (>= LF_NUMERIC). This caused a bad alignement on
the next record, which would/might crash later on.
Differential Revision: https://reviews.llvm.org/D45104
llvm-svn: 329659
The key idea is to lower COPY nodes populating EFLAGS by scanning the
uses of EFLAGS and introducing dedicated code to preserve the necessary
state in a GPR. In the vast majority of cases, these uses are cmovCC and
jCC instructions. For such cases, we can very easily save and restore
the necessary information by simply inserting a setCC into a GPR where
the original flags are live, and then testing that GPR directly to feed
the cmov or conditional branch.
However, things are a bit more tricky if arithmetic is using the flags.
This patch handles the vast majority of cases that seem to come up in
practice: adc, adcx, adox, rcl, and rcr; all without taking advantage of
partially preserved EFLAGS as LLVM doesn't currently model that at all.
There are a large number of operations that techinaclly observe EFLAGS
currently but shouldn't in this case -- they typically are using DF.
Currently, they will not be handled by this approach. However, I have
never seen this issue come up in practice. It is already pretty rare to
have these patterns come up in practical code with LLVM. I had to resort
to writing MIR tests to cover most of the logic in this pass already.
I suspect even with its current amount of coverage of arithmetic users
of EFLAGS it will be a significant improvement over the current use of
pushf/popf. It will also produce substantially faster code in most of
the common patterns.
This patch also removes all of the old lowering for EFLAGS copies, and
the hack that forced us to use a frame pointer when EFLAGS copies were
found anywhere in a function so that the dynamic stack adjustment wasn't
a problem. None of this is needed as we now lower all of these copies
directly in MI and without require stack adjustments.
Lots of thanks to Reid who came up with several aspects of this
approach, and Craig who helped me work out a couple of things tripping
me up while working on this.
Differential Revision: https://reviews.llvm.org/D45146
llvm-svn: 329657
Summary:
Another clean up, following D43208.
Interleaved memory access analysis/optimization has nothing to do with vectorization legality. It doesn't really belong there. On the other hand, cost model certainly has to know about it.
In principle, vectorization should proceed like Legality ==> Optimization ==> CostModel ==> CodeGen, and this change just does that,
by moving the interleaved access analysis/decision out of Legal, and run it just before CostModel object is created.
After this, I can move LoopVectorizationLegality and Hints/Requirements classes into it's own header file, making it shareable within Transform tree. I have the patch already but I don't want to mix with this change. Eventual goal is to move to Analysis tree, but I first need to move RecurrenceDescriptor/InductionDescriptor from Transform/Util/LoopUtil.* to Analysis.
Reviewers: rengolin, hfinkel, mkuper, dcaballe, sguggill, fhahn, aemerson
Reviewed By: rengolin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45072
llvm-svn: 329645
Summary:
SSAUpdater is a bottleneck in JumpThreading, and this patch improves the
situation by using SSAUpdaterBulk instead.
Compile time impact: no noticable changes on CTMark, a big improvement
on the test from PR16756.
Reviewers: dberlin, davide, MatzeB
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44282
llvm-svn: 329644
Summary:
SSAUpdater is a bottleneck in a number of passes, and one of the reasons
is that it performs a lot of unnecessary computations (DT/IDF) over and
over again. This patch adds a new SSAUpdaterBulk that uses existing DT
and avoids recomputing IDF when possible.
Reviewers: dberlin, davide, MatzeB
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D44282
llvm-svn: 329643
The caching walker used to hold its own caches, which made its `reset()`
function meaningful. Since caching has been moved out of it, there's no
reason to continue to have these cache-related methods.
Similarly, the EXPENSIVE_CHECKS block that's getting removed used to
rerun the query with caching disabled. Since that's how we always do
queries now, it's redundant.
llvm-svn: 329638
Lower is slightly odd. It often doesn't change the type but the lowerings
do use the new type to decide what code to create. Treat it like a mutation
but provide convenience functions that re-use the existing type.
Re-uses the existing tests:
test/CodeGen/AArch64/GlobalISel/legalize-rem.mir
test/CodeGen/AArch64/GlobalISel//legalize-mul.mir
test/CodeGen/AArch64/GlobalISel//legalize-cmpxchg-with-success.mir
llvm-svn: 329623