We've started (D80598) the process of migrating away from the inline operand lists in statepoints to using explicit operand bundles. Update a few tests to reflect the new preference. More to come, these were simply the ones outside any obvious grouping.
Summary:
We received a report of LiveDebugValues consuming 25GB+ of RAM when
compiling code generated by Unity's IL2CPP scripting backend.
There's an initial 5GB spike due to repeatedly copying cached lists of
MachineBasicBlocks within the UserValueScopes members of VarLocs.
But the larger scaling issue arises due to the fact that prior to range
extension, there are 81K basic blocks and 156K DBG_VALUEs: given enough
memory, LiveDebugValues would insert 101 million MIs (I counted this by
incrementing a counter inside of VarLoc::BuildDbgValue).
It seems like LiveDebugValues would have to be rearchitected to support
this kind of input (we'd need some new represntation for DBG_VALUEs that
get inserted into ~every block via flushPendingLocs). OTOH, large globs
of auto-generated code are typically not debugged interactively.
So: add cutoffs to disable range extension when the input is too big. I
chose the cutoffs experimentally, erring on the conservative side. When
compiling a large collection of Apple software, range extension never
got disabled.
rdar://63418929
Reviewers: aprantl, friss, jmorse, Orlando
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80662
Summary:
Verify that each DBG_VALUE has a debug location. This is required by
LiveDebugValues, and perhaps by other late passes.
There's an exception for tests: lots of tests use a two-operand form of
DBG_VALUE for convenience. There's no reason to prevent that.
This is an extension of D80665, but there's no dependency.
Reviewers: aprantl, jmorse, davide, chrisjackson
Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80670
We need to process only parameters. Allocas access can be calculated
afterwards.
Also don't create fake function for aliases and just resolve them on
initialization.
Summary:
Follow-up to https://reviews.llvm.org/D80581.
Turns out the codegen part already worked, so only needed to add tests.
I manually verified that in these tests the generated code for inalloca
and preallocated were identical.
Reviewers: efriedma, hans
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80742
Summary:
Only column-major was supported so far. This adds row-major support as well.
Note that we probably also want very efficient SIMD implementations for the
various target platforms.
Bug:
https://bugs.llvm.org/show_bug.cgi?id=46085
Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80673
If this mask only clears bits in the low 32-bit half of a flat
pointer, these bits are always preserved in the result address
space. If the high bits are modified, they may need to be preserved
for some kind of user pointer tagging.
Summary:
Count the per-module number of basic blocks when the module summary is computed
and sum them up during Thin LTO indexing.
This is used to estimate the working set size under the partial sample PGO.
This is split off of D79831.
Reviewers: davidxl, espindola
Subscribers: emaste, inglorion, hiraditya, MaskRay, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80403
Continues from D80598.
The key point of the change is to default to using operand bundles instead of the inline length prefix argument lists for statepoint nodes. An important subtlety to note is that the presence of a bundle has semantic meaning, even if it is empty. As such, we need to make a somewhat deeper change to the interface than is first obvious.
Existing code treats statepoint deopt arguments and the deopt bundle operands differently during inlining. The former is ignored (resulting in caller state being dropped), the later is merged.
We can't preserve the old behaviour for calls with deopt fed to RS4GC and then inlining, but we can avoid the no-deopt case changing. At least in internal testing, that seem to be the important one. (I'd argue the "stop merging after RS4GC" behaviour for the former was always "unexpected", but that the behaviour for non-deopt calls actually make sense.)
Differential Revision: https://reviews.llvm.org/D80674
Summary:
Follow up D79751 and put the instrumentation / value collection side (in
addition to the optimization side) behind the flag as well.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80646
While LazyBlockFrequencyInfo itself is lazy, the dominator tree
and loop info analyses it requires are not. Drop the dependency
on this pass in SelectionDAGIsel at O0.
This makes for a ~0.6% O0 compile-time improvement.
Differential Revision: https://reviews.llvm.org/D80387
Summary:
PHIs result register class is set to VGPR or SGPR depending on the cross block value divergence.
In some cases uniform PHI need to be converted to return VGPR to prevent the oddnumber of moves values from VGPR to SGPR and back.
PHI should certainly return VGPR if it has at least one VGPR input. This change adds the exception.
We don't want to convert uniform PHI to VGPRs in case the only VGPR input is a VGPR to SGPR COPY and definition od the
source VGPR in this COPY is move immediate.
bb.0:
%0:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
%2:sreg_32 = .....
bb.1:
%3:sreg_32 = PHI %1, %bb.3, %2, %bb.1
S_BRANCH %bb.3
bb.3:
%1:sreg_32 = COPY %0
S_BRANCH %bb.2
Reviewers: rampitec
Reviewed By: rampitec
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80434
Summary:
Propagate memory operands when folding load instructions into instructions that directly operate on memory.
The original revision has been split. See D80140 for the other part of the changes.
Reviewers: craig.topper, rnk, lebedev.ri, efriedma
Reviewed By: craig.topper
Subscribers: lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80062
This one is slightly odd since it counts as an address expression,
which previously could never fail. Allow the existing TTI hook to
return the value to use, and re-use it for handling how to handle
ptrmask.
Handles the no-op addrspacecasts for AMDGPU. We could probably do
something better based on analysis of the mask value based on the
address space, but leave that for now.
There was a failure on windows bit due to format mismatch on
different(Hex and Decimal) platforms even if meaning of output is same.
For example on X86 linux =>
DW_OP_plus_uconst 0x70, DW_OP_deref, DW_OP_lit4, DW_OP_mul
^
on X86 Windows-gnu =>
DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul)
: error: CHECK-SAME: expected string not found in input
; CHECK-SAME: DW_OP_plus_uconst 0x70, DW_OP_deref, DW_OP_lit4, DW_OP_mul
^
<stdin>:28:17: note: scanning from here
DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul)
^
<stdin>:28:18: note: possible intended match here
DW_AT_location (DW_OP_fbreg +112, DW_OP_deref, DW_OP_lit4, DW_OP_mul)
Now the test is limited to x86 using REQUIRED and -mtriple.
http://45.33.8.238/win/16214/step_11.txt
With the "SectionHeaderTable" it is now possible to reorder
entries in the section header table.
It also allows to stop emitting the table.
Differential revision: https://reviews.llvm.org/D80002
Summary:
This patch enables the register allocator to spill/fill lists of 2, 3
and 4 SVE vectors registers to/from the stack. This is implemented with
pseudo instructions that get expanded to individual LDR_ZXI/STR_ZXI
instructions in AArch64ExpandPseudoInsts.
Patch by Sander de Smalen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75988
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.
Differential Revision: https://reviews.llvm.org/D70072
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support signed numeric
values, thus allowing negative numeric values.
As such, the patch adds a new class to represent a signed or unsigned
value and add the logic for type promotion and type conversion in
numeric expression mixing signed and unsigned values. It also adds
the %d format specifier to represent signed value.
Finally, it also adds underflow and overflow detection when performing a
binary operation.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson
Reviewed By: jhenderson, arichardson
Subscribers: MaskRay, hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, kristina, hfinkel, rogfer01, JonChesterfield
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60390
Summary:
TableGen interprets braces ('{}') in the asm string of instruction aliases as
variants but when defining aliases with literal braces they have to be escaped
to prevent them being removed.
Braces are escaped with '\\', for example:
def FooBraces : InstAlias<"foo \\{$imm\\}", (foo IntOperand:$imm)>;
Although when TableGen is emitting the assembly writer (-gen-asm-writer)
the AsmString that gets emitted is:
AsmString = "foo \{$\x01\}";
In c/c++ braces don't need to be escaped which causes compilation
warnings:
warning: use of non-standard escape character '\{' [-Wpedantic]
This patch fixes the issue by unescaping the flattened alias asm string
in the asm writer, by replacing '\{\}' with '{}'.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D79991
This patch upgrades DISubrange to support fortran requirements.
Summary:
Below are the updates/addition of fields.
lowerBound - Now accepts signed integer or DIVariable or DIExpression,
earlier it accepted only signed integer.
upperBound - This field is now added and accepts signed interger or
DIVariable or DIExpression.
stride - This field is now added and accepts signed interger or
DIVariable or DIExpression.
This is required to describe bounds of array which are known at runtime.
Testing:
unit test cases added (hand-written)
check clang
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D80197
Summary:
Define ELF binary code for VE and modify code where should use this new code.
Depends on D79544.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D79545
Currently we can only eliminate call return pairs that either return the
result of the call or a dynamic constant. This patch removes that
limitation.
Differential Revision: https://reviews.llvm.org/D79660
OpenMP emits these for some reason, so handle them. Assume these use
4096 bytes by default, with a flag to override this. Also change the
related stack assumption for calls to have a flag.
We do not have register classes for all possible vector
sizes, so round it up for extract vector element.
Also fixes selection of G_MERGE_VALUES when vectors are
not a power of two.
This has required to refactor getRegSplitParts() in way
that it can handle not just power of two vectors.
Ideally we would like RegSplitParts to be generated by
tablegen.
Differential Revision: https://reviews.llvm.org/D80457
- This allow us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
Differential Revision: https://reviews.llvm.org/D80422
- Argument attribute needs specifiying through `ArgIndex<n>`
(corresponding to `FirstArgIndex`) to distinguish explicitly from the
index number from the overloaded type list.
- In addition, `RetIndex` (corresponding to `ReturnIndex`) and
`FuncIndex` (corresponding to `FunctionIndex`) are introduced for us
to associate attributes on the return value and potentially function
itself.
Differential Revision: https://reviews.llvm.org/D80422
If we have a memory instruction (e.g. a load), we shouldn't combine it away in
some trivial combine.
It's possible that, say, a call lives between the instructions. This could
modify the value loaded, making the load instructions not safe to fold.
Differential Revision: https://reviews.llvm.org/D80053
Looking back over gcc and icc behavior it looks like icc does
use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's
also used when dividing i32 by constant on 32-bit targets and
i64 by constant on 64-bit targets.
gcc uses it multiplies producing a 64 bit result on 32-bit targets
and 128-bit results on a 64-bit target. gcc does not appear to use
it for division by constant.
After this patch clang is closer to the icc behavior. This
basically reverts d1c61861dd, but
there were no strong feelings at the time.
Fixes PR45518.
Differential Revision: https://reviews.llvm.org/D80498
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
The current pattern matching for zext results in the following code snippet
being produced,
w1 = w0
r1 <<= 32
r1 >>= 32
Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.
R0=inv(id=0,smax_value=800)
R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9=inv800 R10=fp0 fp-8=mmmm????
58: (1c) w9 -= w8
59: (bc) w1 = w8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bc) w3 = w9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#67
R0=inv(id=0,smax_value=800)
R1_w=ctx(id=0,off=0,imm=0)
R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R10=fp0 fp-8=mmmm????
After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.
Further it reduces 3 insns to 1.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Differential Revision: https://reviews.llvm.org/D73985
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.