This one is slightly odd since it counts as an address expression,
which previously could never fail. Allow the existing TTI hook to
return the value to use, and re-use it for handling how to handle
ptrmask.
Handles the no-op addrspacecasts for AMDGPU. We could probably do
something better based on analysis of the mask value based on the
address space, but leave that for now.
With the "SectionHeaderTable" it is now possible to reorder
entries in the section header table.
It also allows to stop emitting the table.
Differential revision: https://reviews.llvm.org/D80002
Summary:
This patch enables the register allocator to spill/fill lists of 2, 3
and 4 SVE vectors registers to/from the stack. This is implemented with
pseudo instructions that get expanded to individual LDR_ZXI/STR_ZXI
instructions in AArch64ExpandPseudoInsts.
Patch by Sander de Smalen.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75988
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.
Differential Revision: https://reviews.llvm.org/D70072
Summary:
This patch is part of a patch series to add support for FileCheck
numeric expressions. This specific patch adds support signed numeric
values, thus allowing negative numeric values.
As such, the patch adds a new class to represent a signed or unsigned
value and add the logic for type promotion and type conversion in
numeric expression mixing signed and unsigned values. It also adds
the %d format specifier to represent signed value.
Finally, it also adds underflow and overflow detection when performing a
binary operation.
Copyright:
- Linaro (changes up to diff 183612 of revision D55940)
- GraphCore (changes in later versions of revision D55940 and
in new revision created off D55940)
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson
Reviewed By: jhenderson, arichardson
Subscribers: MaskRay, hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, kristina, hfinkel, rogfer01, JonChesterfield
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60390
This patch upgrades DISubrange to support fortran requirements.
Summary:
Below are the updates/addition of fields.
lowerBound - Now accepts signed integer or DIVariable or DIExpression,
earlier it accepted only signed integer.
upperBound - This field is now added and accepts signed interger or
DIVariable or DIExpression.
stride - This field is now added and accepts signed interger or
DIVariable or DIExpression.
This is required to describe bounds of array which are known at runtime.
Testing:
unit test cases added (hand-written)
check clang
check llvm
check debug-info
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D80197
Summary:
Define ELF binary code for VE and modify code where should use this new code.
Depends on D79544.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D79545
Now that all of the statepoint related routines have classes with isa support, let's cleanup.
I'm leaving the (dead) utitilities in tree for a few days so that I can do the same cleanup downstream without breakage.
Back when we had CallSite, we implemented the current Statepoint/ImmutableStatepoint structure in analogous manner. Now that CallSite has been removed, the structure used for statepoints looks decidely out of place. gc.statepoint is one of the small handful of intrinsics which are invokable. Because of this, it can't subclass IntrinsicInst as is idiomatic.
This change simply introduces the GCStatepointInst class, restructures the existing Statepoint/ImmutableStatepoint types to wrap it. I will be landing a series of changes to sink functionality into GCStatepointInst and updating callers to be more idiomatic.
Currently we can only eliminate call return pairs that either return the
result of the call or a dynamic constant. This patch removes that
limitation.
Differential Revision: https://reviews.llvm.org/D79660
OpenMP emits these for some reason, so handle them. Assume these use
4096 bytes by default, with a flag to override this. Also change the
related stack assumption for calls to have a flag.
Can't test this since I can't directly use the default expansion for
AMDGPU. It needs to scale the amount by the wave size, rather than use
the raw byte size value.
We do not have register classes for all possible vector
sizes, so round it up for extract vector element.
Also fixes selection of G_MERGE_VALUES when vectors are
not a power of two.
This has required to refactor getRegSplitParts() in way
that it can handle not just power of two vectors.
Ideally we would like RegSplitParts to be generated by
tablegen.
Differential Revision: https://reviews.llvm.org/D80457
Patch by Neil Dhar <dhar@alumni.duke.edu>
Current state machine for parsing tokens from response files in Windows
does not correctly handle the case where the last token is "". The current
implementation handles the last token by only adding it if it is not empty,
however this does not cover the case where the last token is meant to be
the empty string. We can cover this case by checking whether the state
machine was last in the UNQUOTED state, which indicates that the last
character of the input was a non-whitespace character.
Differential Revision: https://reviews.llvm.org/D78346
- This allow us to specify the (minimal) alignment on an intrinsic's
arguments and, more importantly, the return value.
Differential Revision: https://reviews.llvm.org/D80422
Many edge cases, e.g. wrapped ranges, can be processed
precisely without bailout. However it's very unlikely that
memory access with min/max integer offsets will be
classified as safe anyway.
Early bailout may help with ThinLTO where we can
drop unsafe parameters from summaries.
If we have a memory instruction (e.g. a load), we shouldn't combine it away in
some trivial combine.
It's possible that, say, a call lives between the instructions. This could
modify the value loaded, making the load instructions not safe to fold.
Differential Revision: https://reviews.llvm.org/D80053
Looking back over gcc and icc behavior it looks like icc does
use mulx32 on 32-bit targets and mulx64 on 64-bit targets. It's
also used when dividing i32 by constant on 32-bit targets and
i64 by constant on 64-bit targets.
gcc uses it multiplies producing a 64 bit result on 32-bit targets
and 128-bit results on a 64-bit target. gcc does not appear to use
it for division by constant.
After this patch clang is closer to the icc behavior. This
basically reverts d1c61861dd, but
there were no strong feelings at the time.
Fixes PR45518.
Differential Revision: https://reviews.llvm.org/D80498
ProfileSummaryInfo is updated seldom, as result of very specific
triggers. This patch clearly demarcates state updates from read-only uses.
This, arguably, improves readability and maintainability.
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.
Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.
Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
The current pattern matching for zext results in the following code snippet
being produced,
w1 = w0
r1 <<= 32
r1 >>= 32
Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.
R0=inv(id=0,smax_value=800)
R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9=inv800 R10=fp0 fp-8=mmmm????
58: (1c) w9 -= w8
59: (bc) w1 = w8
60: (67) r1 <<= 32
61: (77) r1 >>= 32
62: (bf) r2 = r7
63: (0f) r2 += r1
64: (bf) r1 = r6
65: (bc) w3 = w9
66: (b7) r4 = 0
67: (85) call bpf_get_stack#67
R0=inv(id=0,smax_value=800)
R1_w=ctx(id=0,off=0,imm=0)
R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
R10=fp0 fp-8=mmmm????
After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.
Further it reduces 3 insns to 1.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Differential Revision: https://reviews.llvm.org/D73985
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.
Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc
Reviewed By: stefanp, nemanjai, amyk, #powerpc
Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo
Tags: #clang, #powerpc, #llvm
Differential Revision: https://reviews.llvm.org/D80020
code motion
Summary: Currently isSafeToMoveBefore uses DFS numbering for determining
the relative position of instruction and insert point which is not
always correct. This PR proposes the use of Dominator Tree depth for the
same. If a node is at a higher level than the insert point then it is
safe to say that we want to move in the forward direction.
Authored By: RithikSharma
Reviewer: Whitney, nikic, bmahjour, etiotto, fhahn
Reviewed By: Whitney
Subscribers: fhahn, hiraditya, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80084
Use getFunctionEntryPointSymbol whenever possible to enclose the
implementation detail and reduce duplicate logic.
Differential Revision: https://reviews.llvm.org/D80402
This will enable selecting non-entry block allocas. Skip the SP write
check in the base isSchedulingBoundary implementation to preserve the
previous scheduling behavior and avoid test churn. It's apparently for
compile time reasons, but if we were to use this more work would be
needed since in some of the failing tests, we seem to incorrectly get
hazard nops inserted.
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
ffmpeg/libavcodec/x86/h264_cabac.c inline assembly may produce
movzb 1280(%rbx, %r12), %r12
After D80608, llvm-mc errors:
error: unknown use of instruction mnemonic without a size suffix