It is not safe in general to replace an alias in a GEP with its aliasee
if the alias can be replaced with another definition (i.e. via strong/weak
resolution (linkonce_odr) or via symbol interposition (default visibility
in ELF)) while the aliasee cannot. An example of how this can go wrong is
in the included test case.
I was concerned that this might be a load-bearing misoptimization (it's
possible for us to use aliases to share vtables between base and derived
classes, and on Windows, vtable symbols will always be aliases in RTTI
mode, so this change could theoretically inhibit trivial devirtualization
in some cases), so I built Chromium for Linux and Windows with and without
this change. The file sizes of the resulting binaries were identical, so it
doesn't look like this is going to be a problem.
Differential Revision: https://reviews.llvm.org/D65118
llvm-svn: 366754
[Attributor] Liveness analysis.
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366753
The xform has no real valuewhen it's using out of a complex pattern
output. The complex pattern was already creating TargetConstants with
i16, so this was just unnecessary machinery.
This allows global isel to import the simple cases once the complex
pattern is implemented.
llvm-svn: 366743
Liveness analysis abstract attribute used to indicate which BasicBlocks are dead and can therefore be ignored.
Right now we are only looking at noreturn calls.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential revision: https://reviews.llvm.org/D64162
llvm-svn: 366736
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.
While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.
llvm-svn: 366732
Summary:
Since we are planning to add ADDIStocHA for 32bit in later patch, we decided
to change 64bit one first to follow naming convention with 8 behind opcode.
Patch by: Xiangling_L
Differential Revision: https://reviews.llvm.org/D64814
llvm-svn: 366731
Porting function return value attribute noalias to attributor.
This will be followed with a patch for callsite and function argumets.
Reviewers: jdoerfert
Subscribers: lebedev.ri, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D63067
llvm-svn: 366728
Stubs out a TargetLoweringObjectFileXCOFF class, implementing only
SelectSectionForGlobal for common symbols. Also adds an override of
EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors
and adds support for emitting common globals.
llvm-svn: 366727
While debugging code that uses SafeStack, we've noticed that LLVM
produces an invalid DWARF. Concretely, in the following example:
int main(int argc, char* argv[]) {
std::string value = "";
printf("%s\n", value.c_str());
return 0;
}
DWARF would describe the value variable as being located at:
DW_OP_breg14 R14+0, DW_OP_deref, DW_OP_constu 0x20, DW_OP_minus
The assembly to get this variable is:
leaq -32(%r14), %rbx
The order of operations in the DWARF symbols is incorrect in this case.
Specifically, the deref is incorrect; this appears to be incorrectly
re-inserted in repalceOneDbgValueForAlloca.
With this change which inserts the deref after the offset instead of
before it, LLVM produces correct DWARF:
DW_OP_breg14 R14-32
Differential Revision: https://reviews.llvm.org/D64971
llvm-svn: 366726
The bytes inserted before an overaligned global need to be padded according
to the alignment set on the original global in order for the initializer
to meet the global's alignment requirements. The previous implementation
that padded to the pointer width happened to be correct for vtables on most
platforms but may do the wrong thing if the vtable has a larger alignment.
This issue is visible with a prototype implementation of HWASAN for globals,
which will overalign all globals including vtables to 16 bytes.
There is also no padding requirement for the bytes inserted after the global
because they are never read from nor are they significant for alignment
purposes, so stop inserting padding there.
Differential Revision: https://reviews.llvm.org/D65031
llvm-svn: 366725
We were previously ignoring alignment entirely when combining globals
together in this pass. There are two main things that we need to do here:
add additional padding before each global to meet the alignment requirements,
and set the combined global's alignment to the maximum of all of the original
globals' alignments.
Since we now need to calculate layout as we go anyway, use the calculated
layout to produce GlobalLayout instead of using StructLayout.
Differential Revision: https://reviews.llvm.org/D65033
llvm-svn: 366722
This reverts commit r366686 as it appears to be causing buildbot
failures on sanitizer-x86_64-linux-android and sanitizer-x86_64-linux.
llvm-svn: 366708
ARMLowOverheadLoops would assert a failure if it did not find all the
pseudo instructions that comprise the hardware loop. Instead of doing
this, iterate through all the instructions of the function and revert
any remaining pseudo instructions that haven't been converted.
Differential Revision: https://reviews.llvm.org/D65080
llvm-svn: 366691
This patch was not the reason of the buildbot failure.
Deleted code was introduced as a work around for a bug in the gold linker
(http://sourceware.org/PR16794). Test case that was given as a reason for
this part of code, the one on previous link, now works for the gold.
This condition is too strict and when a code is compiled with debug info
it forces generation of numerous relocations with symbol for architectures
that do not have relocation addend.
Reviewers: arsenm, espindola
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D64327
llvm-svn: 366686
We need to ensure that the number of T's is correct when adding multiple
instructions into the same VPT block.
Differential revision: https://reviews.llvm.org/D65049
llvm-svn: 366684
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Fixed out of bounds load assert identified in rL366501
Differential Revision: https://reviews.llvm.org/D64551
llvm-svn: 366681
ARM has code to recognise uses of the "returned" function parameter
attribute which guarantee that the value passed to the function in r0
will be returned in r0 unmodified. IPRA replaces the regmask on call
instructions, so needs to be told about this to avoid reverting the
optimisation.
Differential revision: https://reviews.llvm.org/D64986
llvm-svn: 366669
Summary:
In the atomic optimizer, save doing a bunch of work and generating a
bunch of dead IR in the fairly common case where the result of an
atomic op (i.e. the value that was in memory before the atomic op was
performed) is not used. NFC.
Reviewers: arsenm, dstuttard, tpr
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64981
llvm-svn: 366667
Current algorithm to update branch weights of latch block and its copies is
based on the assumption that number of peeling iterations is approximately equal
to trip count.
However it is not correct. According to profitability check in one case we can decide to peel
in case it helps to reduce the number of phi nodes. In this case the number of peeled iteration
can be less then estimated trip count.
This patch introduces another way to set the branch weights to peeled of branches.
Let F is a weight of the edge from latch to header.
Let E is a weight of the edge from latch to exit.
F/(F+E) is a probability to go to loop and E/(F+E) is a probability to go to exit.
Then, Estimated TripCount = F / E.
For I-th (counting from 0) peeled off iteration we set the the weights for
the peeled latch as (TC - I, 1). It gives us reasonable distribution,
The probability to go to exit 1/(TC-I) increases. At the same time
the estimated trip count of remaining loop reduces by I.
As a result after peeling off N iteration the weights will be
(F - N * E, E) and trip count of loop becomes
F / E - N or TC - N.
The idea is taken from the review of the patch D63918 proposed by Philip.
Reviewers: reames, mkuper, iajbar, fhahn
Reviewed By: reames
Subscribers: hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D64235
llvm-svn: 366665
For equality, the function called getTrue/getFalse with the VT
of the comparison input. But getTrue/getFalse need the boolean VT.
So if this code ever executed, it would assert.
I believe these cases are removed by InstSimplify so we don't get here.
So this patch just fixes up an assert to exclude the equality
possibility and removes the broken code.
llvm-svn: 366649
AddOne/SubOne create new Constant objects. That seems heavy for
comparing ConstantInts which wrap APInts. Just do the math on
on the APInts and compare them.
llvm-svn: 366648
Summary:
Four things here:
1. Generalize the fold to handle non-splat divisors. Reasonably trivial.
2. Unban power-of-two divisors. I don't see any reason why they should
be illegal.
* There is no ban in Hacker's Delight
* I think the ban came from the same bug that caused the miscompile
in the base patch - in `floor((2^W - 1) / D)` we were dividing by
`D0` instead of `D`, and we **were** ensuring that `D0` is not `1`,
which made sense.
3. Unban `1` divisors. I no longer believe Hacker's Delight actually says
that the fold is invalid for `D = 0`. Further considerations:
* We know that
* `(X u% 1) == 0` can be constant-folded to `1`,
* `(X u% 1) != 0` can be constant-folded to `0`,
* Also, we know that
* `X u<= -1` can be constant-folded to `1`,
* `X u> -1` can be constant-folded to `0`,
* https://godbolt.org/z/7jnZJXhttps://rise4fun.com/Alive/oF6p
* We know will end up with the following:
`(setule/setugt (rotr (mul N, P), K), Q)`
* Therefore, for given new DAG nodes and comparison predicates
(`ule`/`ugt`), we will still produce the correct answer if:
`Q` is a all-ones constant; and both `P` and `K` are *anything*
other than `undef`.
* The fold will indeed produce `Q = all-ones`.
4. Try to re-splat the `P` and `K` vectors - we don't care about
their values for the lanes where divisor was `1`.
Reviewers: RKSimon, hermord, craig.topper, spatel, xbolva00
Reviewed By: RKSimon
Subscribers: hiraditya, javed.absar, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63963
llvm-svn: 366637
As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result.
llvm-svn: 366636
If the blockaddress is not destoryed, the destination block will still
be marked as having its address taken, limiting further transformations.
I think there are other places where the dead blockaddress constants are kept
around, I'll look into that as follow up.
Reviewers: craig.topper, brzycki, davide
Reviewed By: brzycki, davide
Differential Revision: https://reviews.llvm.org/D64936
llvm-svn: 366633
Sometimes, you can end up with cross-bank copies between same-sized GPRs and
FPRs, which feed into G_STOREs. When these copies feed only into stores, they
aren't necessary; we can just store using the original register bank.
This provides some minor code size savings for some floating point SPEC
benchmarks. (Around 0.2% for 453.povray and 450.soplex)
This issue doesn't seem to show up due to regbankselect or anything similar. So,
this patch introduces an early select function, `contractCrossBankCopyIntoStore`
which performs the contraction when possible. The selector then continues
normally and selects the correct store opcode, eliminating needless copies
along the way.
Differential Revision: https://reviews.llvm.org/D65024
llvm-svn: 366625
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.
Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.
The expected usage has now changed to:
__wasm_init_tls(memalign(__builtin_wasm_tls_align(),
__builtin_wasm_tls_size()));
Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton
Reviewed By: tlively
Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65028
llvm-svn: 366624
The top-level BUNDLE instruction should behave as an ordinary
instruction. It is supposed to have all relevant registers as implicit
operands. Moving it should work as any other instruction. I believe
the assert intended to avoid moving instructions inside bundles.
llvm-svn: 366605
This change reverts most of the previous register name generation.
The real problem is that RegisterTuple does not generate asm names.
Added optional operand to RegisterTuple. This way we can simplify
register name access and dramatically reduce the size of static
tables for the backend.
Differential Revision: https://reviews.llvm.org/D64967
llvm-svn: 366598
This should now handle everything except structs passed as multiple
registers.
I think most of the packing logic should be handled by
handleAssignments, but I'm unclear on what the contract is for
multiple registers. This is copying how x86 handles this.
This does change the behavior of the test_sgpr_alignment0 amdgpu_vs
test. I don't think shader arguments should try to follow the
alignment, and registers need to be repacked. I also don't think it
matters, since I think the pointers are packed to the beginning of the
argument list anyway.
llvm-svn: 366582
This is the more natural lowering, and presents more opportunities to
reduce 64-bit ops to 32-bit.
This should also help avoid issues graphics shaders have had with
64-bit values, and simplify argument lowering in globalisel.
llvm-svn: 366578
This was handled previously for arguments split due to not fitting in
an MVT. This was dropping the register for argument registers split
due to TLI::getRegisterTypeForCallingConv.
llvm-svn: 366574
Summary:
Current PRE hoists common computations into
CMBB = DT->findNearestCommonDominator(MBB, MBB1).
However, if CMBB is in a hot loop body, we might get performance
degradation.
Differential Revision: https://reviews.llvm.org/D64394
llvm-svn: 366570
Summary:
For split-stack, if the nested argument (i.e. R10) is not used, no need to save/restore it in the prologue.
Reviewers: thanm
Reviewed By: thanm
Subscribers: mstorsjo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64673
llvm-svn: 366569
We'd like to remove this whole function, because these are properties of
functions, not the target as a whole. These two are easy to remove
because they are only used for emitting ARM build attributes, which
expects them to represent the defaults for the whole module, not just
the last function generated.
This is needed to get correct build attributes when using IPRA on ARM,
because IPRA causes resetTargetOptions to get called before
ARMAsmPrinter::emitAttributes.
Differential revision: https://reviews.llvm.org/D64929
llvm-svn: 366562
If a function definition is not exact, then the linker could select a
differently-compiled version of it, which could use different registers.
https://reviews.llvm.org/D64909
llvm-svn: 366557
Summary:
According to the new Armv8-M specification
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the
instructions SQRSHRL and UQRSHLL now have an additional immediate
operand <saturate>. The new assembly syntax is:
SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm
where <saturate> can be either 64 (the existing behavior) or 48, in
that case the result is saturated to 48 bits.
The new operand is encoded as follows:
#64 Encoded as sat = 0
#48 Encoded as sat = 1
sat is bit 7 of the instruction bit pattern.
This patch adds a new assembler operand class MveSaturateOperand which
implements parsing and encoding. Decoding is implemented in
DecodeMVEOverlappingLongShift.
Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer
Reviewed By: simon_tatham
Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64810
llvm-svn: 366555
Summary:
This patch removes the `default` case from some switches on
`llvm::Triple::ObjectFormatType`, and cases for the missing enumerators
(`UnknownObjectFormat`, `Wasm`, and `XCOFF`) are then added.
For `UnknownObjectFormat`, the effect of the action for the `default`
case is maintained; otherwise, where `llvm_unreachable` is called,
`report_fatal_error` is used instead.
Where the `default` case returns a default value, `report_fatal_error`
is used for XCOFF as a placeholder. For `Wasm`, the effect of the action
for the `default` case in maintained.
The code is structured to avoid strongly implying that the `Wasm` case
is present for any reason other than to make the switch cover all
`ObjectFormatType` enumerator values.
Reviewers: sfertile, jasonliu, daltenty
Reviewed By: sfertile
Subscribers: hiraditya, aheejin, sunfish, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64222
llvm-svn: 366544
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
f. `((x << MaskShAmt) a>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
f. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
Normally, the inner pattern is sign-extend,
but for our purposes it's no different to other patterns:
alive proofs:
f: https://rise4fun.com/Alive/7U3
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64524
llvm-svn: 366540
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
e. `((x << MaskShAmt) l>> MaskShAmt) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
e. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
e: https://rise4fun.com/Alive/0FT
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64521
llvm-svn: 366539
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
d. `(x & ((-1 << MaskShAmt) >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
d. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
d: https://rise4fun.com/Alive/I5Y
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64519
llvm-svn: 366538
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
c. `(x & (-1 >> MaskShAmt)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
c. `(ShiftShAmt-MaskShAmt) s>= 0` (i.e. `ShiftShAmt u>= MaskShAmt`)
alive proofs:
c: https://rise4fun.com/Alive/RgJh
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64517
llvm-svn: 366537
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
b. `(x & (~(-1 << maskNbits))) << shiftNbits`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
b. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
b: https://rise4fun.com/Alive/y8M
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Differential Revision: https://reviews.llvm.org/D64514
llvm-svn: 366536
Summary:
If we have some pattern that leaves only some low bits set, and then performs
left-shift of those bits, if none of the bits that are left after the final
shift are modified by the mask, we can omit the mask.
There are many variants to this pattern:
a. `(x & ((1 << MaskShAmt) - 1)) << ShiftShAmt`
All these patterns can be simplified to just:
`x << ShiftShAmt`
iff:
a. `(MaskShAmt+ShiftShAmt) u>= bitwidth(x)`
alive proof:
a: https://rise4fun.com/Alive/wi9
Indeed, not all of these patterns are canonical.
But since this fold will only produce a single instruction
i'm really interested in handling even uncanonical patterns,
since i have this general kind of pattern in hotpaths,
and it is not totally outlandish for bit-twiddling code.
For now let's start with patterns where both shift amounts are variable,
with trivial constant "offset" between them, since i believe this is
both simplest to handle and i think this is most common.
But again, there are likely other variants where we could use
ValueTracking/ConstantRange to handle more cases.
https://bugs.llvm.org/show_bug.cgi?id=42563
Reviewers: spatel, nikic, huihuiz, xbolva00
Reviewed By: xbolva00
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64512
llvm-svn: 366535
In debug frame information, some fields, e.g., Length in CIE/FDE and
Offset in FDE are attributes to describe the structure of CIE/FDE. They
are not related to the relaxed code. However, these attributes are
symbol differences. So, in current design, these attributes will be
filled as zero and LLVM generates relocations for them.
We only need to generate relocations for symbols in executable sections.
So, if the symbols are not located in executable sections, we still
evaluate their values under relaxation.
Differential Revision: https://reviews.llvm.org/D61584
llvm-svn: 366531
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.
There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.
Differential Revision: https://reviews.llvm.org/D58335
llvm-svn: 366524
Summary:
Inline asm doesn't use labels when compiled as an object file. Therefore, we
shouldn't create one for the (potential) callbr destination. Instead, use the
symbol for the MachineBasicBlock.
Reviewers: nickdesaulniers, craig.topper
Reviewed By: nickdesaulniers
Subscribers: xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64888
llvm-svn: 366523
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516
Add support for folding G_GEPs into loads of the form
```
ldr reg, [base, off]
```
when possible. This can save an add before the load. Currently, this is only
supported for loads of 64 bits into 64 bit registers.
Add a new addressing mode function, `selectAddrModeRegisterOffset` which
performs this folding when it is profitable.
Also add a test for addressing modes for G_LOAD.
Differential Revision: https://reviews.llvm.org/D64944
llvm-svn: 366503
This is a small extension of !associated, mostly useful for the implementation
convenience of instrumentation passes that RAUW globals with aliases, such
as LowerTypeTests.
Differential Revision: https://reviews.llvm.org/D64951
llvm-svn: 366502
Summary:
Some polish for r365099 which adds a static initializer to
MachOObjectFile. Remove it by moving it to file scope.
Reviewers: smeenai, alexshap, compnerd, mtrent, anushabasana
Reviewed By: smeenai
Subscribers: hiraditya, jkorous, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64873
llvm-svn: 366496
This causes sections with relative pointers to be marked as read only,
which means that they won't end up sharing pages with writable data.
Differential Revision: https://reviews.llvm.org/D64948
llvm-svn: 366494
While 'd_type' is a non-standard extension to `struct dirent`, only
glibc signals its presence with a macro '_DIRENT_HAVE_D_TYPE'.
However, any platform with 'd_type' also includes a way to convert to
mode_t values using the macro 'DTTOIF', so we can check for that alone
and still be confident that the 'd_type' member exists.
(If this turns out to be wrong, I'll go back and set up an actual
CMake check.)
I couldn't think of how to write a test for this, because I couldn't
think of how to test that a 'stat' call doesn't happen without
controlling the filesystem or intercepting 'stat', and there's no good
cross-platform way to do that that I know of.
Follow-up (almost a year later) to r342089.
rdar://problem/50592673
https://reviews.llvm.org/D64940
llvm-svn: 366486
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.
Reviewers: tlively, aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64900
llvm-svn: 366475
Summary:
- As the pointer stripping now tracks through `addrspacecast`, prepare
to handle the bit-width difference from the result pointer.
Reviewers: jdoerfert
Subscribers: jvesely, nhaehnle, hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64928
llvm-svn: 366470
There doesn't seem to be a practical reason for these instructions to have
different restrictions on the types of relocations that they may be used
with, notwithstanding the language in the ELF AArch64 spec that implies that
specific relocations are meant to be used with specific instructions.
For example, we currently forbid the first instruction in the following
sequence, despite it currently being used by clang to generate a global
reference under -mcmodel=large:
movz x0, #:abs_g0_nc:foo
movk x0, #:abs_g1_nc:foo
movk x0, #:abs_g2_nc:foo
movk x0, #:abs_g3:foo
Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations
that they currently accept individually.
Differential Revision: https://reviews.llvm.org/D64466
llvm-svn: 366461
It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.
There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.
Differential Revision: https://reviews.llvm.org/D58335
llvm-svn: 366442
This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Differential Revision: https://reviews.llvm.org/D64551
llvm-svn: 366441
Summary:
Commit r365249 changed usage of FileCheckNumericVariable to have one
instance of that class per variable as opposed to one instance per
definition of a given variable as was done before. However, it retained
the safety check in setValue that it should only be called with the
variable unset, even after r365625.
However this causes assert failure when a non-pseudo variable is being
redefined. And while redefinition of @LINE at each CHECK line work in
the general case, it caused problem when a substitution failed (fixed in
r365624) and still causes problem when a CHECK line does not match since
@LINE's value is cleared after substitutions in match() happened but
printSubstitutions also attempts a substitution.
This commit solves the root of the problem by changing setValue to set a
new value regardless of whether a value was set or not, thus fixing all
the aforementioned issues.
Reviewers: jhenderson, chandlerc, jdenny, probinson, grimar, arichardson, rnk
Subscribers: hiraditya, llvm-commits, probinson, dblaikie, grimar, arichardson, tra, rnk, kristina, hfinkel, rogfer01, JonChesterfield
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64882
llvm-svn: 366434
LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.
In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.
As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.
Differential Revision: https://reviews.llvm.org/D64707
llvm-svn: 366431
Summary:
PerformVMOVRRDCombine ommits adding a offset
of 4 to the PointerInfo, when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}
Which would allow the machine scheduller
to break dependencies with the second load.
- pr42638
Reviewers: eli.friedman, dmgreen, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64870
llvm-svn: 366423
I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.
llvm-svn: 366405
When code relaxation is enabled many RISC-V fixups are not resolved but
instead relocations are emitted. This happens even for DWARF debug
sections. Therefore, to properly support the parsing of DWARF debug info
we need to be able to resolve RISC-V relocations. This patch adds:
* Support for RISC-V relocations in RelocationResolver
* DWARF support for two relocations per object file offset
* DWARF changes to support relocations in more DIE fields
The two relocations per offset change is needed because some RISC-V
relocations (used for label differences) come in pairs.
Relocations can also be emitted for DWARF fields where relocations were
not yet evaluated. Adding relocation support for some of these fields is
essencial. On the other hand, LLVM currently emits RISC-V relocations
for fixups that could be safely evaluated, since they can never be
affected by code relaxations. This patch also adds relocation support
for the fields affected by those extraneous relocations (the DWARF unit
entry Length, and the DWARF debug line entry TotalLength and
PrologueLength), for testing purposes.
Differential Revision: https://reviews.llvm.org/D62062
Patch by Luís Marques.
llvm-svn: 366402
Summary:
LoopInfoWrapperPass::verify uses DT, which means DT must be alive
even if it has no direct users.
Fixes a crash in expensive checks mode.
Reviewers: pcc, leonardchan
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64896
llvm-svn: 366388
After rL365286 I had failing test:
LLVM :: tools/gold/X86/v1.12/thinlto_emit_linked_objects.ll
It was failing with the output:
$ llvm-bcanalyzer --dump llvm/test/tools/gold/X86/v1.12/Output/thinlto_emit_linked_objects.ll.tmp3.o.thinlto.bc
Expected<T> must be checked before access or destruction.
Unchecked Expected<T> contained error:
Unexpected end of file reading 0 of 0 bytesStack dump:
Change-Id: I07e03262074ea5e0aae7a8d787d5487c87f914a2
llvm-svn: 366387
- getCompression() used to return a PDB_SourceCompression even though
the docs for IDiaInjectedSource are explicit about the return value
being compiler-dependent. Return an uint32_t instead, and make the
printing code handle unknown values better by printing "Unknown" and
the int value instead of not printing any compression.
- Print compressed contents as hex dump, not as string.
- Add compression type "DotNet", which is used (at least) by csc.exe,
the C# compiler. Also add a lengthy comment describing the stream
contents (derived from looking at the raw hex contents long enough
to see the GUIDs, which led me to the roslyn and mono implementations
for handling this).
- The native injected source dumper was dumping the contents of the
whole data stream -- but csc.exe writes a stream that's padded with
zero bytes to the next 512 boundary, and the dia api doesn't display
those padding bytes. So make NativeInjectedSource::getCode() do the
same thing.
Differential Revision: https://reviews.llvm.org/D64879
llvm-svn: 366386
This will let us instrument globals during initialization. This required
making the new PM pass a module pass, which should still provide access to
analyses via the ModuleAnalysisManager.
Differential Revision: https://reviews.llvm.org/D64843
llvm-svn: 366379
The LocalStackSlotPass pre-allocates a stack protector and makes sure
that it comes before the local variables on the stack.
We need to make sure that later during PEI we don't re-allocate a new
stack protector slot. If that happens, the new stack protector slot will
end up being **after** the local variables that it should be protecting.
Therefore, we would have two slots assigned for two different stack
protectors, one at the top of the stack, and one at the bottom. Since
PEI will overwrite the assigned slot for the stack protector, the load
that is used to compare the value of the stack protector will use the
slot assigned by PEI, which is wrong.
For this, we need to check if the object is pre-allocated, and re-use
that pre-allocated slot.
Differential Revision: https://reviews.llvm.org/D64757
llvm-svn: 366371
Extract the sources to the GCD of the original size and target size,
padding with implicit_def as necessary.
Also fix the case where the requested source type is wider than the
original result type. This was ignoring the type, and just using the
destination. Do the operation in the requested type and truncate back.
llvm-svn: 366367
Use an anyext to the requested type for the leftover operand to
produce a slightly wider type, and then truncate the final merge.
I have another implementation almost ready which handles arbitrary
widens, but I think it produces worse code in this example (which I
think is 90% due to not folding redundant copies or folding out
implicit_def users), so I wanted to add this as a baseline first.
llvm-svn: 366366
Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360
Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348
Summary:
ORCv1 is deprecated. The current aim is to remove it before the LLVM 10.0
release. This patch adds deprecation attributes to the ORCv1 layers and
utilities to warn clients of the change.
Reviewers: dblaikie, sgraenitz, AlexDenisov
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64609
llvm-svn: 366344
Summary:
Deduce the "willreturn" attribute for functions.
For now, intrinsics are not willreturn. More annotation will be done in another patch.
Reviewers: jdoerfert
Subscribers: jvesely, nhaehnle, nicholas, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63046
llvm-svn: 366335
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.
This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table
Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.
llvm-svn: 366329
This patch sets correct encodings for DWARF exception handling for RISC-V
(other than call site encoding, which must be udata4 rather than uleb128 and
is handled by D63415).
This has the same intend as D63409, except this version matches GCC/binutils
behaviour which uses the same encodings regardless of PIC/non-PIC and
medlow/medany code model.
llvm-svn: 366327
Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list. This avoids
needlessly spilling the SPE non-volatile registers when they're not used.
As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56703
llvm-svn: 366319
Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset. However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).
The update to the register spill test is taken partially from the test
case shown in D49754.
Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted. This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.
Part of the patch contributed by Kei Thomsen.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54409
llvm-svn: 366318
Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.
Differential Revision: https://reviews.llvm.org/D64856
llvm-svn: 366317
Migrate CallLowering::lowerReturnVal to use the same infrastructure as
lowerCall/FormalArguments and remove the now obsolete code path from
splitToValueTypes.
Forgot to push this earlier.
llvm-svn: 366308
This directive forces to use the alternate register for context pointer.
For example, this code:
.cplocal $4
jal foo
expands to:
ld $25, %call16(foo)($4)
jalr $25
Differential Revision: https://reviews.llvm.org/D64743
llvm-svn: 366300
As well as other LLVM targets we do not handle "offsettable"
memory addresses in any special way. In other words, the "o" constraint
is an exact equivalent of the "m" one. But some existing code require
the "o" constraint support.
This fixes PR42589.
Differential Revision: https://reviews.llvm.org/D64792
llvm-svn: 366299
AMDGPU needs to allocate special argument registers separately from
the user function argument list, so needs direct control over the
CCState.
The ArgLocs argument is only really necessary because CCState doesn't
allow access to it.
llvm-svn: 366279
Summary:
Currently, on Emscripten, dynamic linking is not supported with threads.
This means that if thread-local storage is used, it must be used in a
statically-linked executable. Hence, local-exec is the only possible model.
This diff compiles all TLS variables to use local-exec on Emscripten as a
temporary measure until dynamic linking is supported with threads.
The goal for this is to allow C++ types with constructors to be thread-local.
Currently, when `clang` compiles a `thread_local` variable with a constructor,
it generates `__tls_guard` variable:
@__tls_guard = internal thread_local global i8 0, align 1
As no TLS model is specified, this is treated as general-dynamic, which we do
not support (and cannot support without implementing dynamic linking support
with threads in Emscripten). As a result, any C++ constructor in `thread_local`
variables would not compile.
By compiling all `thread_local` as local-exec, `__tls_guard` will compile and
we can support C++ constructors with TLS without implementing dynamic linking
with threads.
Depends on D64537
Reviewers: tlively, aheejin, sbc100
Reviewed By: aheejin
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64776
llvm-svn: 366275
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.
`.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.
`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.
`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.
To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This instrinsic function returns
the size of the thread-local storage for the current function.
The expected usage is to run something like the following upon thread startup:
__wasm_init_tls(malloc(__builtin_wasm_tls_size()));
Reviewers: tlively, aheejin, kripken, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64537
llvm-svn: 366272
This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023
There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
Differential Revision: https://reviews.llvm.org/D64760
llvm-svn: 366268
D64033 <https://reviews.llvm.org/D64033> added DW_AT_call_column for
inline sites. However, that change wasn't aware of "-gno-column-info".
To avoid adding column info when "-gno-column-info" is used, now
DW_AT_call_column is only added when we have non-zero column (when
"-gno-column-info" is used, column will be zero).
Patch by Wenlei He!
Differential Revision: https://reviews.llvm.org/D64784
llvm-svn: 366264
Summary:
This is exposed by our internal testing.
The reduced testcase will assert with "Impossible reg-to-reg copy"
We can't use COPY to do 32-bit to 64-bit conversion.
Reviewers: kbarton, hfinkel, nemanjai
Reviewed By: hfinkel
Subscribers: hiraditya, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64499
llvm-svn: 366255
I think this manages to not break the DAG handling with the divergent
predicates because the stadalone divergent patterns end up with a
higher priority than the pattern on the instruction definition.
The 16-bit versions don't work yet.
llvm-svn: 366254
When it is AReg_1024 this results in unnecessary copying into
AGPRs of a 32 element vectors even though they are not intended
for an mfma instruction.
Differential Revision: https://reviews.llvm.org/D64815
llvm-svn: 366252
I don't have an IR sample which is actually failing, but the issue described in the comment is theoretically possible, and should be guarded against even if there's a different root cause for the bot failures.
llvm-svn: 366241
Now that the patterns use the new PatFrag address space support, the
only blocker to importing most load patterns is the addressing mode
complex patterns.
llvm-svn: 366237
`pretty -native -injected-sources -injected-source-content` works with
this patch, and produces identical output to the dia version.
Differential Revision: https://reviews.llvm.org/D64428
llvm-svn: 366236
Summary:
Extend the atomic optimizer to handle signed and unsigned max and min
operations, as well as add and subtract.
Reviewers: arsenm, sheredom, critson, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64328
llvm-svn: 366235
Reimplement scheduling constraints for strict FP instructions in
ScheduleDAGInstrs::buildSchedGraph to allow for more relaxed
scheduling. Specifially, allow one strict FP instruction to
be scheduled across another, as long as it is not moved across
any global barrier.
Differential Revision: https://reviews.llvm.org/D64412
Reviewed By: cameron.mcinally
llvm-svn: 366222
Before, everything was based on some kind of type erased parser
implementation which container a lot of boilerplate code when multiple
formats were to be supported.
This simplifies it by:
* the remark now owns its arguments
* *always* returning an error from the implementation side
* working around the way the YAML parser reports errors: catch them through
callbacks and re-insert them in a proper llvm::Error
* add a CParser wrapper that is used when implementing the C API to
avoid cluttering the C++ API with useless state
* LLVMRemarkParserGetNext now returns an object that needs to be
released to avoid leaking resources
* add a new API to dispose of a remark entry: LLVMRemarkEntryDispose
llvm-svn: 366217
Summary:
As per title. DAGCombiner only mathes the special case where b = 0, this patches extends the pattern to match any value of b.
Depends on D57302
Reviewers: hfinkel, RKSimon, craig.topper
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59208
llvm-svn: 366214
Apparently the check for legal instructions during instruction
select does not happen without an asserts build, so these would
successfully select in release, and fail in debug.
Make s16 and/or/xor legal. These can just be selected directly
to the 32-bit operation, as is already done in SelectionDAG, so just
make them legal.
llvm-svn: 366210
The jcvt intrinsic defined in ACLE [1] is available when ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
The DWARF3 documentation had inconsistency concerning the reserved range
for unit length values. The issue was fixed in DWARF4.
Differential Revision: https://reviews.llvm.org/D64622
llvm-svn: 366190
The first argument in the constructor was ignored, and the remaining
arguments were always passed as their defaults.
Differential Revision: https://reviews.llvm.org/D64407
llvm-svn: 366188
The canonical GNU form of JALR resembles a load/store instruction rather
than placing the immediate offset as a separate argument, so match this
behaviour. Also add parser-only aliases for the three-operand form, and
add other shorter aliases also emitted by GNU tools.
Differential Revision: https://reviews.llvm.org/D55277
Patch by James Clarke.
llvm-svn: 366179
RISCVAsmBackend::shouldInsertExtraNopBytesForCodeAlign() assumed that the
align specified would be greater than or equal to the minimum nop length, but
that is not always the case - for example if a user specifies ".align 0" in
assembly.
Differential Revision: https://reviews.llvm.org/D63274
Patch by Edward Jones.
llvm-svn: 366176
The bool result of shouldInsertExtraNopBytesForCodeAlign() is not checked but
the returned nop count is unconditionally read even though it could be
uninitialized.
Differential Revision: https://reviews.llvm.org/D63285
Patch by Edward Jones.
llvm-svn: 366175
Since PseudoCALL defines AsmString, it can be generated from assembly,
and so code-gen patterns should be defined separately to be consistent
with the style of the RISCV backend. Other pseudo-instructions exist
that have code-gen patterns defined directly, but these instructions are
purely for code-gen and cannot be written in assembly.
Differential Revision: https://reviews.llvm.org/D64012
Patch by James Clarke.
llvm-svn: 366174
Previously, this function didn't check the IsPCRel argument. But doing so is a
useful check for errors, and also seemingly necessary for FK_Data_4 (which we
produce a R_RISCV_32_PCREL relocation for if IsPCRel).
Other than R_RISCV_32_PCREL, this should be NFC. Future exception handling
related patches will include tests that capture this behaviour.
llvm-svn: 366172
Use the MemoryVT field. This will be necessary for tablegen to
automatically handle patterns for GlobalISel.
Doesn't handle the d16 lo/hi patterns. Those are a special case since
it involvess the custom node type.
llvm-svn: 366168
In LLDB, when parsing type units, we don't need to parse the whole line
table. Instead, we only need to parse the "support files" from the line
table prologue.
To make that possible, this patch moves the respective functions from
the LineTable into the Prologue. Because I don't think users of the
LineTable should have to know that these files come from the Prologue,
I've left the original methods in place, and made them redirect to the
LineTable.
Differential revision: https://reviews.llvm.org/D64774
llvm-svn: 366164
Summary:
- As the pointer stripping could trace through `addrspacecast` now, need
to sext/trunc the offset to ensure it has the same width as the
pointer after stripping.
Reviewers: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64768
llvm-svn: 366162
In LLDB, when parsing type units, we don't need to parse the whole line
table. Instead, we only need to parse the "support files" from the line
table prologue.
To make that possible, this patch moves the respective functions from
the LineTable into the Prologue. Because I don't think users of the
LineTable should have to know that these files come from the Prologue,
I've left the original methods in place, and made them redirect to the
LineTable.
Differential revision: https://reviews.llvm.org/D64774
llvm-svn: 366158
As there are some reported miscompiles with AVX512 and performance regressions
in Eigen. Verified with the original committer and testcases will be forthcoming.
This reverts commit r364964.
llvm-svn: 366154
We mostly avoid sub with immediate but there are a couple cases that can create them. One is the add 128, %rax -> sub -128, %rax trick in isel. The other is when a SUB immediate gets created for a compare where both the flags and the subtract value is used. If we are unable to linearize the SelectionDAG to satisfy the flag user and the sub result user from the same instruction, we will clone the sub immediate for the two uses. The one that produces flags will eventually become a compare. The other will have its flag output dead, and could then be considered for LEA creation.
I added additional test cases to add.ll to show the the sub -128 trick gets converted to LEA and a case where we don't need to convert it.
This showed up in the current codegen for PR42571.
Differential Revision: https://reviews.llvm.org/D64574
llvm-svn: 366151
Summary:
This adds missing utility methods and copy instruction handling for
`exnref` type and also adds tests.
`tee` instruction tests are missing because `isTee` is currently only
used in ExplicitLocals pass and testing that pass in mir requires
serialization of stackified registers in mir files, which is a bit
nontrivial because `MachineFunctionInfo` only has info of vreg numbers
(which are large integers) but not the mir's register numbers. But this
change is quite trivial anyway.
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64705
llvm-svn: 366149
Summary:
We agreed to rename `except_ref` to `exnref` for consistency with other
reference types in
https://github.com/WebAssembly/exception-handling/issues/79. This also
renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order
to use the file for other reference types in future.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64703
llvm-svn: 366145
Summary:
These are emitted as identifiers by the InstPrinter, so we should
parse them as such. These could potentially clash with symbols of
the same name, but that is out of our (the WebAssembly backend) control.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64770
llvm-svn: 366139
Summary:
Enable hoisting and merging m0 defs that are initialized with the same
immediate value. Fixes bug where removed instructions are not considered
to interfere with other inits, and make sure to not hoist inits before block
prologues.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64766
llvm-svn: 366135
We already do this for the flat and DS instructions, although it is
certainly uglier and more verbose.
This will allow using separate pattern definitions for extload and
zextload. Currently we get away with using a single PatFrag with
custom predicate code to check if the extension type is a zextload or
anyextload. The generic mechanism the global isel emitter understands
treats these as mutually exclusive. I was considering making the
pattern emitter accept zextload or sextload extensions for anyextload
patterns, but in global isel, the different extending loads have
distinct opcodes, and there is currently no mechanism for an opcode
matcher to try multiple (and there probably is very little need for
one beyond this case).
llvm-svn: 366132
Summary:
There is currently a correctness issue when unrolling loops containing
callbr's where their indirect targets are being updated correctly to the
newly created labels, but their operands are not. This manifests in
unrolled loops where the second and subsequent copies of callbr
instructions have blockaddresses of the label from the first instance of
the unrolled loop, which would result in nonsensical runtime control
flow.
For now, conservatively do not unroll the loop. In the future, I think
we can pursue unrolling such loops provided we transform the cloned
callbr's operands correctly.
Such a transform and its legalities are being discussed in:
https://reviews.llvm.org/D64101
Link: https://bugs.llvm.org/show_bug.cgi?id=42489
Link: https://groups.google.com/forum/#!topic/clang-built-linux/z-hRWP9KqPI
Reviewers: fhahn, hfinkel, efriedma
Reviewed By: fhahn, hfinkel, efriedma
Subscribers: efriedma, hiraditya, zzheng, dmgreen, llvm-commits, pirama, kees, nathanchance, E5ten, craig.topper, chandlerc, glider, void, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64368
llvm-svn: 366130
If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to
whether the result is 0. If the inputs are SCC, these can be copied to
a 32-bit SGPR to produce an SCC result.
llvm-svn: 366125
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.
It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differencies to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.
The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.
Reviewers: pcc, hctim, vitalybuka, ostannard
Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64169
llvm-svn: 366123