Commit Graph

71533 Commits

Author SHA1 Message Date
Matt Arsenault 4b4496312e AMDGPU: Start adding MODE register uses to instructions
This is the groundwork required to implement strictfp. For now, this
should be NFC for regular instructoins (many instructions just gain an
extra use of a reserved register). Regalloc won't rematerialize
instructions with reads of physical registers, but we were suffering
from that anyway with the exec reads.

Should add it for all the related FP uses (possibly with some
extras). I did not add it to either the gpr index mode instructions
(or every single VALU instruction) since it's a ridiculous feature
already modeled as an arbitrary side effect.

Also work towards marking instructions with FP exceptions. This
doesn't actually set the bit yet since this would start to change
codegen. It seems nofpexcept is currently not implied from the regular
IR FP operations. Add it to some MIR tests where I think it might
matter.
2020-05-27 14:47:00 -04:00
John Fastabend 13f6c81c5d [BPF] simplify zero extension with MOV_32_64
The current pattern matching for zext results in the following code snippet
being produced,

  w1 = w0
  r1 <<= 32
  r1 >>= 32

Because BPF implementations require zero extension on 32bit loads this
both adds a few extra unneeded instructions but also makes it a bit
harder for the verifier to track the r1 register bounds. For example in
this verifier trace we see at the end of the snippet R2 offset is unknown.
However, if we track this correctly we see w1 should have the same bounds
as r8. R8 smax is less than U32 max value so a zero extend load should keep
the same value. Adding a max value of 800 (R8=inv(id=0,smax_value=800)) to
an off=0, as seen in R7 should create a max offset of 800. However at the
end of the snippet we note the R2 max offset is 0xffffFFFF.

  R0=inv(id=0,smax_value=800)
  R1_w=inv(id=0,umax_value=2147483647,var_off=(0x0; 0x7fffffff))
  R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9=inv800 R10=fp0 fp-8=mmmm????
 58: (1c) w9 -= w8
 59: (bc) w1 = w8
 60: (67) r1 <<= 32
 61: (77) r1 >>= 32
 62: (bf) r2 = r7
 63: (0f) r2 += r1
 64: (bf) r1 = r6
 65: (bc) w3 = w9
 66: (b7) r4 = 0
 67: (85) call bpf_get_stack#67
  R0=inv(id=0,smax_value=800)
  R1_w=ctx(id=0,off=0,imm=0)
  R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R3_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R4_w=inv0 R6=ctx(id=0,off=0,imm=0)
  R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0)
  R8_w=inv(id=0,smax_value=800,umax_value=4294967295,var_off=(0x0; 0xffffffff))
  R9_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff))
  R10=fp0 fp-8=mmmm????

After this patch R1 bounds are not smashed by the <<=32 >>=32 shift and we
get correct bounds on R2 umax_value=800.

Further it reduces 3 insns to 1.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Differential Revision: https://reviews.llvm.org/D73985
2020-05-27 11:26:39 -07:00
Lei Huang 2368bf52cd [PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.

Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc

Reviewed By: stefanp, nemanjai, amyk, #powerpc

Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo

Tags: #clang, #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D80020
2020-05-27 13:14:25 -05:00
Matt Arsenault 07cd19efa2 AMDGPU: Fix dropping MI flags when rewriting instructions
All 3 passes that change instruction encodings were dropping MI
flags. This avoids scheduling regressions caused by setting
mayRaiseFPExceptions on FP instructions for non-strictfp functions.
2020-05-27 13:27:06 -04:00
Fangrui Song 5b4cd2d4c4 [X86] Assemble movzb 1280(%rbx, %r12), %r12 after D80608
ffmpeg/libavcodec/x86/h264_cabac.c inline assembly may produce
movzb 1280(%rbx, %r12), %r12

After D80608, llvm-mc errors:

error: unknown use of instruction mnemonic without a size suffix
2020-05-27 09:55:55 -07:00
Philip Reames 1af3705c7f Start migrating away from statepoint's inline length prefixed argument bundles
In the current statepoint design, we have four distinct groups of operands to the call: call args, gc transition args, deopt args, and gc args. This format prexisted the support in IR for operand bundles and was in fact one of the inspirations for the extension. However, we never went back and rearchitected statepoints to fully leverage bundles.

This change is the first in a small sequence to do so. All this does is extend the SelectionDAG lowering code to allow deopt and gc transition operands to be specified in either inline argument bundles or operand bundles.

Differential Revision: https://reviews.llvm.org/D8059
2020-05-27 09:16:10 -07:00
Alex Richardson 3be5e53f20 [FileCheck] Allow parenthesized expressions
With this change it is be possible to write FileCheck expressions such
as [[#(VAR+1)-2]]. Currently, the only supported arithmetic operators are
plus and minus, so this is not particularly useful yet. However, it our
CHERI fork we have tests that benefit from having multiplication in
FileCheck expressions. Allowing parenthesized expressions is the simplest
way for us to work around the current lack of operator precedence in
FileCheck expressions.

Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D77383
2020-05-27 16:31:39 +01:00
Lei Huang 559845f8fe Revert "[PowerPC] Add support for -mcpu=pwr10 in both clang and llvm"
This reverts commit 7eb666b155.
2020-05-27 09:40:21 -05:00
Ties Stuij 78bd0c0e5e [AArch64][BFloat] add BFloat instruction support for AArch64
Summary:
Add support for lowering various BFloat related SelDAG nodes:
- load/store (ldrh/strh)
- concat
- dup/duplane
- bitconvert/bitcast
- insert_subvector/insert_subreg

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type, and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Reviewers: ab, t.p.northover, john.brawn, fpetrogalli, sdesmalen, LukeGeeson

Reviewed By: fpetrogalli

Subscribers: LukeGeeson, pbarrio, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79712
2020-05-27 15:36:54 +01:00
Georgii Rymar 4ab03e62fd [llvm-readobj] - Do not crash when an invalid .eh_frame_hdr is dumped using --unwind.
When the p_offset/p_filesz of the PT_GNU_EH_FRAME is invalid
(e.g larger than the file size) then llvm-readobj might crash.

This patch fixes the issue. I've introduced `ELFFile<ELFT>::getSegmentContent`
method, which is very similar to `ELFFile<ELFT>::getSectionContentsAsArray` one.

Differential revision: https://reviews.llvm.org/D80380
2020-05-27 16:41:09 +03:00
David Green 70d4a20299 [UnJ] Update LI for inner nested loops
This makes sure to correctly register the loop info of the children
of unroll and jammed loops. It re-uses some code from the unroller for
registering subloops.

Differential Revision: https://reviews.llvm.org/D80619
2020-05-27 14:36:38 +01:00
Matt Arsenault 833996cef1 AMDGPU: Fix backwards s_cselect_* operands
The vector equivalent has backwards operands, but the scalar version
does not. The passes that use these hooks aren't enabled by default,
so this doesn't really change anything.
2020-05-27 09:26:09 -04:00
Guillaume Chatelet 5b84ee4f61 [Alignment] Fix misaligned interleaved loads
Summary: Tentatively fixing https://bugs.llvm.org/show_bug.cgi?id=45957

Reviewers: craig.topper, nlopes

Subscribers: hiraditya, llvm-commits, RKSimon, jdoerfert, efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80276
2020-05-27 12:12:22 +00:00
Victor Campos c7593b0f0d [ARM] Fix rewrite of frame index in Thumb2's address mode i8s4
Summary:
In Thumb2's frame index rewriting process, the address mode i8s4, which
is used by LDRD and STRD instructions, is handled by taking the
immediate offset operand and multiplying it by 4.

This behaviour is wrong, however. In this specific address mode, the
MachineInstr's immediate operand is already in the expected form. By
consequence of that, multiplying it once more by 4 yields a flawed
offset value, four times greater than it should be.

Differential Revision: https://reviews.llvm.org/D80557
2020-05-27 13:09:13 +01:00
Guillaume Chatelet 6e1eff7858 [NFC] Updating tests
Summary:
Updating IR now that alignment is explicitly set.
This is a prerequisite to D80276.

Reviewers: efriedma

Subscribers: llvm-commits, craig.topper

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80549
2020-05-27 12:02:46 +00:00
Daniil Suchkov 706b22e3e4 [SimpleLoopUnswitch] Drop uses of instructions before block deletion
Currently if instructions defined in a block are used in unreachable
blocks and SimpleLoopUnswitch attempts deleting the block, it triggers
assertion "Uses remain when a value is destroyed!".
This patch fixes it by replacing all uses of instructions from BB with
undefs before BB deletion.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D80551
2020-05-27 18:25:18 +07:00
Georgii Rymar fc98447af6 [llvm-readobj] - Do not skip building of the GNU hash table histogram.
When the `--elf-hash-histogram` is used, the code first tries to build
a histogram for the .hash table and then for the .gnu.hash table.

The problem is that dumper might return early when unable or do not need to
build a histogram for the .hash.

This patch reorders the code slightly to fix the issue and adds a test case.

Differential revision: https://reviews.llvm.org/D80204
2020-05-27 13:46:41 +03:00
Simon Pilgrim 410667f1b7 [X86][SSE] Convert PTEST to MOVMSK for allsign bits vector results
If we are using PTEST to check 'allsign bits' vector elements we can use MOVMSK to extract the signbits directly and perform the comparison on the scalar value.

For vXi16 cases, as we don't have a MOVMSK for this type, we must mask each signbit out of a PMOVMSKB v2Xi8 result, which folds into the TEST comparison.

If this allows us to remove a vector op (via the SimplifyMultipleUseDemandedBits call) this is consistently faster than a PTEST (https://godbolt.org/z/ziJUst).

I'm investigating whether we ever get regressions without the SimplifyMultipleUseDemandedBits call, even if this means we don't remove a vector op, but that has exposed some other poor codegen issues that I'm still investigating and would have to wait for a later patch.

Suggested on PR42035 to avoid unnecessary ashr(x,bw-1)/pcmpgt(0,x) sign splat patterns feeding into ptest.

Differential Revision: https://reviews.llvm.org/D80563
2020-05-27 11:06:16 +01:00
Konstantin Schwarz f2fad3f703 [GlobalISel][InlineAsm] Add missing EarlyClobber flag to inline asm output operands
Summary:
Previously, we only added early-clobber flags to the 'group' immediate flag operand
of an inline asm operand.
However, we also have to add the EarlyClobber flag to the MachineOperand itself.

This fixes PR46028

Reviewers: arsenm, leonardchan

Reviewed By: arsenm, leonardchan

Subscribers: phosek, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80467
2020-05-27 12:04:18 +02:00
Vitaly Buka f6383643d9 [StackSafety] Bailout on some function calls
Don't miss values used in calls outside regular argument list.
2020-05-27 02:48:42 -07:00
Vitaly Buka 06a07dd608 [StackSafety] Fix formatting in the test 2020-05-27 02:48:41 -07:00
Vitaly Buka b101c6251a [StackSafety] Ignore some use of values
We should ignore value used in MemTransferInst
as other then src/dst argument.
2020-05-27 02:48:41 -07:00
Kazushi (Jam) Marukawa dedaf3a2ac [VE] Dynamic stack allocation
Summary:
This patch implements dynamic stack allocation for the VE target. Changes:
* compiler-rt: `__ve_grow_stack` to request stack allocation on the VE.
* VE: base pointer support, dynamic stack allocation.

Differential Revision: https://reviews.llvm.org/D79084
2020-05-27 10:11:06 +02:00
Daniil Suchkov fc44da746f Add test exposing a bug in SimpleLoopUnswitch. 2020-05-27 15:02:28 +07:00
Wang, Pengfei 6565b58584 [X86][llvm-mc] Make the suffix matcher more accurate.
Summary:
Some instruction like VPMULDQ is NOT the variant of VPMULD but a new
one.
So we should make sure the suffix matcher only works for memory variant
that has the same size with the suffix.
Currently we only check for SSE/AVX* instructions, because many legacy
instructions didn't declare the alias instructions of their variants.

Differential Revision: https://reviews.llvm.org/D80608
2020-05-27 14:45:17 +08:00
Vitaly Buka 32a1f60d11 [StackSafety] Use SCEV to find mem operation length 2020-05-26 23:22:37 -07:00
Vitaly Buka d0f1f5adfa [StackSafety] Use getSignedRange for offsets 2020-05-26 23:22:36 -07:00
Kang Zhang 23a2f45214 [NFC][PowerPC] Modify the test case two-address-crash.mir 2020-05-27 02:35:45 +00:00
Matt Arsenault ef3e831226 GlobalISel: Basic legalization for G_PTRMASK 2020-05-26 21:20:30 -04:00
Vitaly Buka b5ae70046b [StackSafety] Simplify SCEVRewriteVisitor
Probably NFC.
2020-05-26 18:09:43 -07:00
Jessica Paquette ae597a771e [AArch64][GlobalISel] Do not modify predicate when optimizing G_ICMP
This fixes a bug in `tryOptArithImmedIntegerCompare`.

It is unsafe to update the predicate on a MachineOperand when optimizing a
G_ICMP, because it may be used in more than one place.

For example, when we are optimizing G_SELECT, we allow compares which are used
in more than one G_SELECT. If we modify the G_ICMP, then we'll break one of
the G_SELECTs.

Since the compare is being produced to either

1) Select a G_ICMP
2) Fold a G_ICMP into an instruction when profitable

there's no reason to actually modify it. The change is local to the specific
compare.

Instead, pass a `CmpInst::Predicate` to `tryOptArithImmedIntegerCompare` which
can be modified by reference.

Differential Revision: https://reviews.llvm.org/D80585
2020-05-26 17:51:08 -07:00
Philip Reames bed6624ac4 Split a test file so that most of it can be autogened 2020-05-26 17:33:32 -07:00
Philip Reames b90eb0f23b Autogen a couple of test files to make a future diff easier to read 2020-05-26 17:33:32 -07:00
Alexander Shaposhnikov 842a8cc10c [llvm-objcopy][MachO] Add support for removing Swift symbols
cctools strip has the option "-T" which removes Swift symbols.
This diff implements this option in llvm-strip for MachO.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D80099
2020-05-26 16:49:56 -07:00
Arthur Eubanks 9a0b0855a9 Modify verifier checks to support musttail + preallocated
Summary:
preallocated and musttail can work together, but we don't want to call
@llvm.call.preallocated.setup() to modify the stack in musttail calls.
So we shouldn't have the "preallocated" operand bundle when a
preallocated call is musttail.

Also disallow use of preallocated on calls without preallocated.

Codegen not yet implemented.

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80581
2020-05-26 15:20:20 -07:00
Chris Jackson 9eacda51fa [debuginfo] Fix broken tests from MachineLICM salvaging fix
Previous commit: bd7ff5d94f

- Added missing x86 triples
- Added missing asserts
2020-05-26 22:46:07 +01:00
Vitaly Buka 6a74ad6baa [sancov] Accommodate sancov and coverage report server for use under Windows
Summary:
This patch makes the following changes to SanCov and its complementary Python script in order to resolve issues pertaining to non-UNIX file paths in JSON symbolization information:
* Convert all paths to use forward slash.
* Update `coverage-report-server.py` to correctly handle paths to sources which contain spaces.
* Remove Linux platform restriction for all SanCov unit tests. All SanCov tests passed when ran on my local Windows machine.

Patch by Douglas Gliner.

Reviewers: kcc, filcab, phosek, morehouse, vitalybuka, metzman

Reviewed By: vitalybuka

Subscribers: vsk, Dor1s, llvm-commits

Tags: #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D51018
2020-05-26 14:36:44 -07:00
Vedant Kumar 6e39379bbb [DwarfExpression] Support entry values for indirect parameters
Summary:
A struct argument can be passed-by-value to a callee via a pointer to a
temporary stack copy. Add support for emitting an entry value DBG_VALUE
when an indirect parameter DBG_VALUE becomes unavailable. This is done
by omitting DW_OP_stack_value from the entry value expression, to make
the expression describe the location of an object.

rdar://63373691

Reviewers: djtodoro, aprantl, dstenb

Subscribers: hiraditya, lldb-commits, llvm-commits

Tags: #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D80345
2020-05-26 14:22:28 -07:00
Stanislav Mekhanoshin 512e806a33 [AMDGPU] Bail alloca vectorization if GEP not found
Differential Revision: https://reviews.llvm.org/D80587
2020-05-26 13:59:49 -07:00
Davide Italiano 01fee8aa24 [MLICM] Remove unneeded option so the test doesn't fail. 2020-05-26 13:53:56 -07:00
Matt Arsenault bb10fa3a53 AMDGPU: Fix wrong null value for private address space
I'm guessing this was a holdover from when 0 was an invalid stack
pointer, but surprised nobody has discovered this before.

Also don't allow offset folding for -1 pointers, since it looks weird
to partially fold this.
2020-05-26 16:35:13 -04:00
Chris Jackson bd7ff5d94f [DebugInfo] Correct debuginfo for post-ra hoist and sink in Machine LICM
Reviewers: vsk, aprantl

Differential Revision: https://reviews.llvm.org/D79868
2020-05-26 21:07:10 +01:00
Stanislav Mekhanoshin 42725aeed8 Process gep (select ptr1, ptr2) in SROA
Differential Revision: https://reviews.llvm.org/D79217
2020-05-26 12:56:02 -07:00
Sanjay Patel 1a2bffaf8b [InstCombine] reassociate sub+add to increase adds and throughput
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953

Follows-up the FP version of the same transform:
rGa0ce2338a083
2020-05-26 14:49:17 -04:00
Sanjay Patel f5cfcc4b06 [LoopVectorize] regenerate full test checks; NFC 2020-05-26 14:49:17 -04:00
Sanjay Patel 0788392637 [InstCombine] add tests for reassociative sub/add expressions; NFC 2020-05-26 14:49:16 -04:00
Lei Huang 7eb666b155 [PowerPC] Add support for -mcpu=pwr10 in both clang and llvm
Summary:
This patch simply adds support for the new CPU in anticipation of
Power10. There isn't really any functionality added so there are no
associated test cases at this time.

Reviewers: stefanp, nemanjai, amyk, hfinkel, power-llvm-team, #powerpc

Reviewed By: stefanp, nemanjai, amyk, #powerpc

Subscribers: NeHuang, steven.zhang, hiraditya, llvm-commits, wuzish, shchenz, cfe-commits, kbarton, echristo

Tags: #clang, #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D80020
2020-05-26 13:48:22 -05:00
Nemanja Ivanovic 6e9223a2c6 [PowerPC][NFC] Update test to prevent DCE from causing failures
The test case provided in PR45709 can be simplified by DCE to an
empty function. To prevent this from happening if DCE is run prior
to ISEL in the back end, just add optnone to the function. The
behaviour it is testing for is in the SDAG legalization and is
not sensitive to optnone so the test case still achieves its desired
objective.
2020-05-26 13:37:48 -05:00
Hiroshi Yamauchi 106ec64fbc [PGO] Add memcmp/bcmp size value profiling.
Summary: This adds support for memcmp/bcmp to the existing memcpy/memset value profiling.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79751
2020-05-26 10:28:04 -07:00
Sanjay Patel a0ce2338a0 [InstCombine] reassociate fsub+fadd with FMF to increase adds and throughput
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
2020-05-26 13:17:15 -04:00