Commit Graph

145752 Commits

Author SHA1 Message Date
Joseph Tremoulet b785e03612 Support: mapped_file_region: Pass MAP_NORESERVE to mmap
This allows mapping larger files, delaying OOM failures until too many
pages of them are accessed.  This is makes the behavior of the
mapped_file_region in this regard consistent between its "Unix" and
"Windows" implementations.

Guard the code witih #if defined(MAP_NORESERVE), consistent with other
uses of MAP_NORESERVE in llvm-project, because some FreeBSD versions do
not provide this flag.

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D96626
2021-04-08 09:07:25 -04:00
Jay Foad 3344cd3a14 [AMDGPU] SIFoldOperands: make tryFoldCndMask a member function. NFC. 2021-04-08 14:05:29 +01:00
Sebastian Neubauer c10cc4ea27 [AMDGPU] Fix computing live registers in prolog
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.

Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.

Differential Revision: https://reviews.llvm.org/D100098
2021-04-08 14:52:50 +02:00
Paul C. Anagnostopoulos 14580ce2fd [TableGen] Make behavior of list slice suffix consistent across all values
Differential Revision: https://reviews.llvm.org/D99883
2021-04-08 08:38:44 -04:00
Paul C. Anagnostopoulos 3b9a15d910 [TableGen] Add support for the 'assert' statement in multiclasses 2021-04-08 08:36:03 -04:00
David Sherwood 1206313f82 [CodeGen][AArch64] Fix isel crash for truncating FP stores
When attempting to truncate a FP vector and store the result out
to memory we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.

Tests have been added here:

  CodeGen/AArch64/sve-fptrunc-store.ll

Differential Revision: https://reviews.llvm.org/D100025
2021-04-08 13:21:29 +01:00
Jay Foad 94a6fe43de [AMDGPU] SIFoldOperands: refactor tryFoldCndMask with early-outs. NFC. 2021-04-08 13:16:07 +01:00
Stephen Tozer 140757bfaa [DebugInfo] Prevent invalid debug info being produced during LoopStrengthReduce
During LoopStrengthReduce, some of the SSA values that are used by debug values
may be lost and/or salvaged. After LSR we attempt to recover any undef debug
values, including any that were salvaged but then lost their values afterwards,
by replacing the lost values with any live equal values (plus a possible
constant offset) that have been gathered prior to running LSR. When we do this
we restore the debug value's original DIExpression, to undo any salvaging (as we
have gone back to using the original debug value).

This process can currently produce invalid debug info if the number of operands
has changed by salvaging during LSR. Replacing old values during the
applyEqualValues step does not change the number of location operands, which
means that when we restore the old DIExpression we may have a mismatch between
the number of operands used by the debug value and the number of operands
referenced by the DIExpression. This patch fixes this by restoring the full
original location metadata at the start of the applyEqualValues step, so that
there is no mismatch in operand count between the debug value and its
DIExpression.

Differential Revision: https://reviews.llvm.org/D98644
2021-04-08 13:04:48 +01:00
Mikael Holmen 2a1f87167c [NVPTX] Fix compiler warning in NDEBUG build [NFC]
Without the fix we get

../lib/Target/NVPTX/NVPTXLowerArgs.cpp:236:24: error: lambda capture 'Arg' is not used [-Werror,-Wunused-lambda-capture]
  auto IsALoadChain = [Arg](Value *Start) {
                       ^~~
1 error generated.
2021-04-08 13:21:21 +02:00
Serguei Katkov a0e8738d45 [GreedyRA ORE] Add function level spill/reloads stats
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100014
2021-04-08 16:55:52 +07:00
David Green 8675ef100f [LV] Logical and/or select costs
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.

This fixes 60% performance regressions from code like the attached test
case.

Differential Revision: https://reviews.llvm.org/D99884
2021-04-08 10:39:47 +01:00
Fraser Cormack a5693445ca [RISCV] Support OR/XOR/AND reductions on vector masks
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.

The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.

Other reduction types were not deemed to be relevant for mask vectors.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100030
2021-04-08 09:46:38 +01:00
LemonBoy edb18ea5a9 [AsmParser] Recognize more escaped characters between single quotes
The GNU AS manual states the following about single-character constants enclosed within single quotes:

>  Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.

Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99609
2021-04-08 09:59:37 +02:00
Serguei Katkov 6b64c662c7 [GreedyRA ORE] Extract computeNumberOfSplillsReloads to use in different places. NFC.
Extract one basic block handling to introduce stat computation for method scope.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames, thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100013
2021-04-08 14:40:45 +07:00
Serguei Katkov df25787797 [GreedyRA ORE] Extract stats in RAGreedyStats struct. NFC.
Combine all collected stats into separate struct RAGreedyStats
with add and report methods.

The motivation is to extend the number of statistics to capture and instead of
adding new parameters, just combine all of them into one structure.
Additionally I plan to use report from different places in future to report data
for function as well.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
2021-04-08 14:27:37 +07:00
Serguei Katkov 0a1c6637a1 [GreedyRA ORE] Compute ORE stats if extra analysis is enabled
To save compile time, avoid computation of stats if ORE will not emit it.
The motivation is to add more stats and compute it only if it will dumped.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
2021-04-08 14:24:18 +07:00
Esme-Yi 0c36da722a [Debug-Info] Use inlined strings in .dwinfo section by default for DBX.
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D99933
2021-04-08 07:20:22 +00:00
Hsiangkai Wang ba72bdef32 [RISCV] Add scalable offset under very large stack size.
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch

However, if the offset contains scalable part, we need to consider it.

%0 = ADDI %stack.0, 0

=>

%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset

Differential Revision: https://reviews.llvm.org/D100035
2021-04-08 14:46:05 +08:00
Juneyoung Lee 648544f998 [Constant] ConstantStruct/Array should not lower poison to undef
This is a (late) follow-up patch of 8871a4b4ca and
c95f39891a to make ConstantStruct::get/ConstantArray::getImpl
correctly return PoisonValue if all elements are poison.
This was found while discussing about the elements of a vector-typed UndefValue (D99853)
2021-04-08 15:23:12 +09:00
Hongtao Yu 2a2720a2de [CSSPGO] Move pseudo probes to the beginning of a block to unblock SelectionDAG combine.
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.

Test Plan:

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D100002
2021-04-07 22:45:35 -07:00
Serge Pavlov 65b1103798 [RISCV] DAG nodes and pseudo instructions for CSR access
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.

Differential Revision: https://reviews.llvm.org/D98936
2021-04-08 10:36:36 +07:00
hsmahesha ac64995ceb [AMDGPU] Only use ds_read/write_b128 for alignment >= 16
PS: Submitting on behalf of Jay.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100008
2021-04-08 08:12:05 +05:30
Chen Zheng 74e77295e7 [PowerPC] fixup killed flags for ri + addi to ri transformation
Fixup killed flags if DefMI and MI are not in the same basic blocks.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100023
2021-04-07 22:04:08 -04:00
Congzhe Cao 593cb46550 Revert "[LoopInterchange] Fix transformation bugs in loop interchange"
This reverts commit 6ec68bd815d00c1eec2a6b9766452554f0e6cb61.
2021-04-07 21:17:30 -04:00
CongzheUalberta f5645ea65f [LoopInterchange] Fix transformation bugs in loop interchange
After loop interchange, the (old) outer loop header should not jump to
`LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of (new) outer loop.

This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D98475
2021-04-07 20:55:44 -04:00
Stanislav Mekhanoshin 37878de503 Disable use of SCC bit from asm
Differential Revision: https://reviews.llvm.org/D100069
2021-04-07 15:32:17 -07:00
Tony Tye 4658cd4c18 [AMDGPU] Update gfx90a memory model support
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D100070
2021-04-07 22:17:58 +00:00
Stanislav Mekhanoshin d5d412f2ae [AMDGPU] Split GCNRegBankReassign
Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.

Differential Revision: https://reviews.llvm.org/D100063
2021-04-07 14:45:13 -07:00
Sanjay Patel c0bbd0cc35 [InstCombine] fold not ops around min/max intrinsics
This is another step towards parity with the existing
cmp+select folds (see D98152).
2021-04-07 17:31:36 -04:00
Craig Topper 56ea2e2fdd [RISCV] Add a special case to lowerSELECT for select of 2 constants with a SETLT condition.
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.

We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.

We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.

Differential Revision: https://reviews.llvm.org/D99021
2021-04-07 13:47:17 -07:00
Craig Topper 9895285191 [RISCV] Replace 'return ReplaceNode' with 'ReplaceNode; return;' NFC
ReplaceNode is a void function as is the function that we were
doing this in. While this is valid code, it was a bit confusing.
2021-04-07 12:18:41 -07:00
Jonas Hahnfeld 6415f424bc [AArch64] Materialize FP constant in code for large code model
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.

For testing, restore literal_pools_float.ll to exercise the constant
pool and add two optnone-functions that return a float and a double,
respectively. Consolidate fpimm.ll and add a new fast-isel-fpimm.ll
to check the code paths taken with FastISel.

Differential Revision: https://reviews.llvm.org/D99607
2021-04-07 21:02:05 +02:00
Arthur Eubanks 90af134473 Revert "[AsmPrinter] Delete dead takeDeletedSymbsForFunction()"
This reverts commit 9583a3f262.

This wasn't NFC as initially thought. Needed for D99707.
2021-04-07 11:40:44 -07:00
Abhina Sreeskantharajan 5c8462b5da [Windows] Remove global OF_None flag for Windows in ToolOutputFiles
Since we have created a new OF_TextWithCRLF flag, we no longer need to worry about OF_Text flag turning on CRLF translation. I can remove this workaround I added to globally open all ToolOutputFiles as binary on Windows.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D100034
2021-04-07 14:10:04 -04:00
Craig Topper f087d7544a [RISCV] Support vslide1up/down intrinsics for SEW=64 on RV32.
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.

Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.

For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.

That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.

The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D99910
2021-04-07 10:44:53 -07:00
Craig Topper 67953311e2 [SelectionDAG] Teach SelectionDAG::FoldConstantArithmetic to handle SPLAT_VECTOR
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.

I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.

This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D99682
2021-04-07 10:03:33 -07:00
Craig Topper 5fc0e98d9a [LoopIdiomRecognize] Minor cleanups to the FFS idiom matching. NFC
-Make sure of the CreateShl/LShr/AShr methods that take a uint64_t
instead of creating a ConstantInt for 1 ourselves.
-Use Builder.getInt1 or ConstantInt::getBool instead of a conditional.
-Pull out repeated calls to getType.
2021-04-07 10:03:14 -07:00
Roman Lebedev 24f67473dd
[InstCombine] foldAddWithConstant(): don't deal with non-immediate constants
All of the code that handles general constant here (other than the more
restrictive APInt-dealing code) expects that it is an immediate,
because otherwise we won't actually fold the constants, and increase
instruction count. And it isn't obvious why we'd be okay with
increasing the number of constant expressions,
those still will have to be run..

But after 2829094a8e
this could also cause endless combine loops.
So actually properly restrict this code to immediates.
2021-04-07 19:50:19 +03:00
Sanjay Patel 1894c6c59e [InstCombine] avoid infinite loop from partial undef vectors
This fixes the examples from
D99674 and
https://llvm.org/PR49878

The matchers succeed on partial undef/poison vector constants,
but the transform creates a full 'not' (-1) constant, so it
would undo a demanded vector elements change triggered by the
extractelement.

Differential Revision: https://reviews.llvm.org/D100044
2021-04-07 12:18:12 -04:00
wlei 6d5132b426 [CSSPGO] Fix incorrect probe distribution factor computation in top-down inliner
We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it.

`Sum` does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again.

This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D99787
2021-04-07 08:48:59 -07:00
Abhina Sreeskantharajan 1bcf58b213 [SystemZ][z/OS][TableGen] TableGen files should be text
This patch sets tablegen files as text. It should have no effect on Windows after this patch landed https://reviews.llvm.org/rG82b3e28e836d2f5c8cfd6e1047b93c088522365a.

Reviewed By: anirudhp

Differential Revision: https://reviews.llvm.org/D100036
2021-04-07 11:23:00 -04:00
Sam Clegg f23b259e18 [WebAssembly] Improve error messages regarding missing indirect function table. NFC
Use report_fatal_error here since this is an internal error, and not
something the user can/should be trying to fix.

Also distinguish between the symbol being missing and the symbol having
the wrong type.

We have a failure internally where the symbol is missing.  Currently
trying to reduce the test case to something we can attach to an llvm
bug.

Differential Revision: https://reviews.llvm.org/D99960
2021-04-07 07:58:43 -07:00
Sebastian Neubauer 2dc6be5209 [AMDGPU] Update SGPRSpillVGPRCSR name. NFC
The struct is used for both, callee and caller-save registers now.
The frame index is not set for entrypoints, as we do not need to save
the registers then.
Update the struct name to reflect that.

Differential Revision: https://reviews.llvm.org/D99722
2021-04-07 16:30:40 +02:00
Jingu Kang 798b0fd36b [NPM] Fix typo inisLTOPreLink for loop rotate
Differential Revision: https://reviews.llvm.org/D100033
2021-04-07 15:08:37 +01:00
Simon Pilgrim 302e748065 [X86] Improve optimizeCompareInstr for signed comparisons after AND/OR/XOR instructions
Extend D94856 to handle 'and', 'or' and 'xor' instructions as well

We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths
2021-04-07 14:28:42 +01:00
Alexey Bataev a78e86e6be [SLP]Avoid multiple attempts to vectorize CmpInsts.
No need to lookup through and/or try to vectorize operands of the
CmpInst instructions during attempts to find/vectorize min/max
reductions. Compiler implements postanalysis of the CmpInsts so we can
skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save
compile time.

Differential Revision: https://reviews.llvm.org/D99950
2021-04-07 06:15:42 -07:00
Jay Foad bf6cab6f07 [AMDGPU] SIFoldOperands: don't dump extra '\n' after MachineInstr. NFC. 2021-04-07 14:13:00 +01:00
Sanjay Patel 0333ed8e0c [InstCombine] move abs transform to helper function; NFC
The swap of the operands can affect later transforms that
are expecting a constant as operand 1. I don't think we
can trigger a bug with the current code, but I hit that
problem while drafting a new transform for min/max intrinsics.
2021-04-07 08:35:07 -04:00
Simon Pilgrim 583258723f [X86] Improve optimizeCompareInstr for signed comparisons after BZHI instructions
Extend D94856 to handle 'bzhi' instructions as well
2021-04-07 12:07:26 +01:00
Yevgeny Rouban 3e738afae4 [Statepoint Lowering] Allow other than N byte sized types in deopt bundle
I do not see any bit-width restriction from the point of the
LLVM Lang Ref - Operand Bundles on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
2021-04-07 17:48:31 +07:00