Commit Graph

151734 Commits

Author SHA1 Message Date
Jeremy Morse cf033bb2d3 [DebugInfo][NFC] Zero-initialize a class field
This field gets assigned when the relevant object starts being used; but it
remains uninitialized beforehand. This risks introducing hard-to-detect
bugs if something changes, so zero-initialize the field.
2021-10-19 10:24:12 +01:00
Lang Hames cc3115cd1d [JITLink][x86-64] Lift GOT, PLT table managers into x86_64.h; reuse for MachO.
This lifts the global offset table and procedure linkage table builders out of
ELF_x86_64.h and into x86_64.h, renaming them with generic names
x86_64::GOTTableBuilder and x86_64::PLTTableBuilder. MachO_x86_64.cpp is updated
to use these classes instead of the older PerGraphGOTAndStubsBuilder tool.
2021-10-18 21:47:24 -07:00
Noah Shutty e678c51177 [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 18:57:25 -07:00
Hsiangkai Wang facff468b6 [RISCV] Reorder the vector register allocation order.
GPR uses argument registers as the first group of registers to allocate.
This patch uses vector argument registers, v8 to v23, as the first group
to allocate.

Differential Revision: https://reviews.llvm.org/D111304
2021-10-19 09:30:13 +08:00
Lang Hames bc03a9c066 Simplify the TableManager class and move it into a public header.
Moves visitEdge into the TableManager derivatives, replacing the fixEdgeKind
methods in those classes. The visitEdge method takes on responsibility for
updating the edge target, as well as its kind.
2021-10-18 18:20:33 -07:00
Anshil Gandhi 0567f03331 [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options `-amdgpu-early-inline-all=true`
and `-amdgpu-function-calls=false`. This issue is resolved by only inlining global values
with internal linkage. The `getCalleeFunction()` in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Reviewed By: yaxunl, #amdgpu

Differential Revision: https://reviews.llvm.org/D109707
2021-10-18 16:53:15 -06:00
Simon Pilgrim a83384498b [X86] combineMulToPMADDWD - replace ASHR(X,16) -> LSHR(X,16)
If we're using an ashr to sign-extend the entire upper 16 bits of the i32 element, then we can replace with a lshr. The sign bit will be correctly shifted for PMADDWD's implicit sign-extension and the upper 16 bits are zero so the upper i16 sext-multiply is guaranteed to be zero.

The lshr also has a better chance of folding with shuffles etc.
2021-10-18 22:12:56 +01:00
Arthur Eubanks ecd25edfc5 [InlineCost] Add empty line between call sites when printing inline costs 2021-10-18 13:56:48 -07:00
Craig Topper 1053e0b27c [RISCV] Use a lambda to avoid having the Support library depend on Option library.
RISCVISAInfo::toFeatures needs to allocate strings using
ArgList::MakeArgString, but toFeatures lives in Support and
MakeArgString lives in Option.

toFeature only has one caller, so the simple fix is to have that
caller pass a lamdba that wraps MakeArgString to break the
dependency.

Differential Revision: https://reviews.llvm.org/D112032
2021-10-18 13:39:37 -07:00
Matt Morehouse 431a5d8411 [x86] Implement a tagged-globals backend feature.
The feature tells the backend to allow tags in the upper bits of global
variable addresses.  These tags will be ignored by upcoming CPUs with
the Intel LAM feature but may be used in instrumentation passes (e.g.,
HWASan).

This patch implements the feature by using @GOTPCREL relocations instead
of direct references to the locally defined global.  Thus the full
tagged address can be loaded by a single instruction:
  movq global@GOTPCREL(%rip), %rax

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D111343
2021-10-18 13:31:10 -07:00
Alexandros Lamprineas 04dc68710a [DebugInfo][ARM] Fix incorrect debug information for RWPI accessed globals
When compiling for the RWPI relocation model the debug information is wrong:

* the debug location is described as { DW_OP_addr Var }
  instead of { DW_OP_constNu Var DW_OP_bregX 0 DW_OP_plus }
* the relocation type is R_ARM_ABS32 instead of R_ARM_SBREL32

Differential Revision: https://reviews.llvm.org/D111404
2021-10-18 21:29:46 +01:00
Arthur Eubanks b8ce97372d [NewPM] Add PipelineTuningOption to eagerly invalidate analyses
This trades off more compile time for less peak memory usage. Right now
it invalidates all function analyses after a module->function or
cgscc->function adaptor.

https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=1fb24fe85a19ae71b00875ff6c96ef1831dcf7e3&to=cb28ddb063c87f0d5df89812ab2de9a69dd276db&stat=max-rss

For now this is just experimental.

See comments on why this may affect optimizations.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D111575
2021-10-18 13:20:35 -07:00
Alexey Bataev b9cfa016da [SLP]Fix emission of the shrink shuffles.
Need to follow the order of the reused scalars from the
ReuseShuffleIndices mask rather than rely on the natural order.

Differential Revision: https://reviews.llvm.org/D111898
2021-10-18 13:13:12 -07:00
modimo 313c657fce [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope
The goal is to allow grafting an inline tree from Clang or GCC into a new compilation without affecting other functions. For GCC, we're doing this by extracting the inline tree from dwarf information and generating the equivalent remarks.

This allows easier side-by-side asm analysis and a trial way to see if a particular inlining setup provides benefits by itself.

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D110658
2021-10-18 13:08:39 -07:00
Nikita Popov 54d868991a [ExpandMemCmp] Update CFG before DTU
The applyUpdates() API requires that the CFG is already updated,
so make sure to insert the new terminator first.
2021-10-18 21:49:47 +02:00
Petr Hosek 8e46e34d24 Revert "[Support][ThinLTO] Move ThinLTO caching to LLVM Support library"
This reverts commit 92b8cc52bb since
it broke the gold plugin.
2021-10-18 12:24:05 -07:00
Noah Shutty 92b8cc52bb [Support][ThinLTO] Move ThinLTO caching to LLVM Support library
We would like to move ThinLTO’s battle-tested file caching mechanism to
the LLVM Support library so that we can use it elsewhere in LLVM.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D111371
2021-10-18 12:08:49 -07:00
Yonghong Song f4a8526cc4 [NFC][BPF] fix comments and rename functions related to BTF_KIND_DECL_TAG
There are no functionality change.
Fix some comments and rename processAnnotations() to
processDeclAnnotations() to avoid confusion when later
BTF_KIND_TYPE_TAG is introduced (https://reviews.llvm.org/D111199).
2021-10-18 10:43:45 -07:00
Jon Roelofs 1300677f97 [AArch64][GlobalISel] combine and + [la]sr => ubfx
https://godbolt.org/z/h8ejrG4hb

rdar://83597585

Differential Revision: https://reviews.llvm.org/D111839
2021-10-18 10:33:01 -07:00
Yonghong Song e9e4fc0fd3 BPF: fix a bug in IRPeephole pass
Commit 009f3a89d8 ("BPF: remove intrindics @llvm.stacksave()
and @llvm.stackrestore()") implemented IRPeephole pass to remove
llvm.stacksave()/stackrestore() instrinsics.
Buildbot reported a failure:
  UNREACHABLE executed at ../lib/IR/LegacyPassManager.cpp:1445!
which is:
  llvm_unreachable("Pass modifies its input and doesn't report it");

The code has changed but the implementation didn't return true
for changing. This patch fixed this problem.
2021-10-18 10:18:24 -07:00
Florian Hahn e844f05397
[LoopUtils] Simplify addRuntimeCheck to return a single value.
This simplifies the return value of addRuntimeCheck from a pair of
instructions to a single `Value *`.

The existing users of addRuntimeChecks were ignoring the first element
of the pair, hence there is not reason to track FirstInst and return
it.

Additionally all users of addRuntimeChecks use the second returned
`Instruction *` just as `Value *`, so there is no need to return an
`Instruction *`. Therefore there is no need to create a redundant
dummy `and X, true` instruction any longer.

Effectively this change should not impact the generated code because the
redundant AND will be folded by later optimizations. But it is easy to
avoid creating it in the first place and it allows more accurately
estimating the cost of the runtime checks.
2021-10-18 18:03:09 +01:00
Craig Topper 84d9bc51a3 [RISCV] Rewrite forwardCopyWillClobberTuple to not assume that there are exactly 32 registers. NFC
This function was copied from ARM where register pairs/triples/quads can wrap around the 32 encoding space. So register 31 can pair with register 0. This is not true for RISCV vectors. The spec specifically mentions the possibility of a future encoding that has more than 32 registers.

This patch removes the modulo from the code and directly checks that destination register is in the source register range and not the beginning of the range. Though I don't expect an identity copy will occur.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D111467
2021-10-18 09:57:38 -07:00
Yonghong Song 009f3a89d8 BPF: remove intrindics @llvm.stacksave() and @llvm.stackrestore()
Paul Chaignon reported a bpf verifier failure ([1]) due to using
non-ABI register R11. For the test case, llvm11 is okay while
llvm12 and later generates verifier unfriendly code.

The failure is related to variable length array size.
The following mimics the variable length array definition
in the test case:

struct t { char a[20]; };
void foo(void *);
int test() {
   const int a = 8;
   char tmp[AA + sizeof(struct t) + a];
   foo(tmp);
   ...
}

Paul helped bisect that the following llvm commit is
responsible:

552c6c2328 ("PR44406: Follow behavior of array bound constant
              folding in more recent versions of GCC.")

Basically, before the above commit, clang frontend did constant
folding for array size "AA + sizeof(struct t) + a" to be 68,
so used alloca for stack allocation. After the above commit,
clang frontend didn't do constant folding for array size
any more, which results in a VLA and llvm.stacksave/llvm.stackrestore
is generated.

BPF architecture API does not support stack pointer (sp) register.
The LLVM internally used R11 to indicate sp register but it should
not be in the final code. Otherwise, kernel verifier will reject it.

The early patch ([2]) tried to fix the issue in clang frontend.
But the upstream discussion considered frontend fix is really a
hack and the backend should properly undo llvm.stacksave/llvm.stackrestore.
This patch implemented a bpf IR phase to remove these intrinsics
unconditionally. If eventually the alloca can be resolved with
constant size, r11 will not be generated. If alloca cannot be
resolved with constant size, SelectionDag will complain, the same
as without this patch.

 [1] https://lore.kernel.org/bpf/20210809151202.GB1012999@Mem/
 [2] https://reviews.llvm.org/D107882

Differential Revision: https://reviews.llvm.org/D111897
2021-10-18 09:51:19 -07:00
Kazu Hirata 8568ca789e Use llvm::erase_if (NFC) 2021-10-18 09:33:42 -07:00
Kirill Stoimenov 62627c7217 [Sanitizers] Replaced getMaxPointerSizeInBits with getPointerSizeInBits, which was causing failures for 32bit x86.
Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D111829
2021-10-18 09:31:14 -07:00
Mircea Trofin 8612b47a8e [NFC] ProfileSummary: const a bunch of members and fields.
It helps readability and maintainability (don't need to chase down
writes to a field I see is const, for example)
2021-10-18 08:55:06 -07:00
Gil Rapaport 1156bd4fc3 [LV] Record memory widening decisions (NFCI)
Record widening decisions for memory operations within the planned recipes and
use the recorded decisions in code-gen rather than querying the cost model.

Differential Revision: https://reviews.llvm.org/D110479
2021-10-18 18:03:35 +03:00
Jessica Clarke f5755c0849 [Mips] Add glue between CopyFromReg, CopyToReg and RDHWR nodes for TLS
The MIPS ABI requires the thread pointer be accessed via rdhwr $3, $r29.
This is currently represented by (CopyToReg $3, (RDHWR $29)) followed by
a (CopyFromReg $3). However, there is no glue between these, meaning
scheduling can break those apart. In particular, PR51691 is a report
where PseudoSELECT_I was moved to between the CopyToReg and CopyFromReg,
and since its expansion uses branches, it split the def and use of the
physical register between two basic blocks, resulting in the def being
eliminated and the use having no def. It also seems possible that a
similar situation could arise splitting up the CopyToReg from the RDHWR,
causing the RDHWR to use a destination register other than $3, violating
the ABI requirement.

Thus, add glue between all three nodes to ensure they aren't split up
during instruction selection. No regression test is added since any test
would be implictly relying on specific scheduling behaviour, so whilst
it might be testing that glue is preventing reordering today, changes to
scheduling behaviour could result in the test no longer being able to
catch a regression here, as the reordering might no longer happen for
other unrelated reasons.

Fixes PR51691.

Reviewed By: atanasyan, dim

Differential Revision: https://reviews.llvm.org/D111967
2021-10-18 15:10:20 +01:00
Kerry McLaughlin ac4e01ea0e [SVE][CodeGen] Fix predicate for add/sub + element count patterns
The patterns added in D111441 should use the HasSVEorStreamingSVE
predicate. This changes one incorrect use of HasSVE with the new
patterns.
2021-10-18 14:42:29 +01:00
Andrew Wei f5056c8c16 [AArch64] Improve shuffle vector by using wider types
Try to widen element type to get a new mask value for a better permutation
sequence, so that we can use NEON shuffle instructions, such as zip1/2,
UZP1/2, TRN1/2, REV, INS, etc.
For example:
  shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 2, i32 3>
is equivalent to:
  shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 3, i32 1>
Finally, we can get:
  mov     v0.d[0], v1.d[1]

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D111619
2021-10-18 21:24:45 +08:00
Sanjay Patel 2a3cc4d461 [Analysis] add utility function for unary shuffle mask creation
This is NFC-intended for the callers. Posting in case there are
other potential users that I missed.
I would also use this from VectorCombine in a patch for:
https://llvm.org/PR52178 ( D111901 )

Differential Revision: https://reviews.llvm.org/D111891
2021-10-18 09:00:39 -04:00
Simon Pilgrim f041338153 [X86][Costmodel] Add SSE2 sub-128bit vXi32/f32 stride 2 interleaved store costs
Differential Revision: https://reviews.llvm.org/D111941
2021-10-18 13:46:10 +01:00
Simon Pilgrim c850d5c5c8 [X86][Costmodel] Add SSE2 sub-128bit vXi8/16 stride 2 interleaved store costs
Differential Revision: https://reviews.llvm.org/D111941
2021-10-18 13:15:14 +01:00
Stephen Tozer b9ca73e1a8 [DebugInfo] Correctly handle arrays with 0-width elements in GEP salvaging
Fixes an issue where GEP salvaging did not properly account for GEP
instructions which stepped over array elements of width 0 (effectively a
no-op). This unnecessarily produced long expressions by appending
`... + (x * 0)` and potentially extended the number of SSA values used
in the dbg.value. This also erroneously triggered an assert in the
salvage function that the element width would be strictly positive.
These issues are resolved by simply ignoring these useless operands.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D111809
2021-10-18 12:01:12 +01:00
Peter Waller c4603a8a43 [InstCombine][DebugInfo] Remove superflous assertion, add test
When this code was added, an unnecessary assertion slipped in which we
now hit in real code.

Add a test to defend against it firing again.
2021-10-18 11:00:01 +00:00
Jay Foad d55db4b033 [AMDGPU] Remove unused VirtRegMap analysis. NFC. 2021-10-18 11:55:40 +01:00
Max Kazantsev baad10c09e Revert "[NFC] [LoopPeel] Change the way DT is updated for loop exits"
This reverts commit fa16329ae0.

See comments in discussion. Merged by mistake, not entirely getting what
the problem was.
2021-10-18 17:14:11 +07:00
Jay Foad a129932b0d [AMDGPU] Add link to bug 2021-10-18 10:33:42 +01:00
Jeremy Morse ea970661dc Fix signed/unsigned comparison after b5426ced71
gcc11 warns that this counter causes a signed/unsigned comaprison when it's
later compared with a SmallVector::difference_type. gcc appears to be
correct, clang does not warn one way or the other.
2021-10-18 10:28:52 +01:00
Jay Foad 012248b0bc Remove the verifyAfter mechanism that was replaced by D111397
Differential Revision: https://reviews.llvm.org/D111872
2021-10-18 10:26:46 +01:00
Jay Foad 36deb9a670 Add new MachineFunction property FailsVerification
TargetPassConfig::addPass takes a "bool verifyAfter" argument which lets
you skip machine verification after a particular pass. Unfortunately
this is used in generic code in TargetPassConfig itself to skip
verification after a generic pass, only because some previous target-
specific pass damaged the MIR on that specific target. This is bad
because problems in one target cause lack of verification for all
targets.

This patch replaces that mechanism with a new MachineFunction property
called "FailsVerification" which can be set by (usually target-specific)
passes that are known to introduce problems. Later passes can reset it
again if they are known to clean up the previous problems.

Differential Revision: https://reviews.llvm.org/D111397
2021-10-18 10:26:46 +01:00
Piotr Sobczak d869921004 [AMDGPU] Add patterns for i8/i16 local atomic load/store
Add patterns for i8/i16 local atomic load/store.

Added tests for new patterns.

Copied atomic_[store/load]_local.ll to GlobalISel directory.

Differential Revision: https://reviews.llvm.org/D111869
2021-10-18 11:23:10 +02:00
Fraser Cormack 3d850d03ae [SelectionDAG] Fix illegal widening of scalable-vector loads
The process of widening simple vector loads attempts to use a load of a
wider vector type if the original load is sufficiently aligned to avoid
memory faults.

However this optimization is only legal when performed on fixed-length
vector types. For scalable vector types this is invalid (unless vscale
happens to be 1).

This patch does increase the likelihood of compiler crashes (from
`FindMemType` failing to find a suitable type) but this now better
matches how widening non-simple loads, insufficiently-aligned loads, and
scalable-vector stores are handled.

Patches will be introduced later by which loads and stores can be
widened on targets with support for masked or predicated operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D111885
2021-10-18 10:00:00 +01:00
Luo, Yuanke 942536ac08 [X86] Prefer VEX encoding in X86 assembler.
This patch is to order the AVX instructions ahead of AVX512 instructions
in the matching table so that the AVX instructions can be matched first.
Thanks Craig and Shengchen for the idea.

Differential Revision: https://reviews.llvm.org/D111538
2021-10-18 16:54:11 +08:00
Stanislav Mekhanoshin 7cdb1df8c7 [AMDGPU] Divergence driven selection for fused bitlogic
The change adds divergence predicates for fused logical operations.
The problem with selecting a scalar fused op such as S_NOR_B32 is
that it does not have a VALU counterpart and will be split in
moveToVALU. At the same time it prevents selection of a better
opcode on the VALU side (such as V_OR3_B32) which does not have a
counterpart on SALU side.

XNOR opcodes are left as is and selected as scalar to get advantage
of the SIInstrInfo::lowerScalarXnor() code which can commute
operations to keep one of two opcodes on SALU if possible. See
xnor.ll test for this.

Differential Revision: https://reviews.llvm.org/D111907
2021-10-18 01:44:25 -07:00
Raphael Isemann de4d2f80b7 Fix cyclic header dependency between Support<->Option due to RISCVISAInfo
This was introduced in D105168 which added RISCVISAInfo.h.
2021-10-18 10:06:11 +02:00
Jingu Kang 3f0b178de2 [AArch64] Fixed a bug on AArch64MIPeepholeOpt
Create new virtual register for the definition of new AND instruction and
replace old register by the new one to keep SSA form.

Differential Revision: https://reviews.llvm.org/D109963
2021-10-18 08:55:42 +01:00
Bing1 Yu f383c53311 [MachineSink] Compile time improvement for large testcases which has many kill flags
We did a experiment and observed dramatic decrease on compilation time which spent on clearing kill flags.
Before:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags(ms):1.607509e+05

After:
Number of BasicBlocks:33357
Number of Instructions:162067
Number of Cleared Kill Flags:32869
Time of handling kill flags:3.987371e+03

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D111688
2021-10-18 15:44:07 +08:00
Qiu Chaofan 67c64d8337 [PowerPC] Implement scheduling model for Power10
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D110855
2021-10-18 15:27:49 +08:00
Max Kazantsev fa16329ae0 [NFC] [LoopPeel] Change the way DT is updated for loop exits
When peeling a loop, we assume that the latch has a `br` terminator and
that all loop exits are either terminated with an `unreachable` or have
a terminating deoptimize call. So when we peel off the 1st iteration, we
change the IDom of all loop exits to the peeled copy of
`NCD(IDom(Exit), Latch)`. This works now, but if we add logic to support
loops with exits that are followed by a block with an `unreachable` or a
terminating deoptimize call, changing the exit's idom wouldn't be enough
and DT would be broken.

For example, let `Exit1` and `Exit2` are loop exits, and each of them
unconditionally branches to the same `unreachable` terminated block. So
neither of the exits dominates this unreachable block. If we change the
IDoms of the exits to some peeled loop block, we don't update the
dominators of the unreachable block. Currently we just don't get to the
peeling logic, saying that we can't peel such loops.

With this NFC we just insert edges from cloned exiting blocks to their
exits after peeling each iteration (we accumulate the insertion updates
and then after peeling apply the updates to DT).

This patch was a part of D110922.

Patch by Dmitry Makogon!

Differential Revision: https://reviews.llvm.org/D111611
Reviewed By: mkazantsev
2021-10-18 10:23:05 +07:00