Commit Graph

133046 Commits

Author SHA1 Message Date
Amara Emerson befc788cfa GlobalISel: Add a setInstrAndDebugLoc(MachineInstr&) convenience helper to MachineIRBuilder. NFC.
This saves doing two separate calls to set the Instr and DebugLoc from an existing MI.
2020-04-08 14:38:33 -07:00
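As a rough illustration of what the helper in the commit above saves, here is a minimal sketch. `DebugLoc`, `MachineInstr` and `SketchBuilder` below are hypothetical stand-ins written for this example, not the real LLVM classes; only the name `setInstrAndDebugLoc` and the idea of combining the two setter calls come from the commit message.

```cpp
#include <string>

// Hypothetical stand-ins for illustration; not the real LLVM classes.
struct DebugLoc {
  unsigned Line;
};

struct MachineInstr {
  std::string Opcode;
  DebugLoc DL;
  const DebugLoc &getDebugLoc() const { return DL; }
};

class SketchBuilder {
  const MachineInstr *InsertPt = nullptr;
  DebugLoc CurDL{0};

public:
  void setInstr(const MachineInstr &MI) { InsertPt = &MI; }
  void setDebugLoc(const DebugLoc &DL) { CurDL = DL; }

  // Convenience helper: take both the insertion point and the debug
  // location from the same existing instruction in one call.
  void setInstrAndDebugLoc(const MachineInstr &MI) {
    setInstr(MI);
    setDebugLoc(MI.getDebugLoc());
  }

  const MachineInstr *getInsertPt() const { return InsertPt; }
  unsigned getCurLine() const { return CurDL.Line; }
};
```

A caller that previously wrote `B.setInstr(MI); B.setDebugLoc(MI.getDebugLoc());` would write `B.setInstrAndDebugLoc(MI);` instead.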
Matt Arsenault e49e33b610 CodeGen: Use Register in MachineInstrBuilder 2020-04-08 17:03:53 -04:00
Kirill Naumov 8b67853a83 [CFGPrinter] Adding heat coloring to CFGPrinter
This patch introduces heat coloring of the Control Flow Graph, based
on the relative "hotness" of each BB. It is part of a sequence of three
patches related to graph heat coloring.

Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu

Differential Revision: https://reviews.llvm.org/D77161
2020-04-08 19:59:51 +00:00
Matt Arsenault c42cc7fd24 CodeGen: Use Register in MachineSSAUpdater 2020-04-08 14:29:01 -04:00
Artem Belevich a9627b7ea7 [CUDA] Add partial support for recent CUDA versions.
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.

Differential Revision: https://reviews.llvm.org/D77670
2020-04-08 11:19:44 -07:00
Vedant Kumar 48e65fc630 MachineFunction: Copy call site info when duplicating insts
Summary:
Preserve call site info for duplicated instructions. We copy over the
call site info in CloneMachineInstrBundle to avoid repeated calls to
copyCallSiteInfo in CloneMachineInstr.

(Alternatively, we could copy call site info higher up the stack, e.g.
into TargetInstrInfo::duplicate, or even into individual backend passes.
However, I don't see how that would be safer or more general than the
current approach.)

Reviewers: aprantl, djtodoro, dstenb

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77685
2020-04-08 11:06:14 -07:00
Matt Arsenault 586769cce2 DAG: Use Register 2020-04-08 13:44:31 -04:00
Sean Fertile d0b57b41f4 [PowerPC][AIX][NFC] Replace deprecated getByValAlign call.
Replace call to deprecated 'getByValAlign()' with
'getNonZeroByValAlign()'.
2020-04-08 13:27:39 -04:00
Matt Arsenault dcce3ef1d2 FastISel: Partially use Register
Doesn't try to convert the cases that depend on generated code.
2020-04-08 12:10:58 -04:00
Matt Arsenault 7a46e36d51 CodeGen: Use Register more in CallLowering
Some of these MCPhysReg uses should probably be MCRegister, but right
now this would require more invasive changes.
2020-04-08 12:10:58 -04:00
Matt Arsenault ca0ace7298 CodeGen: Use Register in MachineBasicBlock 2020-04-08 12:10:58 -04:00
Matt Arsenault 84aa58cbe2 CodeGen: Use Register in TargetLowering 2020-04-08 12:10:58 -04:00
Kirill Naumov 0125db9ab2 [TimePasses] Small fix in "-time-passes" flag that makes it more stable
Adds StringMap for TimingData.

Differential Revision: https://reviews.llvm.org/D76946
Reviewed By: fedor.sergeev
2020-04-08 15:59:45 +00:00
Sean Fertile 8abfd2c3bb [PowerPC][AIX] Enable passing byval formal arguments in multiple registers.
Any or all the argument registers can be used to pass a byval formal
argument, with the limitation that the argument must fit in the
available registers (i.e. it is not split between registers and stack).

Differential Revision: https://reviews.llvm.org/D76902
2020-04-08 11:16:33 -04:00
Florian Hahn bbbec71609 [DSE.MSSA] Only use callCapturesBefore for calls.
callCapturesBefore always returns ModRef if UseInst isn't a call. As
we only call it if we already know Mod is set, this only destroys the
Must bit for non-calls.
2020-04-08 15:12:33 +01:00
Florian Hahn a6353fdf3b [DSE,MSSA] Hoist getMemoryAccess call (NFC). 2020-04-08 15:10:05 +01:00
Alexey Lapshin 0ed2170dc4 [DWARFLinker][dsymutil] followup for 88c2137b6d
This patch is a followup for "Move DwarfStreamer into DWARFLinker".
It fixes the build with LLVM_LINK_LLVM_DYLIB.
2020-04-08 16:46:52 +03:00
Stefan Pintilie 6c4b40def7 [PowerPC][Future] Add Support For Functions That Do Not Use A TOC.
On PowerPC most functions require a valid TOC pointer.

This is the case because either the function itself needs to use this
pointer to access the TOC or because other functions that are called
from that function expect a valid TOC pointer in the register R2.
The main exception to this is leaf functions that do not access the TOC
since they are guaranteed not to need a valid TOC pointer.

This patch introduces a feature that will allow more functions to not
require a valid TOC pointer in R2.

Differential Revision: https://reviews.llvm.org/D73664
2020-04-08 08:07:35 -05:00
Sanjay Patel a1c05fe20f [InstCombine] exclude bitcast of ppc_fp128 in icmp signbit fold
Based on the post-commit comments for rG0f56bbc, there might
be a problem with this transform:

(bitcast (fpext/fptrunc X)) to iX) < 0 --> (bitcast X to iY) < 0

...and the ppc_fp128 data type, so conservatively bypass if we
are bitcasting a ppc_fp128.

We might be able to account for endian or other differences to
enable this for PowerPC again if that is useful.

Differential Revision: https://reviews.llvm.org/D77642
2020-04-08 08:56:19 -04:00
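The fold described in the commit above relies on the sign bit surviving a floating-point extension or truncation. A small sketch of that property for `float` and `double` follows, assuming both are IEEE-754 binary formats (true on common platforms, but an assumption here). The helper names are made up for this example. ppc_fp128 is, as far as I know, a pair of doubles and not a single IEEE format, which fits the commit's decision to bypass it conservatively.

```cpp
#include <cstdint>
#include <cstring>

// Sign bit of a float, read as the top bit of its 32-bit pattern
// (the "bitcast X to i32, then compare < 0" form).
bool signBitOfFloat(float X) {
  std::uint32_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return (Bits >> 31) != 0;
}

// Sign bit of a double, read as the top bit of its 64-bit pattern.
bool signBitOfDouble(double X) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  return (Bits >> 63) != 0;
}

// The unfolded form: extend first, then test the wider bit pattern.
bool signBitAfterFPExt(float X) {
  return signBitOfDouble(static_cast<double>(X));
}
```

For ordinary values and signed zeros, `signBitAfterFPExt(X)` and `signBitOfFloat(X)` agree, which is what lets the extension be dropped before the comparison.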
Simon Pilgrim 66c18c729d [X86][SSE] Combine PTEST(AND(X,Y),AND(X,Y)) -> PTEST(X,Y) and ANDN equivalents
Tests derived from PR42035 examples
2020-04-08 12:42:22 +01:00
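To see why the combine in the commit above can hold for the zero flag, recall that PTEST sets ZF when the bitwise AND of its two operands is all zeros. The sketch below models only that ZF result, and uses a 64-bit integer as a stand-in for a 128-bit vector; both simplifications are assumptions of this example. How the actual combine treats the carry flag is not described in the commit message and is not modeled here.

```cpp
#include <cstdint>

// ZF result of a PTEST-like operation: set when (A & B) is all zeros.
bool ptestZF(std::uint64_t A, std::uint64_t B) { return (A & B) == 0; }

// Original pattern: PTEST(AND(X,Y), AND(X,Y)).
bool zfBeforeCombine(std::uint64_t X, std::uint64_t Y) {
  return ptestZF(X & Y, X & Y);
}

// Combined pattern: PTEST(X,Y).
bool zfAfterCombine(std::uint64_t X, std::uint64_t Y) {
  return ptestZF(X, Y);
}
```

Since `(X & Y) & (X & Y)` equals `X & Y`, both functions return the same value for every input.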
Jeremy Morse c77887e4d1 [DebugInfo][NFC] Early-exit when analyzing for single-location variables
This is a performance patch that hoists two conditions in DwarfDebug's
validThroughout to avoid a linear-scan of all instructions in a block. We
now exit early if validThroughout will never return true for the variable
location.

The first added clause filters for the two circumstances where
validThroughout will return true. The second added clause should be
identical to the one that's deleted from after the linear-scan.

Differential Revision: https://reviews.llvm.org/D77639
2020-04-08 12:27:11 +01:00
Shengchen Kan 916044d819 [X86][MC] Support enhanced relaxation for branch align
Summary:
Now that D75300 has landed, I want to support enhanced relaxation when we need to align branches and allow prefix padding. "Enhanced Relaxation" means we allow an instruction that could not traditionally be relaxed to be emitted into RelaxableFragment, so that we can increase its length by adding prefixes for optimization.

The motivation is straightforward: RelaxFragment is mostly for relative jumps, and we cannot increase the length of jumps when we need to align them. So to achieve D75300's purpose (reducing the bytes of nops) when we need to align jumps, we have to make more instructions "relaxable".

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76286
2020-04-08 19:08:19 +08:00
Mikael Holmen 893df2032d [IfConversion] Disallow TrueBB == FalseBB for valid diamonds
Summary:
This fixes PR45302.

Previously the case

     BB1
     / \
    |   |
   TBB FBB
    |   |
     \ /
     BB2

was treated as a valid diamond even when TBB and FBB were the same basic
block. This then led to a failed assertion in IfConvertDiamond.

Since TBB == FBB is quite a degenerate case of a diamond, we no longer
treat it as a valid diamond, and thus we avoid the
trouble of making IfConvertDiamond handle it correctly.

Reviewers: efriedma, kparzysz

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D77651
2020-04-08 12:50:36 +02:00
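The shape of the check described in the commit above can be sketched as follows. This is a heavily simplified model: `Block` and `isValidDiamondSketch` are hypothetical names for this example, each block has at most one successor, and the real validity test in the if-converter considers far more than is shown here.

```cpp
// Hypothetical, heavily simplified CFG node: one optional successor.
struct Block {
  const Block *Succ;
};

// Returns true when TBB and FBB form the two distinct arms of a
// diamond that rejoins at a common tail block.
bool isValidDiamondSketch(const Block *TBB, const Block *FBB) {
  if (!TBB || !FBB)
    return false;
  // Degenerate case rejected by the patch: both arms are the same block.
  if (TBB == FBB)
    return false;
  // Both arms must rejoin at one common tail block.
  return TBB->Succ != nullptr && TBB->Succ == FBB->Succ;
}
```

With the early `TBB == FBB` rejection in place, later code can assume it is looking at two different arms.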
Anna Welker 89e1248d7b [ARM][MVE] Optimise offset addresses of gathers/scatters
This patch adds an analysis of the offset addresses used by gathers
and scatters to the MVEGatherScatterLowering pass to find
multiplications and additions that are loop invariant and thus can
be moved into the loop preheader, so they are not executed on every iteration.

Differential Revision: https://reviews.llvm.org/D76681
2020-04-08 11:46:57 +01:00
Max Kazantsev 7adb9e06fd [LoopLoadElim] Add test showing that LoopLoadElim doesn't work correctly with new PM 2020-04-08 17:32:03 +07:00
Dominik Montada 35950fea8d [GlobalISel] support narrow G_IMPLICIT_DEF for DstSize % NarrowSize != 0
Summary:
When narrowing G_IMPLICIT_DEF where the original size is not a multiple
of the narrow size, emit a smaller G_IMPLICIT_DEF and use G_ANYEXT.

To prevent a potential endless loop in the legalizer, the condition
to combine G_ANYEXT(G_IMPLICIT_DEF) is changed from isInstUnsupported
to !isInstLegal, since in this case the combine is only valid if
subsequent legalization of the newly combined G_IMPLICIT_DEF does not
introduce G_ANYEXT due to narrowing.

Although this legalization for G_IMPLICIT_DEF would also be valid for
the general case, it actually caused a lot of code regressions when
tried due to superfluous COPYs and combines not getting hit anymore.

Reviewers: dsanders, aemerson, volkan, arsenm, aditya_nandakumar

Reviewed By: arsenm

Subscribers: jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76598
2020-04-08 11:00:07 +02:00
Kazushi (Jam) Marukawa aa034867f1 [VE] Simplify definitions of uimm6 and simm7
Summary: To prepare for subsequent changes, simplify the uimm6 and simm7 operands.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D77700
2020-04-08 09:53:42 +02:00
Igor Kudrin af11c556db [DebugInfo] Fix reading DWARFv5 type units in DWP.
In DWARFv5, type units are stored in .debug_info sections, along with
compilation units, and they are distinguished by the unit_type field
in the header, not by the name of the section. It is impossible to
associate the correct index section of a DWP file with the unit before
the unit's header is read. This patch fixes reading DWARFv5 type units
by parsing the header first and then applying the index entry according
to the actual unit type.

Differential Revision: https://reviews.llvm.org/D77552
2020-04-08 12:50:58 +07:00
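The "read the header first, then pick the index" step from the commit above can be sketched like this. The `DW_UT_*` values are the unit type codes from the DWARF v5 specification; `IndexKind` and `indexForUnitType` are hypothetical names for this example and do not reflect how the LLVM reader is actually structured.

```cpp
#include <cstdint>

// Unit type codes from the DWARF v5 unit header.
constexpr std::uint8_t DW_UT_compile = 0x01;
constexpr std::uint8_t DW_UT_type = 0x02;
constexpr std::uint8_t DW_UT_split_compile = 0x05;
constexpr std::uint8_t DW_UT_split_type = 0x06;

// Which DWP index section a unit's entry should be looked up in.
enum class IndexKind { CompileUnitIndex, TypeUnitIndex };

// Decide from the already-parsed unit_type field, not from the name
// of the section the unit was found in.
IndexKind indexForUnitType(std::uint8_t UnitType) {
  if (UnitType == DW_UT_type || UnitType == DW_UT_split_type)
    return IndexKind::TypeUnitIndex;
  return IndexKind::CompileUnitIndex;
}
```

The point of the ordering is that both kinds of unit live in .debug_info in DWARFv5, so the section name alone cannot select the index.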
Stanislav Mekhanoshin f96810ff34 [AMDGPU] Expand vector trunc stores from i16 to i8
Differential Revision: https://reviews.llvm.org/D77693
2020-04-07 21:47:45 -07:00
Johannes Doerfert a19eb1de72 [OpenMP] Add match_{all,any,none} declare variant selector extensions.
By default, all traits in the OpenMP context selector have to match for
it to be acceptable. However, we sometimes want only a single property out
of multiple to match (=any), or none to match at all (=none). We offer these
choices as extensions via
  `implementation={extension(match_{all,any,none})}`
to the user. The choice will affect the entire context selector, not only
the traits following the match property.

The first user will be D75788. There we can replace
```
  #pragma omp begin declare variant match(device={arch(nvptx64)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  // TODO: Hack until we support an extension to the match clause that allows "or".
  #undef __CLANG_CUDA_CMATH_H__

  #undef __CUDA__
  #pragma omp end declare variant

  #pragma omp begin declare variant match(device={arch(nvptx)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```
with the much simpler
```
  #pragma omp begin declare variant match(device={arch(nvptx, nvptx64)}, implementation={extension(match_any)})
  #define __CUDA__

  #include <__clang_cuda_cmath.h>

  #undef __CUDA__
  #pragma omp end declare variant
```

Reviewed By: mikerice

Differential Revision: https://reviews.llvm.org/D77414
2020-04-07 23:33:24 -05:00
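The three matching modes from the commit above can be modeled as a small predicate over per-trait match results. This is only a sketch of the semantics as the message describes them; `MatchKind` and `selectorMatches` are hypothetical names, and the treatment of an empty selector is an arbitrary choice of this example.

```cpp
#include <cstddef>
#include <vector>

enum class MatchKind { All, Any, None };

// TraitMatches holds one entry per trait property in the selector:
// true if that property matches the current context.
bool selectorMatches(const std::vector<bool> &TraitMatches, MatchKind Kind) {
  std::size_t NumMatched = 0;
  for (bool Matched : TraitMatches)
    if (Matched)
      ++NumMatched;

  switch (Kind) {
  case MatchKind::All: // default behavior: every property must match
    return NumMatched == TraitMatches.size();
  case MatchKind::Any: // at least one property matches
    return NumMatched > 0;
  case MatchKind::None: // no property matches
    return NumMatched == 0;
  }
  return false;
}
```

For `arch(nvptx, nvptx64)` on an nvptx64 device, the per-property results would be `{false, true}`: the default mode rejects the selector, while `match_any` accepts it.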
Kazu Hirata 91eb442fde [JumpThreading] NFC: Simplify ComputeValueKnownInPredecessorsImpl
Summary:
ComputeValueKnownInPredecessorsImpl is the main folding mechanism in
JumpThreading.cpp.  To avoid potential infinite recursion while
chasing use-def chains, it uses:

  DenseSet<std::pair<Value *, BasicBlock *>> &RecursionSet

to keep track of Value-BB pairs that we've processed.

Now, when ComputeValueKnownInPredecessorsImpl recursively calls
itself, it always passes BB as is, so the second element is always BB.

This patch simplifies the function by dropping "BasicBlock *" from
RecursionSet.

Reviewers: wmi, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77699
2020-04-07 18:37:36 -07:00
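The simplification in the commit above amounts to keying the recursion guard on the value alone, since the block never changes across recursive calls. A minimal sketch of such a guard, using `std::set` and a made-up `Node` type in place of LLVM's `DenseSet` and `Value`, might look like this; the real function's bookkeeping may differ.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for a value with operands (use-def edges).
struct Node {
  std::vector<const Node *> Operands;
};

// Walks the use-def chain from V and counts distinct nodes reached.
// RecursionSet holds only nodes, not (node, block) pairs, because the
// block argument would be identical in every recursive call.
unsigned countReachable(const Node *V, std::set<const Node *> &RecursionSet) {
  // insert().second is false if V was already present: stop recursing.
  if (!RecursionSet.insert(V).second)
    return 0;
  unsigned Count = 1;
  for (const Node *Op : V->Operands)
    Count += countReachable(Op, RecursionSet);
  return Count;
}
```

Even on a cyclic chain such as A -> B -> A, the guard makes the walk terminate after visiting each node once.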
Eli Friedman 565b56a72c [NFC] Clean up uses of LoadInst constructor. 2020-04-07 16:28:53 -07:00
Daniel Sanders 1adeeabb79 Add MIR-level debugify with only locations support for now
Summary:
Re-used the IR-level debugify for the most part. The MIR-level code then
adds locations to the MachineInstrs afterwards based on the LLVM-IR debug
info.

It's worth mentioning that the resulting locations make little sense as
the range of line numbers used in a Function at the MIR level exceeds that
of the equivalent IR level function. As such, MachineInstrs can appear to
originate from outside the subprogram scope (and from other subprogram
scopes). However, it doesn't seem worth worrying about as the source is
imaginary anyway.

There are a few high-level goals this pass works towards:
* We should be able to debugify our .ll/.mir in the lit tests without
  changing the checks and still pass them. I.e. Debug info should not change
  codegen. Combining this with a strip-debug pass should enable this. The
  main issue I ran into without the strip-debug pass was instructions with MMOs and
  checks on both the instruction and the MMO as the debug-location is
  between them. I currently have a simple hack in the MIRPrinter to
  resolve that but the more general solution is a proper strip-debug pass.
* We should be able to test that GlobalISel does not lose debug info. I
  recently found that the legalizer can be unexpectedly lossy in seemingly
  simple cases (e.g. expanding one instr into many). I have a verifier
  (will be posted separately) that can be integrated with passes that use
  the observer interface and will catch location loss (it does not verify
  correctness, just that there's zero lossage). It is a little conservative
  as the line-0 locations that arise from conflicts do not track the
  conflicting locations but it can still catch a fair bit.

Depends on D77439, D77438

Reviewers: aprantl, bogner, vsk

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77446
2020-04-07 16:25:13 -07:00
Fangrui Song 624654fd64 [VE] Migrate to the getMachineMemOperand overload using llvm::Align
Just delete the deprecated overload because nothing uses it.
2020-04-07 16:04:54 -07:00
Matt Arsenault 6011627f51 CodeGen: More conversions to use Register 2020-04-07 18:54:36 -04:00
Fangrui Song d2ef8c1f2c [ThinLTO] Drop dso_local if a GlobalVariable satisfies isDeclarationForLinker()
dso_local leads to direct access even if the definition is not within this compilation unit (it is
still in the same linkage unit). On ELF, such a relocation (e.g. R_X86_64_PC32) referencing a
STB_GLOBAL STV_DEFAULT object can cause a linker error in a -shared link.

If the linkage is changed to available_externally, the dso_local flag should be dropped, so that no
direct access will be generated.

The current behavior is benign, because -fpic does not assume dso_local
(clang/lib/CodeGen/CodeGenModule.cpp:shouldAssumeDSOLocal).
If we do that for -fno-semantic-interposition (D73865), there will be an
R_X86_64_PC32 linker error without this patch.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D74751
2020-04-07 15:46:01 -07:00
Fangrui Song 2f8fb4d1cd [VE] Adapt aa26dd9858 and 2481f26ac3 2020-04-07 15:45:19 -07:00
Wei Mi b49eac71ad Recommit [SampleFDO] Add flag for partial profile.
Fix the error of show-prof-info.test on some platforms without zlib.

The common profile usage is to collect a profile from a target and then use it to guide the optimized build for the same target. In some cases, no profile can be collected for a target. In those cases, although no full profile is available, it is possible to use a partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile apart from the full profile, so the compiler can use a different strategy for each.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 14:28:25 -07:00
Stanislav Mekhanoshin 96e51ed005 [AMDGPU] Implement copyPhysReg for 16 bit subregs
Differential Revision: https://reviews.llvm.org/D74937
2020-04-07 14:22:46 -07:00
Matt Arsenault 2481f26ac3 CodeGen: Use Register in TargetFrameLowering 2020-04-07 17:07:44 -04:00
Nikita Popov fe8abbf442 [BPI] Clear handles when releasing memory (NFC)
This reduces max-rss of sqlite compilation by 2.5%.
2020-04-07 22:51:01 +02:00
Matt Arsenault aa26dd9858 CodeGen: Use Register in more places 2020-04-07 15:59:40 -04:00
Wei Mi c5da949ae8 Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test breaks on some platforms.
This reverts commit e3ba652a14.
2020-04-07 12:54:51 -07:00
Wei Mi e3ba652a14 [SampleFDO] Add flag for partial profile.
The common profile usage is to collect a profile from a target and then use it to guide the optimized build for the same target. In some cases, no profile can be collected for a target. In those cases, although no full profile is available, it is possible to use a partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile apart from the full profile, so the compiler can use a different strategy for each.

Differential Revision: https://reviews.llvm.org/D77426
2020-04-07 12:17:56 -07:00
Nemanja Ivanovic ecd8435483 [NFC][PowerPC] Fix register class for patterns using XXPERMDIs
There are a few patterns where we use a superclass for inputs to this
instruction rather than the correct class. This can sometimes lead to
unnecessary copies.
2020-04-07 14:06:08 -05:00
Graham Sellers a19a56f6a1 [AMDGPU] Extend constant folding for logical operations
This patch extends existing constant folding in logical operations to
handle S_XNOR, S_NAND, S_NOR, S_ANDN2, S_ORN2, V_LSHL_ADD_U32 and
V_AND_OR_B32. Also added a couple of tests for existing folds.
2020-04-07 14:37:16 -04:00
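For reference, the operations named in the commit above can be folded on 32-bit constants roughly as sketched below. The formulas reflect my understanding of the instruction semantics from their names and the AMDGPU ISA documentation, and should be checked against the ISA manual. The `fold*` function names are made up for this example and do not correspond to functions in the backend.

```cpp
#include <cstdint>

// Scalar logical operations on two 32-bit constants.
constexpr std::uint32_t foldXNOR(std::uint32_t A, std::uint32_t B) {
  return ~(A ^ B);
}
constexpr std::uint32_t foldNAND(std::uint32_t A, std::uint32_t B) {
  return ~(A & B);
}
constexpr std::uint32_t foldNOR(std::uint32_t A, std::uint32_t B) {
  return ~(A | B);
}
constexpr std::uint32_t foldANDN2(std::uint32_t A, std::uint32_t B) {
  return A & ~B;
}
constexpr std::uint32_t foldORN2(std::uint32_t A, std::uint32_t B) {
  return A | ~B;
}

// Three-operand vector operations.
// Only the low 5 bits of the shift amount are used.
constexpr std::uint32_t foldLshlAdd(std::uint32_t A, std::uint32_t B,
                                    std::uint32_t C) {
  return (A << (B & 31u)) + C;
}
constexpr std::uint32_t foldAndOr(std::uint32_t A, std::uint32_t B,
                                  std::uint32_t C) {
  return (A & B) | C;
}
```

When every source operand is a known constant, the instruction can be replaced by a move of the value these functions compute.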
Craig Topper c41685b16f [SelectionDAG] Make getZeroExtendInReg take a vector VT if the operand VT is a vector.
This removes a call to getScalarType from a bunch of call sites.
It also makes the behavior consistent with SIGN_EXTEND_INREG.

Differential Revision: https://reviews.llvm.org/D77631
2020-04-07 11:34:08 -07:00
Alexey Lapshin 88c2137b6d [DWARFLinker][dsymutil][NFC] Move DwarfStreamer into DWARFLinker.
For implementing "remove obsolete debug info in lld", it is necessary
to have an implementation of DWARF generation code. dsymutil uses DwarfStreamer
for that purpose. DwarfStreamer uses AsmPrinter. It is considered OK
to use AsmPrinter based code in lld (D74169). This patch moves
the DwarfStreamer implementation into DWARFLinker, so that it can be reused
from lld.

Generally, a better place for such a common DWARF generation code would be
not DWARFLinker but an additional separate library. Such a library could
contain a single version of DWARF generation routines and could also
be independent of AsmPrinter. At the current moment, DwarfStreamer
does not pretend to be such a general implementation of DWARF generation.
So I decided to put it into DWARFLinker since it is the only user
of DwarfStreamer.

Testing: it passes "check-all" lit testing. MD5 checksum for clang .dSYM
bundle matches for the dsymutil with/without that patch.

Reviewed By: JDevlieghere

Differential revision: https://reviews.llvm.org/D77169
2020-04-07 21:21:54 +03:00
Eli Friedman e9ac757f79 [AArch64] Don't expand memcmp in strict align mode.
7aecf232 fixed the bug where we would miscompile, but we still generate
a crazy amount of code. Turn off the expansion until someone implements
an appropriate heuristic.

Differential Revision: https://reviews.llvm.org/D77599
2020-04-07 10:53:36 -07:00
Matt Arsenault f596ab4066 AMDGPU: Use early return 2020-04-07 13:48:00 -04:00