Commit Graph

141045 Commits

Jay Foad 6881a82e8c [AMDGPU] Fix scheduling of exp pos4
Also fix a similar issue in SIInsertWaitcnts, but I don't think that fix
has any effect in practice.

Differential Revision: https://reviews.llvm.org/D91290
2020-11-12 19:57:14 +00:00
Jay Foad d7d6ac5624 [AMDGPU] Define and use names for export targets. NFC.
Differential Revision: https://reviews.llvm.org/D91289
2020-11-12 19:57:14 +00:00
Jianzhou Zhao 2d96859ea6 [msan] Break the getShadow loop after matching an argument
Reviewed-by: eugenis

Differential Revision: https://reviews.llvm.org/D91320
2020-11-12 19:48:59 +00:00
Nikita Popov c00545dc32 [BasicAA] Remove checks for GEP decomposition limit reached
The GEP aliasing code currently checks for the GEP decomposition
limit being reached (i.e., we did not reach the "final" underlying
object). As far as I can see, these checks are not necessary. It is
perfectly fine to work with a GEP whose base can still be further
decomposed.

Looking back through the commit history, these checks were originally
introduced in 1a444489e9. However, I
believe that the problem this was intended to address was later
properly fixed with 1726fc698c, and
the checks are no longer necessary since then (and were not the
right fix in the first place).

Differential Revision: https://reviews.llvm.org/D91010
2020-11-12 20:43:38 +01:00
Craig Topper 4cdf1d2110 [MSP430] Remove unused MVT::Glue output from MSP430ISD::SELECT_CC nodes.
Follow-up to a similar patch on RISCV: 637f19c36b

As far as I can see, nothing reads this Glue value. The SDNode def in
the td file does not have the SDNPOutGlue flag, so I don't think this
glue would be properly propagated to MachineSDNodes if it were used.
2020-11-12 10:34:01 -08:00
Jamie Schmeiser 5f672fefeb Reland: Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -dot-cfg-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 17:39:14 +00:00
Alexander Kornienko 76b6cb515b Fix unused variable warning in release builds 2020-11-12 18:14:06 +01:00
Simon Pilgrim f72d350bfb [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to take KnownBits shift amount. NFCI.
We were creating this internally, but will need to support general KnownBits amounts as part of D90479.
2020-11-12 16:56:55 +00:00
Simon Pilgrim 8996742741 [KnownBits] Add KnownBits::makeConstant helper. NFCI.
Helper for cases where we need to create a KnownBits from a (fully known) constant value.
2020-11-12 16:16:04 +00:00
Anh Tuyen Tran a20b3620bb Revert "Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source"
This reverts commit 45d459e752 due to a
build issue in Polly.
2020-11-12 15:48:14 +00:00
Craig Topper 0add5f9122 [RISCV] Don't include CodeGen layer files in MC layer
- Use MCRegister instead of Register in MC layer.
- Move some enums from RISCVInstrInfo.h to RISCVBaseInfo.h to be with other TSFlags bits.

Differential Revision: https://reviews.llvm.org/D91114
2020-11-12 07:45:38 -08:00
Jamie Schmeiser 45d459e752 Introduce -dot-cfg-mssa option which creates dot-cfg style file with mssa comments included in source
Summary:
Expand the print-memoryssa and print<memoryssa> passes with a new hidden
option -dot-cfg-mssa that names a file. When set, a dot-cfg style file
will be generated into the named file with the memoryssa comments retained
and those blocks containing them shown in light pink. The option does
nothing in isolation.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea), dblaikie (David Blaikie)

Differential Revision: https://reviews.llvm.org/D90638
2020-11-12 15:41:16 +00:00
Craig Topper 9ca02d6fe1 [RISCV] Add an ANDI to shift amount of FSL/FSR instructions
The fshl and fshr intrinsics are defined to take their shift amount modulo the bitwidth of their inputs. The FSR/FSL instructions read one extra bit from the shift amount; if that bit is set, the inputs are swapped. In order to preserve the semantics of the llvm intrinsics we need to make sure that the extra bit isn't set. DAG combine or instcombine may have removed any mask that was originally present.

We could be smarter here and try to use computeKnownBits to check if the bit is known zero, but I wanted to start with correctness.
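For illustration only, a minimal hand-written example (function and value names invented, not from the patch) of the kind of call this affects:

```
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @funnel(i32 %a, i32 %b, i32 %amt) {
  ; fshl only uses %amt modulo 32, but FSL/FSR also read one extra bit and
  ; swap their inputs when it is set; any mask the frontend emitted on %amt
  ; may already have been removed, so the backend now inserts an ANDI to
  ; clear that extra bit before selection.
  %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %amt)
  ret i32 %r
}
```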

Differential Revision: https://reviews.llvm.org/D90905
2020-11-12 07:33:40 -08:00
Simon Pilgrim 11c106544b [ValueTracking] Update computeKnownBitsFromShiftOperator callbacks to use KnownBits shift handling. NFCI. 2020-11-12 15:31:26 +00:00
Jamie Schmeiser 782d6a6963 Introduce -print-before-changed, making -print-changed also print before passes that modify IR
Summary:
Add an option -print-before-changed that modifies the print-changed
behaviour so that it prints the IR before a pass that changed it in
addition to printing the IR after the pass. Note that the option
does nothing in isolation. The filtering options work as expected.
Lit tests are included.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: aeubanks (Arthur Eubanks)

Differential Revision: https://reviews.llvm.org/D88757
2020-11-12 15:20:50 +00:00
Jamie Schmeiser f79b483385 [NFC intended] Refactor SinkAndHoistLICMFlags to allow others to construct without exposing internals
Summary:
Refactor SinkAndHoistLICMFlags from a struct to a class with accessors and constructors to allow other
classes to construct flags with meaningful defaults while not exposing LICM internal details.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>

Reviewed By: asbirlea (Alina Sbirlea)

Differential Revision: https://reviews.llvm.org/D90482
2020-11-12 15:06:59 +00:00
David Green 11dee2eae2 [ARM] Ensure CountReg definition dominates InsertPt when creating t2DoLoopStartTP
Of course there was something missing, in this case a check that the def
of the count register we are adding to a t2DoLoopStartTP would dominate
the insertion point.

In the future, when we remove some of these COPY's in between, the
t2DoLoopStartTP will always become the last instruction in the block,
preventing this from happening. In the meantime we need to check they
are created in a sensible order.

Differential Revision: https://reviews.llvm.org/D91287
2020-11-12 13:47:46 +00:00
Alexandre Ganea ec63dfe368 [LLD] Fix include following 45b8a741fb 2020-11-12 08:32:16 -05:00
Alexandre Ganea 45b8a741fb [LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures
This is a follow-up for D70378 (Cover usage of LLD as a library).

While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:

1. When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
   lld::elf::ObjFile::ObjFile(), which goes straight into its base ELFFileBase(),
   then ELFFileBase::init().
2. At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
   half-initialized ObjFile instance.
3. We then end up in lld::exitLld() and, since we are running with LLD_IN_TEST, we
   happily restore the control flow to CrashRecoveryContext::RunSafely(), then back
   in lld::safeLldMain().
4. Before this patch, we called errorHandler().reset() just after, and this
   attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
   to free the half-initialized ObjFile instance, and more precisely its
   ObjFile::dwarf member.

Sometimes that worked, sometimes it failed and was caught by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.

But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 time out of 15.

Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.

Differential Revision: https://reviews.llvm.org/D88348
2020-11-12 08:14:43 -05:00
Kazushi (Jam) Marukawa a72d384249 [VE] Change the default type of v64 register class
Change the default type of v64 register class from v512i32 to v256f64.
Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91301
2020-11-12 19:07:07 +09:00
David Sherwood 3225fcf11e [SVE] Deal with SVE tuple call arguments correctly when running out of registers
When passing SVE types as arguments to function calls we can run
out of hardware SVE registers. This is normally fine, since we
switch to an indirect mode where we pass a pointer to a SVE stack
object in a GPR. However, if we switch over part-way through
processing a SVE tuple then part of it will be in registers and
the other part will be on the stack.

I've fixed this by ensuring that:

1. When we don't have enough registers to allocate the whole block
   we mark any remaining SVE registers temporarily as allocated.
2. We temporarily remove the InConsecutiveRegs flags from the last
   tuple part argument and reinvoke the autogenerated calling
   convention handler. Doing this prevents the code from entering
   an infinite recursion and, in combination with 1), ensures we
   switch over to the Indirect mode.
3. After allocating a GPR register for the pointer to the tuple we
   then deallocate any SVE registers we marked as allocated in 1).
   We also set the InConsecutiveRegs flags back how they were before.
4. I've changed the AArch64ISelLowering LowerCALL and
   LowerFormalArguments functions to detect the start of a tuple,
   which involves allocating a single stack object and doing the
   correct number of legal loads and stores.

Differential Revision: https://reviews.llvm.org/D90219
2020-11-12 08:41:50 +00:00
Amara Emerson ad376657c1 [AArch64][GlobalISel] Optimize G_PTR_ADD with a negated offset to be a G_SUB. 2020-11-11 22:46:53 -08:00
Max Kazantsev 2734a9ebf4 [NFC][SCEV] Generalize monotonicity check for full and limited iteration space
A piece of logic of `isLoopInvariantExitCondDuringFirstIterations` is actually
a generalized predicate monotonicity check. This patch moves it into the
corresponding method and generalizes it a bit.

Differential Revision: https://reviews.llvm.org/D90395
Reviewed By: apilipenko
2020-11-12 12:37:07 +07:00
Xun Li 94a45a8098 Revert "[Coroutine] Allocas used by StoreInst does not always escape"
This reverts commit 8bc7b9278e, which landed by accident.
2020-11-11 21:09:39 -08:00
Max Kazantsev d6dd938589 [IndVars] IV user should not prevent use widening
Sometimes an instruction we are trying to widen is used by the IV
(which means the instruction is the IV increment). Currently this may
prevent its widening. We should ignore such a user because it will be
dead once the transform is done anyway.
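For illustration, a made-up loop showing the situation: the increment %iv.next is the instruction being widened, and its use by the IV phi %iv should not block widening, since the narrow IV is dead after the transform.

```
define void @walk(i64 %len, i32* %p) {
entry:
  br label %loop
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw nsw i32 %iv, 1       ; candidate for widening; used by %iv
  %idx = zext i32 %iv.next to i64         ; the zext widening would eliminate
  %addr = getelementptr i32, i32* %p, i64 %idx
  store i32 0, i32* %addr
  %cmp = icmp ult i64 %idx, %len
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}
```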

Differential Revision: https://reviews.llvm.org/D90920
Reviewed By: fhahn
2020-11-12 12:02:01 +07:00
Xun Li 8bc7b9278e [Coroutine] Allocas used by StoreInst does not always escape
In the existing logic, for a given alloca, as long as its pointer value is stored into another location, it's considered escaped.
This is a bit too conservative. Specifically, in non-optimized builds, it's common to have patterns of code that first store an alloca somewhere and then load it right back.
These uses should be handled without conservatively marking the alloca as escaped.

This patch tracks how the memory location that an alloca pointer is stored into is used. As long as we only ever load from that location and nothing else, we can still
consider the original alloca as not escaping and keep it on the stack instead of putting it on the frame.
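For illustration, a rough sketch (all names invented, not from the patch) of the unoptimized pattern described above:

```
define void @coro_body() {
entry:
  %x = alloca i32                 ; the alloca being analyzed
  %slot = alloca i32*             ; temporary that holds its address
  store i32* %x, i32** %slot      ; previously this store alone marked %x as escaped
  %p = load i32*, i32** %slot     ; the location is only ever loaded from...
  store i32 0, i32* %p            ; ...so %x can stay on the stack
  ret void
}
```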

Differential Revision: https://reviews.llvm.org/D91305
2020-11-11 20:53:51 -08:00
Max Kazantsev 2e01ceafaa [IndVars] Recognize 'sub nuw' expressed as 'add' for widening
InstCombine canonicalizes 'sub nuw' instructions to 'add' without the
`nuw` flag. The typical case where we see it is decrementing induction
variables. For them, IndVars fails to prove that it's legal to widen them,
and inserts unprofitable `zext`'s.

This patch adds recognition of this pattern using SCEV.
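For illustration, a hand-written sketch (not taken from the patch) of the canonicalized loop shape this recognizes:

```
define i32 @count_down(i32 %n) {
entry:
  br label %loop
loop:
  %iv = phi i32 [ %n, %entry ], [ %iv.next, %loop ]
  ; Was "sub nuw i32 %iv, 1" before InstCombine dropped the nuw and turned
  ; it into an add of -1, which previously blocked proving widening legal.
  %iv.next = add i32 %iv, -1
  %cond = icmp ne i32 %iv.next, 0
  br i1 %cond, label %loop, label %exit
exit:
  ret i32 %iv.next
}
```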

Differential Revision: https://reviews.llvm.org/D89550
Reviewed By: fhahn, skatkov
2020-11-12 10:51:29 +07:00
Arnold Schwaighofer 431337662e [coro] Async coroutines: Allow more than 3 arguments in the dispatch function
We need to be able to call function pointers. Inline the dispatch
function.

Also inline the context projection function.

Transfer debug locations from the suspend point to the inlined functions.

Use the function argument index instead of the function argument in
coro.id.async. This solves any spurious use issues.

Coerce the arguments of the tail call function at a suspend point. The LLVM
optimizer seems to drop casts leading to a vararg intrinsic.

rdar://70097093

Differential Revision: https://reviews.llvm.org/D91098
2020-11-11 15:25:28 -08:00
Arthur Eubanks b6ccff3d5f [NewPM] Provide method to run all pipeline callbacks, used for -O0
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.

This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.

Since callbacks may end up not adding passes, we need to check if the
pass managers are empty before adding them, so PassManager now has an
isEmpty() function. For example, polly adds callbacks but doesn't always
add passes in those callbacks, so this is necessary to keep
-debug-pass-manager tests' output from changing depending on if polly is
enabled or not.

Tests are a continuation of those added in
https://reviews.llvm.org/D89083.

Reviewed By: asbirlea, Meinersbur

Differential Revision: https://reviews.llvm.org/D89158
2020-11-11 15:10:27 -08:00
Baptiste Saleil 37c4ac8545 [PowerPC] Accumulator/Unprimed Accumulator register copy, spill and restore
This patch adds support for accumulator/unprimed accumulator
register copy, spill and restore for MMA.

Authored By: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D90616
2020-11-11 16:23:45 -06:00
Arthur Eubanks d9cbceb041 [CGSCC][Inliner] Handle new non-trivial edges in updateCGAndAnalysisManagerForPass
Previously the inliner did a bit of a hack by adding ref edges for all
new edges introduced by performing an inline before calling
updateCGAndAnalysisManagerForPass(). This was because
updateCGAndAnalysisManagerForPass() didn't handle new non-trivial call
edges.

This adds handling of non-trivial call edges to
updateCGAndAnalysisManagerForPass().  The inliner called
updateCGAndAnalysisManagerForFunctionPass() since it was handling adding
newly introduced edges (so updateCGAndAnalysisManagerForPass() would
only have to handle promotion), but now it needs to call
updateCGAndAnalysisManagerForCGSCCPass() since
updateCGAndAnalysisManagerForPass() is now handling the new call edges
and function passes cannot add new edges.

We follow the previous path of adding trivial ref edges then letting promotion
handle changing the ref edges to call edges and the CGSCC updates. So
this still does not allow adding call edges that result in an addition
of a non-trivial ref edge.

This is in preparation for better detecting devirtualization. Previously
since the inliner itself would add ref edges,
updateCGAndAnalysisManagerForPass() would think that promotion and thus
devirtualization had happened after any sort of inlining.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91046
2020-11-11 13:43:49 -08:00
Jessica Paquette 7a70a2f04d [AArch64][GlobalISel] Mark G_FCONSTANT as legal when there is full fp16 support
When there is full fp16 support, there is no reason to widen 16-bit
G_FCONSTANTs to 32 bits. Mark them as legal in this case.

Also, we currently import a pattern for materializing a 16-bit 0.0.
Add a testcase showing we select it.

(All other 16-bit G_FCONSTANTS are not yet selected.)

Differential Revision: https://reviews.llvm.org/D89164
2020-11-11 13:25:11 -08:00
Jianzhou Zhao 0dd87825db Add a flag to control whether to propagate labels from condition values to results
Before this change, DFSan always does the propagation. Without
origin tracking, such flows are harder to understand. After
this change, the flag is off by default.
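For illustration, a tiny invented example of the flow in question: in a select, the label of the condition %cond can be propagated into the label of the result %res, and the new flag controls whether that happens.

```
define i32 @pick(i1 %cond, i32 %a, i32 %b) {
  ; The labels of %a and %b always reach %res; whether the label of %cond
  ; also does is now governed by the (off-by-default) flag.
  %res = select i1 %cond, i32 %a, i32 %b
  ret i32 %res
}
```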

Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D91234
2020-11-11 20:41:42 +00:00
Pavel Iliin fdb979cfbb [NFC] [Legalize] Fix spaces and style. 2020-11-11 19:24:15 +00:00
Akira Hatanaka 5e85d00ed6 Move variable declarations to functions in which they are used. NFC 2020-11-11 10:58:43 -08:00
Vedant Kumar d76e01a6a7 [MachO] Allow the LC_IDENT load command
xnu coredumps include an LC_IDENT load command. It's helpful to be able
to just ignore these. IIUC an interested client can grab the identifier
using the MachOObjectFile::load_commands() API.

The status quo is that llvm bails out when it finds an LC_IDENT because
the command is obsolete (see isLoadCommandObsolete).

Differential Revision: https://reviews.llvm.org/D91221
2020-11-11 10:15:54 -08:00
Craig Topper 637f19c36b [RISCV] Remove traces of Glue from RISCVISD::SELECT_CC
We were creating RISCVISD::SELECT_CC nodes with Glue output that was never being used, and the tablegen SDNode had the SDNPInGlue flag instead of the SDNPOutGlue flag.

Since we don't seem to need the Glue just get rid of it from both places.

Differential Revision: https://reviews.llvm.org/D91199
2020-11-11 09:30:48 -08:00
Jessica Paquette c42053f79b [AArch64][GlobalISel] Select arith extended add/sub in manual selection code
The manual selection code for add/sub was not checking if it was possible to
fold in shifts + extends (the *rx opcode variants).

As a result, we could never select things like

```
cmp x1, w0, uxtw #2
```

Because we don't import any patterns for compares.

This adds support for the arithmetic shifted register forms and updates tests
for instructions selected using `emitADD`, `emitADDS`, and `emitSUBS`.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91207
2020-11-11 09:26:03 -08:00
Jessica Paquette f0580c73bb [AArch64][GlobalISel] Select negative arithmetic immediates in manual selector
Previously, we only handled negative arithmetic immediates in the imported
selector code.

Since we don't import code for, say, compares, we were missing opportunities
for things like

```
%cst:gpr(s64) = G_CONSTANT i64 -10
%cmp:gpr(s32) = G_ICMP intpred(eq), %reg0(s64), %cst
->
%adds = ADDSXri %reg0, 10, 0, implicit-def $nzcv
%cmp = CSINCWr $wzr, $wzr, 1, implicit $nzcv
```

Instead, we would have to materialize the constant and emit a SUBS.

This adds support for selection like above for SUB, SUBS, ADD, and ADDS.

This is a 0.1% geomean code size improvement on SPECINT2000 at -Os.

Differential Revision: https://reviews.llvm.org/D91108
2020-11-11 09:20:05 -08:00
Jay Foad f23c4c6f8a [AMDGPU] Separate out real exp instructions by subtarget. NFC.
Differential Revision: https://reviews.llvm.org/D91247
2020-11-11 17:13:40 +00:00
Jay Foad 2b33ea6935 [AMDGPU] Split exp instructions out into their own tablegen file. NFC.
Differential Revision: https://reviews.llvm.org/D91246
2020-11-11 17:13:40 +00:00
Jay Foad f94fd1c8ca [AMDGPU] Make use of SIInstrInfo::isEXP. NFC. 2020-11-11 17:01:20 +00:00
Alex Richardson fb9942f876 [AsmParser] Add source location to all errors related to .cfi directives
I was trying to add .cfi_ annotations to assembly code in the FreeBSD
kernel and changed a macro that then resulted in incorrectly nested
directives. However, clang's diagnostics said the error was happening at
<unknown>:0. This addresses one of the TODOs added in D51695.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D89787
2020-11-11 17:00:07 +00:00
Hans Wennborg 418f18c6cd Revert "Reland [CFGuard] Add address-taken IAT tables and delay-load support"
This broke both Firefox and Chromium (PR47905) due to what seems like dllimport
functions not being handled correctly.

> This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
> Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.
>
> Reviewed By: rnk
>
> Differential Revision: https://reviews.llvm.org/D87544

This reverts commit cfd8481da1.
2020-11-11 16:03:33 +01:00
Sander de Smalen 30fded75b4 Revert "[LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost."
This reverts commits:
* [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
  b873aba394.
* [LoopVectorizer] Silence warning in GetRegUsage.
  9ff701100a.
2020-11-11 14:41:55 +00:00
Jay Foad 830ed64ccd Revert "Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access""
This reverts commit 8b08fa0103.

The underlying problems were fixed by D90607.
2020-11-11 14:40:14 +00:00
Simon Pilgrim f6a326adef [ValueTracking] computeKnownBitsFromShiftOperator - merge zero/one callbacks to single KnownBits callback. NFCI.
Another cleanup for D90479 - handle the Known Ones/Zeros in a single callback, which will make it much easier to jump over to the KnownBits shift handling.
2020-11-11 14:22:42 +00:00
Caroline Concatto 37f4ccb275 [AArch64]Add memory op cost model for SVE
This patch adds/fixes the memory op cost model for SVE with fixed-width
vectors.

Differential Revision: https://reviews.llvm.org/D90950
2020-11-11 12:49:19 +00:00
Simon Pilgrim 1a62ca65c1 [KnownBits] Add KnownBits::commonBits helper. NFCI.
We have a frequent pattern where we're merging two KnownBits to get the common/shared bits, and I just fell for the gotcha where I tried to use the & operator to merge them...
2020-11-11 12:15:54 +00:00
Kerry McLaughlin 170947a5de [SVE][CodeGen] Lower scalable masked scatters
Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:
 - Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use.
    Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
 - Added the getCanonicalIndexType function to convert redundant addressing
   modes (e.g. scaling is redundant when accessing bytes)
 - Tests with 32 & 64-bit scaled & unscaled offsets
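For illustration only, a minimal hand-written IR example (names invented, not from the patch or its tests) of the scalar-plus-vector form this lowers: a scalar base pointer, a vector of offsets, and a predicate.

```
declare void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32>, <vscale x 4 x i32*>, i32 immarg, <vscale x 4 x i1>)

define void @scatter_i32(<vscale x 4 x i32> %data, i32* %base, <vscale x 4 x i64> %offsets, <vscale x 4 x i1> %pg) {
  ; Scalar base plus a vector of (scaled) offsets yields a vector of addresses.
  %ptrs = getelementptr i32, i32* %base, <vscale x 4 x i64> %offsets
  call void @llvm.masked.scatter.nxv4i32.nxv4p0i32(<vscale x 4 x i32> %data, <vscale x 4 x i32*> %ptrs, i32 4, <vscale x 4 x i1> %pg)
  ret void
}
```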

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90941
2020-11-11 11:50:22 +00:00