Commit Graph

140363 Commits

Author SHA1 Message Date
Geoffrey Martin-Noble c17ae2916c Remove unnecessary header include which violates layering
This was introduced in https://reviews.llvm.org/D89774, but I don't
think it should be necessary.

Reviewed By: TaWeiTu, aeubanks

Differential Revision: https://reviews.llvm.org/D89843
2020-10-20 20:14:03 -07:00
Carl Ritson 324a15cead [AMDGPU][NFC] Fix missing size in comment 2020-10-21 11:38:21 +09:00
Cyndy Ishida acb33cba6d [llvm] Fix ODRViolations for VersionTuple YAML specializations NFC
It appears for Swift there was confusing errors when trying to parse APINotes, when libAPINotes and libInterfaceStub are linked, they both export symbol
`__ZN4llvm4yaml7yamlizeINS_12VersionTupleEEENSt3__19enable_ifIXsr16has_ScalarTraitsIT_EE5valueEvE4typeERNS0_2IOERS5_bRNS0_12EmptyContextE`, and discovered
same symbol defined within llvm-ifs.

This consolidates the boilerplate into YAMLTraits and defers the specific validation in reading the whole input.
fixes: rdar://problem/70450563

Reviewed By: phosek, dblaikie

Differential Revision: https://reviews.llvm.org/D89764
2020-10-20 18:29:15 -07:00
Hubert Tong 134ffa8138 NFC: Fix -Wsign-compare warnings on 32-bit builds
Comparing 32-bit `ptrdiff_t` against 32-bit `unsigned` results in
`-Wsign-compare` warnings for both GCC and Clang.

The warning for the cases in question appear to identify an issue
where the `ptrdiff_t` value would be mutated via conversion to an
unsigned type.

The warning is resolved by using the usual arithmetic conversions to
safely preserve the value of the `unsigned` operand while trying to
convert to a signed type. Host platforms where `unsigned` has the same
width as `unsigned long long` will need to make a different change, but
using an explicit cast has disadvantages that can be avoided for now.

Reviewed By: dantrushin

Differential Revision: https://reviews.llvm.org/D89612
2020-10-20 20:52:10 -04:00
Austin Kerbow ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Tony 1bc7bfffdb [AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Differential Revision: https://reviews.llvm.org/D89618
2020-10-20 22:55:12 +00:00
Craig Topper 1298252f80 [X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction.
Instead of handling before parsing, just fix it after parsing.
2020-10-20 15:22:00 -07:00
Craig Topper 702aae368a [X86] Move 's{hr,ar,hl} , <op>' to 'shift <op>' optimization in the assembler into processInstruction.
Instead of detecting the mnemonic and hacking the operands before
parsing. Just fix it up after parsing.
2020-10-20 15:20:46 -07:00
Kazu Hirata 96f372c1e7 [AsmWriter] Construct SlotTracker with the function
This patch teaches BasicBlock::print to construct an instance of
SlotTracker with the containing function.

Without this patch, we dump:

*** IR Dump After LoopInstSimplifyPass ***
; Preheader:
  br label %1

; Loop:
<badref>:                                         ; preds = %1, %0
  br label %1

Note "<badref>" above.  This happens because BasicBlock::print calls:

  SlotTracker SlotTable(this->getModule());

Note that this constructor does not add the contents of functions to
the slot table.  That is, basic blocks are left unnumbered.

This patch fixes the problem by switching to:

  SlotTracker SlotTable(this->getParent());

which does add the contents of the Module and the function,
this->getParent(), to the slot table.

Differential Revision: https://reviews.llvm.org/D89567
2020-10-20 15:01:40 -07:00
Paul C. Anagnostopoulos 332ff48dee [AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans
Differential Revision: https://reviews.llvm.org/D89796
2020-10-20 16:23:15 -04:00
Shimin Cui 95bda510fb [ConstantFold] Fold the comparison of bitcasted global values
This is to simplify icmp instructions in the form like:

%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)

Here @f32 and @f64 are two functions.

Differential Revision: https://reviews.llvm.org/D87850
2020-10-20 12:41:49 -07:00
David Stenberg 0c0fcea557 Handle value uses wrapped in metadata for the use-list order
When generating the use-list order, also consider value uses that are
operands which are wrapped in metadata; e.g. llvm.dbg.value operands.

This fixes PR36778. The test case is based on the reproducer from that
report.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D53758
2020-10-20 20:05:59 +02:00
Nicolai Hähnle 848a68a032 DomTree: Extract (mostly) read-only logic into type-erased base classes
Avoid having to instantiate and compile a subset of the dominator tree logic
separately for each node type. More importantly, this allows generic
algorithms to be built on top of dominator trees without writing them as
templates -- such algorithms can now use opaque CfgBlockRef and
CfgInterface instead.

A type-erased implementation of dominator trees could be written in
terms of CfgInterface as well, but doing so would change the current
trade-off: it would slightly reduce code size at the cost of a slight
runtime overhead.

This patch does not change the trade-off, as it only does type-erasure
where basic blocks can be treated in a fully opaque way, i.e. it only
moves methods that don't require iteration over CFG successors and
predecessors.

v5:
- rename generic_{begin,end,children} back without the generic_ prefix
  and refer explictly to base class methods in NewGVN, which wants to
  mutate the order of dominator tree node children directly

v6:
- style change: iDom -> idom; it's arguable whether this is really
  invalid, since it is actually standard camelCase, but clang-tidy
  complains about it so... *shrug*
- rename {to,from}Generic -> {wrap,unwrap}Ref

Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672

Differential Revision: https://reviews.llvm.org/D83089
2020-10-20 19:53:07 +02:00
Ta-Wei Tu 529ecd19df [NPM] port -unify-loop-exits to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89774
2020-10-20 10:46:57 -07:00
vnalamot 89f7ccea6f [AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly
Remove getAllVGPR32() interface and update the SGPR spill code to use
a proper method to get the relevant VGPR registers list.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89806
2020-10-20 23:15:24 +05:30
Ta-Wei Tu 59286b36df [NPM] Port -mergereturn to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89781
2020-10-20 10:33:58 -07:00
Florian Hahn 2e58010208 [DSE] Do not scan users of memory terminators for further reads.
isMemTerminator checks if the current def is a memory terminator that
terminates the memory pointed to by DefLoc. We do not have to add any of
their users to the worklist, because the follow-on users cannot read the
memory in question.

This leads to more stores eliminated in the presence of lifetime calls.
Previously we added the users of those intrinsics to the worklist,
limiting elimination.

In terms of removed stores, this gives a nice boost on some benchmarks
(MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3):

Same hash: 205 (filtered out)
Remaining: 32
Metric: dse.NumFastStores

Program                                          base   patch   diff
 test-suite...000/197.parser/197.parser.test     4.00    8.00  100.0%
 test-suite...rolangs-C++/family/family.test     4.00    7.00  75.0%
 test-suite...marks/7zip/7zip-benchmark.test   1722.00 2189.00 27.1%
 test-suite...CFP2000/177.mesa/177.mesa.test    30.00   38.00  26.7%
 test-suite :: External/Nurbs/nurbs.test        44.00   49.00  11.4%
 test-suite...lications/sqlite3/sqlite3.test   115.00  128.00  11.3%
 test-suite...006/447.dealII/447.dealII.test   2715.00 3013.00 11.0%
 test-suite...ProxyApps-C++/CLAMR/CLAMR.test   237.00  261.00  10.1%
 test-suite...tions/lambda-0.1.3/lambda.test    40.00   44.00  10.0%
 test-suite...3.xalancbmk/483.xalancbmk.test   1366.00 1475.00  8.0%
 test-suite...abench/jpeg/jpeg-6a/cjpeg.test    13.00   14.00   7.7%
 test-suite...oxyApps-C++/miniFE/miniFE.test    43.00   46.00   7.0%
 test-suite...lications/ClamAV/clamscan.test   230.00  246.00   7.0%
 test-suite...006/450.soplex/450.soplex.test   284.00  299.00   5.3%
 test-suite...nsumer-jpeg/consumer-jpeg.test    21.00   22.00   4.8%
2020-10-20 16:55:22 +01:00
Simon Pilgrim ec228fbfc0 [InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI. 2020-10-20 16:45:16 +01:00
Jay Foad 4913e3627c [AMDGPU] Remove unused declaration. NFC.
The implementation of this method was removed in D89706.
2020-10-20 16:31:42 +01:00
Simon Pilgrim ce13549761 [InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI. 2020-10-20 16:26:41 +01:00
Michael Liao 2a0e4d1c01 [amdgpu] Enhance AMDGPU AA.
- In general, a generic point may alias to pointers in all other address
  spaces. However, for certain cases enforced by the programming model,
  we may found a generic point won't alias to pointers to local objects.
  * When a generic pointer is loaded from the constant address space, it
    could only be a pointer to the GLOBAL or CONSTANT address space.
    Thus, it won't alias to pointers to the PRIVATE or LOCAL address
    space.
  * When a generic pointer is passed as a kernel argument, it also could
    only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
    also won't alias to pointers to the PRIVATE or LOCAL address space.

Differential Revision: https://reviews.llvm.org/D89525
2020-10-20 09:54:12 -04:00
Florian Hahn 6439fde6d4 [DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem.
This change should currently not have any impact, but guard against
further inconsistencies between MemoryLocation and function attributes.
2020-10-20 14:37:53 +01:00
Georgii Rymar 6487ffafd1 Reland "[yaml2obj][ELF] - Simplify the code that performs sections validation."
This reverts commit 1b589f4d4d and relands the D89463
with the fix: update `MappingTraits<FileFilter>::validate()` in ClangTidyOptions.cpp to
match the new signature (change the return type to "std::string" from "StringRef").

Original commit message:

This:

Changes the return type of MappingTraits<T>>::validate to std::string
instead of StringRef. It allows to create more complex error messages.

It introduces std::vector<std::pair<StringRef, bool>> getEntries():
a new virtual method of Section, which is the base class for all sections.
It returns names of special section specific keys (e.g. "Entries") and flags that says if them exist in a YAML.
The code in validate() uses this list of entries descriptions to generalize validation.
This approach was discussed in the D89039 thread.

Differential revision: https://reviews.llvm.org/D89463
2020-10-20 16:25:33 +03:00
Sanjay Patel 7c516504a1 [InstSimplify] allow vector splats for icmp-of-neg folds 2020-10-20 09:24:36 -04:00
Simon Pilgrim e372a5f86f [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support
Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types
2020-10-20 14:14:26 +01:00
Georgii Rymar 1b589f4d4d Revert "[yaml2obj][ELF] - Simplify the code that performs sections validation."
This reverts commit b9e2b59680.
2020-10-20 15:16:56 +03:00
Nicolai Hähnle c0cdd22c72 Introduce CfgTraits abstraction
The CfgTraits abstraction simplfies writing algorithms that are
generic over the type of CFG, and enables writing such algorithms
as regular non-template code that operates on opaque references
to CFG blocks and values.

Implementations of CfgTraits provide operations on the concrete
CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock *`.

CfgInterface is an abstract base class which provides operations
on opaque types CfgBlockRef and CfgValueRef. Those opaque types
encapsulate a `void *`, but the meaning depends on the concrete
CFG type. For example, MachineCfgTraits -- for use with MachineIR
in SSA form -- encodes a Register inside CfgValueRef. Converting
between concrete references and opaque/generic ones is done by
CfgTraits::{fromGeneric,toGeneric}. Convenience methods
CfgTraits::{un}wrap{Iterator,Range} are available as well.

Writing algorithms in terms of CfgInterface adds some overhead
(virtual method calls, plus in same cases it removes the
opportunity to inline iterators), but can be much more convenient
since generic algorithms can be written as non-templates.

This patch adds implementations of CfgTraits for all CFGs on
which dominator trees are calculated, so that the dominator
tree can be ported to this machinery. Only IrCfgTraits (LLVM IR)
and MachineCfgTraits (Machine IR in SSA form) are complete, the
other implementations are limited to the absolute minimum
required to make the upcoming dominator tree changes work.

v5:
- fix MachineCfgTraits::blockdef_iterator and allow it to iterate over
  the instructions in a bundle
- use MachineBasicBlock::printName

v6:
- implement predecessors/successors for all CfgTraits implementations
- fix error in unwrapRange
- rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming
  that is consistent with {wrap,unwrap}{Iterator,Range}
- use getVRegDef instead of getUniqueVRegDef

v7:
- std::forward fix in wrapping_iterator
- fix typos

v8:
- cleanup operators on CfgOpaqueType
- address other review comments

Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d

Differential Revision: https://reviews.llvm.org/D83088
2020-10-20 13:50:52 +02:00
Simon Pilgrim e346ea9905 [InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI. 2020-10-20 12:13:08 +01:00
Carl Ritson be2afbd019 [AMDGPU] Remove fix up operand from SI_ELSE
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified.  Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644
2020-10-20 19:15:21 +09:00
Georgii Rymar 3be2b0d1a1 [yaml2obj][NFCI] - Address post commit comments for "[yaml2obj][ELF] - Simplify the code that performs sections validation."
This addresses post commit comments for D89463.
2020-10-20 12:51:19 +03:00
Georgii Rymar b9e2b59680 [yaml2obj][ELF] - Simplify the code that performs sections validation.
This:
1) Changes the return type of `MappingTraits<T>>::validate` to `std::string`
instead of `StringRef`. It allows to create more complex error messages.

2) It introduces std::vector<std::pair<StringRef, bool>> getEntries():
a new virtual method of Section, which is the base class for all sections.
It returns names of special section specific keys (e.g. "Entries") and flags that
says if them exist in a YAML. The code in validate() uses this list of entries
descriptions to generalize validation.
This approach was discussed in the D89039 thread.

Differential revision: https://reviews.llvm.org/D89463
2020-10-20 11:28:23 +03:00
Carl Ritson 6aabbeadae [AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility
Remove duplicate code and move things around to make it easier to
add additional optimisations to the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89619
2020-10-20 17:22:43 +09:00
Ulrich Weigand c299f3555d [SystemZ] Fix disassembler crashes
The "Size" value returned by SystemZDisassembler::getInstruction is
used by common code even in the case where the routine returns
failure.  If that Size value exceeds the number of bytes remaining
in the section, that could cause disassembler crashes.

Fixed by never returning more than the number of bytes remaining.
2020-10-20 10:21:42 +02:00
Evgeny Leviant 991e86156c [ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89460
2020-10-20 11:14:21 +03:00
David Green 6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1.

This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Atmn Patel 595c615606 [IR] Adds mustprogress as a LLVM IR attribute
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393
2020-10-20 03:09:57 -04:00
Luqman Aden 51892a42da [COFF][ARM] Fix CodeView for Windows on 32bit ARM targets.
Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D89622
2020-10-19 22:16:16 -07:00
Jonas Devlieghere f44fb13025 [FileCollector] Move interface into FileCollectorBase (NFC)
For the reproducers in LLDB we want to switch to an "immediate mode"
FileCollector that writes every file encountered straight to disk so we
can generate the actual mapping out-of-process. This patch moves the
interface into a separate base class.

Differential revision: https://reviews.llvm.org/D89742
2020-10-19 21:37:20 -07:00
Max Kazantsev a10a64e7e3 [SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2
Fixed wrapping range case & proof methods reduced to constant range
checks to save compile time.

Differential Revision: https://reviews.llvm.org/D89381
2020-10-20 11:32:36 +07:00
Serguei Katkov 38799975ce [IRCE] Do not transform if loop has small number of iterations
IRCE has some overhead for runtime checks and in case number of iteration is small
the overhead can kill the benefit from optimizations.

This CL bases on BlockFrequencyInfo of pre-header and header to estimate the
number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop.

Probably it is better to make more complex cost model but for simplicity it seems the be enough.

The usage of BFI is added only for new pass manager and tries to use it efficiently.

Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev
Reviewed By: mkazantsev
Subscribers: llvm-commits, fhahn
Differential Revision: https://reviews.llvm.org/D89541
2020-10-20 10:33:59 +07:00
Qiu Chaofan 1b2fe71ecf [DAGCombiner] Tighten reasscociation of visitFMA
From LangRef, FMF contract should not enable reassociating to form
arbitrary contractions. So it should not help rearrange nodes like
(fma (fmul x, c1), c2, y) into (fma x, c1*c2, y).

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89527
2020-10-20 10:13:01 +08:00
Volodymyr Sapsai a28678e20a Revert "Reland "[Modules] Add stats to measure performance of building and loading modules.""
This reverts commit 4000c9ee18.

Test "LLVM :: Other/statistic.ll" is failing on Windows.
2020-10-19 18:27:30 -07:00
Wang, Pengfei 3a85472af2 [X86] Fix assert fail when element type is i1.
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail.
Since we don't really handle vxi1 cases in below code, we can just return from here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89096
2020-10-20 09:26:32 +08:00
Volodymyr Sapsai 4000c9ee18 Reland "[Modules] Add stats to measure performance of building and loading modules."
Measure amount of high-level or fixed-cost operations performed during
building/loading modules and during header search. High-level operations
like building a module or processing a .pcm file are motivated by
previous issues where clang was re-building modules or re-reading .pcm
files unnecessarily. Fixed-cost operations like `stat` calls are tracked
because clang cannot change how long each operation takes but it can
perform fewer of such operations to improve the compile time.

Also tracking such stats over time can help us detect compile-time
regressions. Added stats are more stable than the actual measured
compilation time, so expect the detected regressions to be less noisy.

On relanding drop stats in MemoryBuffer.cpp as their value is pretty low
but affects a lot of clients and many of those aren't interested in
modules and header search.

rdar://problem/55715134

Reviewed By: aprantl, bruno

Differential Revision: https://reviews.llvm.org/D86895
2020-10-19 15:44:11 -07:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Jordan Rupprecht 8a377f1e3c [NFC] Inline assertion-only variable 2020-10-19 15:11:37 -07:00
Sergei Trofimovich 1eb812e06d [VE] Fix initializer visibility
Before the change attempt to link libLTO.so against shared
LLVM library failed as:

```
[ 76%] Linking CXX shared library ../../lib/libLTO.so
... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1
c++ -o ...libLTO.so.12git ...ibLLVM-12git.so
ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()':
include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo'
```

It happens because on linux llvm build system sets default
symbol visibility to "hidden". The fix is to set visibility
back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47847

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89633
2020-10-19 22:54:41 +01:00
Amy Huang ea693a1627 [NPM] Port module-debuginfo pass to the new pass manager
Port pass to NPM and update tests in DebugInfo/Generic.

Differential Revision: https://reviews.llvm.org/D89730
2020-10-19 14:31:17 -07:00
Roman Lebedev e0567582b8
[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer
The main tricky thing here is forward-declaring the enum:
we have to specify it's underlying data type.

In particular, this avoids the danger of switching over the SCEVTypes,
but actually switching over an integer, and not being notified
when some case is not handled.

I have updated most of such switches to be exaustive and not have
a default case, where it's pretty obvious to be the intent,
however not all of them.
2020-10-20 00:10:22 +03:00
Roman Lebedev d4b0aa9773
[NFC][SCEV] BuildConstantFromSCEV(): reformat, NFC
Makes diff in next commit more readable
2020-10-20 00:10:22 +03:00
Roman Lebedev 3355284b2d
[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch
If we switch over an enum, compiler can easily issue a diagnostic
if some case is not handled. However with an if cascade that isn't so.
Experimental evidence suggests new behavior to be superior.
2020-10-20 00:10:22 +03:00
Evgenii Stepanov 188a7d6710 Add alloca size threshold for StackTagging initializer merging.
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.

This change adds an upper limit for the alloca size. The default limit
is selected such that worst case size of memtag-generated code is
similar to non-memtag (but because of the ISA quirks, this case is
realized at the different value of alloca size, ex. memset inlining
triggers at sizes below 512, but stack tagging instructions are 2x
shorter, so limit is approx. 256).

We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.

Reviewers: vitalybuka, pcc

Subscribers: llvm-commits
2020-10-19 13:44:07 -07:00
Craig Topper edd0cb11bd [SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats
This enables these transforms for vectors:
(ctpop x) u< 2 -> (x & x-1) == 0
(ctpop x) u> 1 -> (x & x-1) != 0
(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
(ctpop x) != 1 --> (x == 0) || ((x & x-1) != 0)

All enabled if CTPOP isn't Legal. This differs from the scalar
behavior where the first two are done unconditionally and the
last two are done if CTPOP isn't Legal or Custom. The Legal
check produced better results for vectors based on X86's
custom handling. Might be worth re-visiting scalars here.

I disabled the looking through truncate for vectors. The
code that creates new setcc can use the same result VT as the
original setcc even if we truncated the input. That may work
work for most scalars, but definitely wouldn't work for vectors
unless it was a vector of i1.

Fixes or at least improves PR47825

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89346
2020-10-19 12:56:59 -07:00
Craig Topper e28376ec28 [X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table.
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.

I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure its safe in general.

A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction which will make the stack
slot 4 bytes instead of 16 bytes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89656
2020-10-19 12:53:14 -07:00
Lang Hames 9898d9d885 [ORC] Fix a missing include. 2020-10-19 12:13:55 -07:00
Tony 6be9c7d2dc [AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp 2020-10-19 18:50:28 +00:00
Jay Foad 56f6bf1a8d [AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.

This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations"

No functional change intended. At least it has no effect on lit tests.

Differential Revision: https://reviews.llvm.org/D89706
2020-10-19 19:15:34 +01:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
Tony 151e297034 [AMDGPU] Simplify cumode handling in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D89663
2020-10-19 17:13:45 +00:00
Mircea Trofin 225065b9a8 [NFC][MC] Type [MC]Register uses in MachineTraceMetrics
Differential Revision: https://reviews.llvm.org/D89710
2020-10-19 09:49:52 -07:00
Lang Hames c89447b659 [ORC] Fix unused variable warning. 2020-10-19 09:06:33 -07:00
Simon Pilgrim adb52e5f9e [InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) | (icmp_ult/gt A, B) for integer types
Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does......
2020-10-19 17:05:38 +01:00
Mircea Trofin d454328ea8 [ML] Add final reward logging facility.
Allow logging final rewards. A final reward is logged only once, and is
serialized as all-zero values, except for the last one.

Differential Revision: https://reviews.llvm.org/D89626
2020-10-19 08:44:50 -07:00
Simon Pilgrim 482e6f0041 Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support"
This reverts commit a704d8238c.

Causing stage2 build failures on some bots.
2020-10-19 16:03:36 +01:00
Simon Pilgrim de885f1b2a [InstCombine] Add (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0) vector support
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
2020-10-19 15:41:21 +01:00
Paul C. Anagnostopoulos 2871c6c93f [Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.
Differential Revision: https://reviews.llvm.org/D89551
2020-10-19 10:33:55 -04:00
Simon Pilgrim ecd25086d1 [InstCombine] Add (icmp eq B, 0) | (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support 2020-10-19 15:23:48 +01:00
Simon Pilgrim a704d8238c [InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support 2020-10-19 14:55:18 +01:00
Simon Pilgrim 1d90e53044 [InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI. 2020-10-19 14:28:08 +01:00
Paul C. Anagnostopoulos dc5d6632b0 [TableGen] Enhance !empty and !size to handle strings and DAGs.
Fix bug in the type checking for !empty, !head, !size, !tail.
2020-10-19 09:22:20 -04:00
Piotr Sobczak c872faf6e0 [AMDGPU] Do not generate S_CMP_LG_U64 on gfx7
S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64().

Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that
do not support 64-bit scalar compare.

Differential Revision: https://reviews.llvm.org/D89536
2020-10-19 14:44:31 +02:00
Kazushi (Jam) Marukawa 6bb60d3e26 [VE] Add setcc for fp128
Add setcc for fp128 and clean existing ISel patterns.  Also add
a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89683
2020-10-19 21:36:57 +09:00
Kazushi (Jam) Marukawa fb2bb6fad4 [VE] Add cast to/from fp128 patterns
Add cast to/from fp128 patterns.  Clean other cast patterns too.
Update a regression test by adding missing tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89682
2020-10-19 21:35:27 +09:00
Hans Wennborg 0628bea513 Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting"
This broke Chromium's PGO build, it seems because hot-cold-splitting got turned
on unintentionally. See comment on the code review for repro etc.

> This patch adds -f[no-]split-cold-code CC1 options to clang. This allows
> the splitting pass to be toggled on/off. The current method of passing
> `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose
> correctly (say, with `-O0` or `-Oz`).
>
> To implement the -fsplit-cold-code option, an attribute is applied to
> functions to indicate that they may be considered for splitting. This
> removes some complexity from the old/new PM pipeline builders, and
> behaves as expected when LTO is enabled.
>
> Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org>
> Differential Revision: https://reviews.llvm.org/D57265
> Reviewed By: Aditya Kumar, Vedant Kumar
> Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar

This reverts commit 273c299d5d.
2020-10-19 12:31:14 +02:00
Simon Pilgrim 0b7b446a40 [InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold 2020-10-19 11:10:32 +01:00
Kazushi (Jam) Marukawa 8796746b2a [VE] Support select_cc
Add missing ISel patterns related to select_cc DAG nodes.
Add regression test of all combination of possible scalar types.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89672
2020-10-19 18:54:25 +09:00
Kazushi (Jam) Marukawa f2fd42098c [VE] Add VBRD/VMV instructions
Add VBRD/VMV vector instructions.  In order to do that, also support
VM512 registers and RV instruction format in MC layer.  Also add
regression tests for new instructions.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89641
2020-10-19 18:33:54 +09:00
Kazushi (Jam) Marukawa 7a09aec804 [VE] Add LSV/LVS/LVM/SVM instructions
Add LSV/LVS/LVM/SVM vector instructions and regression tests.
Also update AsmParser to support new format of operands.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89499
2020-10-19 18:32:48 +09:00
Kazushi (Jam) Marukawa 25955cbae4 [VE] Support br_cc comparing fp128
Support br_cc instruction comparing fp128 values.  Add a br_cc.ll
regression test for all kind of br_cc instructions.  And, clean
existing branch regression tests, this time.  Clean a brcond.ll
regression test for brcond instruction.  Remove mixed branch1.ll
regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89627
2020-10-19 18:29:39 +09:00
Kazushi (Jam) Marukawa af8b444de3 [VE] Update ISel patterns for select instruction
Add an ISel pattern for fp128 select instruction and optimize generated
code for other types' select. instructions.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89509
2020-10-19 18:28:21 +09:00
Lang Hames f35707047e [ORC] Break up C-API header Orc.h, and add JITEventListener support.
This patch breaks Orc.h up into Orc.h, LLJIT.h and OrcEE.h.

Orc.h contain core Orc utilities.
LLJIT.h contains LLJIT specific types and functions.
OrcEE.h contains types and functions that depend on ExecutionEngine.

The intent is that these headers should match future library divisions: Clients
who only use Orc.h should only need to link againt the Orc core libraries,
clients using LLJIT.h will also need to link against LLVM core, and clients
using OrcEE.h will also have to link against ExecutionEngine.

In addition to breaking up the Orc.h header this patch introduces functions to:
(1) Set the object linking layer creation function on LLJITBuilder.
(2) Create an RTDyldObjectLinkingLayer instance (particularly for use in (1)).
(3) Register JITEventListeners with an RTDyldObjectLinkingLayer.

Together (1), (2) and (3) can be used to force use of RTDyldObjectLinkingLayer
as the underlying JIT linker for LLJIT, rather than the platform default, and
to register event listeners with the RTDyldObjectLinkingLayer.
2020-10-19 01:59:04 -07:00
Lang Hames 00369849e1 [ORC] Add function to get pool entry string.
Patch by Andres Freund. Thanks Andres!
2020-10-19 01:59:04 -07:00
Lang Hames 24afffe63a [ORC] Add C API support for defining absolute symbols.
Also tweaks the definition of TryToGenerate to make it dovetail more neatly
with the new function.
2020-10-19 01:59:04 -07:00
Lang Hames b6ca0c7dd5 [ORC] Add support for custom generators to the C bindings.
C API clients can now define a custom definition generator by providing a
callback function (to implement DefinitionGenerator::tryToGenerate) and context
object. All arguments for the DefinitionGenerator::tryToGenerate method have
been given C API counterparts, and the API allows for optionally asynchronous
generation.
2020-10-19 01:59:04 -07:00
Lang Hames 19402ce79a [Support] Add a C-API function to create a StringError instance.
This will allow C API clients to return errors from callbacks. This
functionality will be used in upcoming Orc C-bindings functions.
2020-10-19 01:59:04 -07:00
Lang Hames 91d1f417fd [ORC] Add basic ResourceTracker support to the OrcV2 C Bindings.
Based on a patch by Andres Freund. Thanks Andres!
2020-10-19 01:59:04 -07:00
Lang Hames 49c065ae70 [ORC] Rename LLVMOrcJITDylibDefinitionGeneratorRef.
The DefinitionGenerator class has been moved out of JITDylib. This updates
the C API type and function names to reflect that.
2020-10-19 01:59:04 -07:00
Lang Hames 40f3fb52f7 [ORC] Fix C API function name.
Patch by Andres Freund. Thanks Andres!
2020-10-19 01:59:03 -07:00
Lang Hames 35e48d7b91 [ORC] Add C API functions to obtain and clear the symbol string pool.
Symbol string pool entries are ref counted, but not automatically cleared.
This can cause the size of the pool to grow without bound if it's not
periodically cleared. These functions allow that to be done via the C API.
2020-10-19 01:59:03 -07:00
Lang Hames 14cb9b4e21 [ORC] Add a C API function to set the ExecutionSession error reporter. 2020-10-19 01:59:03 -07:00
Lang Hames c88d9eae8a [ORC] Fix a memory leak in the OrcV2 C API (and some comment typos).
The LLVMOrcLLJITAddLLVMIRModule function was leaking its
LLVMOrcThreadSafeModuleRef argument. Wrapping the argument in a unique_ptr
fixes this.
2020-10-19 01:59:03 -07:00
Lang Hames 069919c9ba [ORC] Update Symbol Lookup / DefinitionGenerator system.
This patch moves definition generation out from the session lock, instead
running it under a per-dylib generator lock. It also makes the
DefinitionGenerator::tryToGenerate method optionally asynchronous: Generators
are handed an opaque LookupState object which can be captured to stop/restart
the lookup process.

The new scheme provides the following benefits and guarantees:

(1) Queries that do not need to attempt definition generation (because all
    requested symbols matched against existing definitions in the JITDylib)
    can proceed without being blocked by any running definition generators.

(2) Definition generators can capture the LookupState to continue their work
    asynchronously. This allows generators to run for an arbitrary amount of
    time without blocking a thread. Definition generators that do not need to
    run asynchronously can return without capturing the LookupState to eliminate
    unnecessary recursion and improve lookup performance.

(3) Definition generators still do not need to worry about concurrency or
    re-entrance: Since they are still run under a (per-dylib) lock, generators
    will never be re-entered concurrently, or given overlapping symbol sets to
    generate.

Finally, the new system distinguishes between symbols that are candidates for
generation (generation candidates) and symbols that failed to match for a query
(due to symbol visibility). This fixes a bug where an unresolved symbol could
trigger generation of a duplicate definition for an existing hidden symbol.
2020-10-19 01:59:03 -07:00
Lang Hames 5d2e359ce6 [ORC] Move DefinitionGenerator out of JITDylib.
This will make it easier to implement asynchronous definition generators.
2020-10-19 01:59:03 -07:00
Lang Hames 680845ec0d [ORC] Move MaterializationResponsibility methods to ExecutionSession.
MaterializationResponsibility, JITDylib, and ExecutionSession collectively
manage the OrcV2 core JIT state. Responsibility for maintaining and
updating this state has previously been spread among these classes, resulting
in implementations that are each non-trivial, but all tightly coupled. This has
in turn made reading the code and reasoning about state update and locking
rules difficult.

The core state model can be simplified by thinking of
MaterializationResponsibility and JITDylib as facets of ExecutionSession. This
commit is the first in a series intended to refactor Core.cpp to reflect this
model. Operations on MaterializationResponsibility and JITDylib will forward to
implementation methods inside ExecutionSession. Raw state will remain with the
original classes, but in most cases will only be modified by the
ExecutionSession.
2020-10-19 01:59:03 -07:00
Evgeny Leviant 8a7ca143f8 [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89553
2020-10-19 11:37:54 +03:00
Roman Lebedev d083d55c2c
[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr
All existing SCEV cast types operate on integers.
D89456 will add SCEVPtrToIntExpr cast expression type.
I believe this is best for consistency.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D89455
2020-10-19 10:59:53 +03:00
David Sherwood 3945b69e81 [SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. This patch changes a few
functions that were always expecting to work on scalar or fixed width
types:

1. DAGCombiner::mergeTruncStores - deals with scalar integers only.
2. DAGCombiner::ReduceLoadWidth - not valid for vectors.
3. DAGCombiner::createBuildVecShuffle - should only be used for
   fixed width vectors.
4. SelectionDAGLegalize::ExpandFCOPYSIGN and
   SelectionDAGLegalize::getSignAsIntValue - only work on scalars.

Differential Revision: https://reviews.llvm.org/D88562
2020-10-19 08:38:50 +01:00
David Sherwood 35a531fb45 [SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents
In certain places in llvm/lib/CodeGen we were relying upon the TypeSize
comparison operators when in fact the code was only ever expecting
either scalar values or fixed width vectors. I've changed some of these
places to use the equivalent scalar operator.

Differential Revision: https://reviews.llvm.org/D88482
2020-10-19 08:30:31 +01:00
David Sherwood f693f915a0 [SVE][CodeGen] Replace uses of TypeSize comparison operators
In certain places in the code we can never end up in a situation where
we're mixing fixed width and scalable vector types. For example,
we can't have truncations and extends that change the lane count. Also,
in other places such as GenWidenVectorStores and GenWidenVectorLoads we
know from the behaviour of FindMemType that we can never choose a vector
type with a different scalable property.

In various places I have used EVT::bitsXY functions instead of
TypeSize::isKnownXY, where it probably makes sense to keep an assert
that scalable properties match.

Differential Revision: https://reviews.llvm.org/D88654
2020-10-19 08:08:41 +01:00
David Sherwood d67d8f8790 [SVE][AArch64] Replace TypeSize comparisons with their integer equivalents
In many places in the AArch64 backend we are comparing TypeSize objects,
but in fact we are only ever expecting fixed width types. I've changed
all such comparisons to use their integer equivalents by replacing
calls to getSizeInBits() with getFixedSizeInBits(), etc.

Differential Revision: https://reviews.llvm.org/D89116
2020-10-19 07:41:33 +01:00
Max Kazantsev 199826baa8 [NFC][SCEV] Use getMinusOne where possible 2020-10-19 12:56:09 +07:00
Kai Luo 354d3106c6 [PowerPC] Skip combining (uint_to_fp x) if x is not simple type
Current powerpc64le backend hits
```
Combining: t7: f64 = uint_to_fp t6
llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291: llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed.
```
This patch fixes it by skipping combination if `t6` is not simple type.
Fixed https://bugs.llvm.org/show_bug.cgi?id=47660.

Reviewed By: #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D88388
2020-10-19 05:23:46 +00:00
Lang Hames ad92f16ccc [ORC][examples] Update Kaleidoscope and BuildingAJIT tutorial series to OrcV2.
This patch updates the Kaleidoscope and BuildingAJIT tutorial series (chapter
1-4) to OrcV2. Chapter 5 of the BuildingAJIT series is removed -- it will be
re-instated once we have in-tree support for out-of-process JITing.

This patch only updates the tutorial code, not the text. Patches welcome for
that, otherwise I will try to update it in a few weeks.
2020-10-18 21:03:04 -07:00
Lang Hames 0aec49c853 [ORC] Add support for resource tracking/removal (removable code).
This patch introduces new APIs to support resource tracking and removal in Orc.
It is intended as a thread-safe generalization of the removeModule concept from
OrcV1.

Clients can now create ResourceTracker objects (using
JITDylib::createResourceTracker) to track resources for each MaterializationUnit
(code, data, aliases, absolute symbols, etc.) added to the JIT. Every
MaterializationUnit will be associated with a ResourceTracker, and
ResourceTrackers can be re-used for multiple MaterializationUnits. Each JITDylib
has a default ResourceTracker that will be used for MaterializationUnits added
to that JITDylib if no ResourceTracker is explicitly specified.

Two operations can be performed on ResourceTrackers: transferTo and remove. The
transferTo operation transfers tracking of the resources to a different
ResourceTracker object, allowing ResourceTrackers to be merged to reduce
administrative overhead (the source tracker is invalidated in the process). The
remove operation removes all resources associated with a ResourceTracker,
including any symbols defined by MaterializationUnits associated with the
tracker, and also invalidates the tracker. These operations are thread safe, and
should work regardless of the the state of the MaterializationUnits. In the case
of resource transfer any existing resources associated with the source tracker
will be transferred to the destination tracker, and all future resources for
those units will be automatically associated with the destination tracker. In
the case of resource removal all already-allocated resources will be
deallocated, any if any program representations associated with the tracker have
not been compiled yet they will be destroyed. If any program representations are
currently being compiled then they will be prevented from completing: their
MaterializationResponsibility will return errors on any attempt to update the
JIT state.

Clients (usually Layer writers) wishing to track resources can implement the
ResourceManager API to receive notifications when ResourceTrackers are
transferred or removed. The MaterializationResponsibility::withResourceKeyDo
method can be used to create associations between the key for a ResourceTracker
and an allocated resource in a thread-safe way.

RTDyldObjectLinkingLayer and ObjectLinkingLayer are updated to use the
ResourceManager API to enable tracking and removal of memory allocated by the
JIT linker.

The new JITDylib::clear method can be used to trigger removal of every
ResourceTracker associated with the JITDylib (note that this will only
remove resources for the JITDylib, it does not run static destructors).

This patch includes unit tests showing basic usage. A follow-up patch will
update the Kaleidoscope and BuildingAJIT tutorial series to OrcV2 and will
use this API to release code associated with anonymous expressions.
2020-10-18 21:02:54 -07:00
Lang Hames 6154c4115c [ORC] Remove OrcV1 APIs.
This removes all legacy layers, legacy utilities, the old Orc C bindings,
OrcMCJITReplacement, and OrcMCJITReplacement regression tests.

ExecutionEngine and MCJIT are not affected by this change.
2020-10-18 21:02:44 -07:00
Hubert Tong 2980ce98be Fix various format specifier mismatches
Format specifiers of incorrect length are replaced with format specifier
macros from `<cinttypes>` matching the typedefs used to declare the type
of the value being printed.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D89637
2020-10-18 12:39:15 -04:00
Nikita Popov 6de8d7f1ad [BasicAA] Accept AATags by const reference (NFC)
Rather than swapping the value, the sizes, the AA tags and the
underlying objects multiple times, invoke the helper methods
with swapped arguments.
2020-10-18 18:19:01 +02:00
Florian Hahn f5cf7f544b [DSE] Do not consider 'noop' intrinsics as read-clobbers.
isNoopIntrinsic returns true for some intrinsics that are modeled in
MemorySSA but do not actually read or write any memory and do not block
DSE. Such intrinsics should not be considered as read-clobbers.
2020-10-18 15:51:05 +01:00
Nikita Popov f9172d3c7b [AA] Add helper to update result (NFC)
This pattern was repeated a few times, and for some reason always
using insert or try_emplace, even though we know in advance that
we're looking for an existing entry and not trying to create a
new one.
2020-10-18 16:43:26 +02:00
Fangrui Song e7813a930a Delete unneeded X86RegisterInfo::hasReservedSpillSlot. NFC
Only PowerPC and RISCV need to override it.
2020-10-17 23:18:55 -07:00
Craig Topper 5d69eecc53 [X86] Mark the Key Locker instructions as NotMemoryFoldable to make the X86FoldTablesEmitter not crash.
loadiwkey and aesenc128kl share the same opcode but one is memory
and one is register. But they're behavior is quite different. We
were crashing because one has an output register and one doesn't
and the backend couldn't account for that. But since they aren't
foldable we can just add NotMemoryFoldable so they won't be looked at.
2020-10-17 18:02:54 -07:00
Dávid Bolvanský 65e94cc946 [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-18 01:33:26 +02:00
Nikita Popov 9d2b8300b7 [BasicAA] Avoid alias query if result cannot be used (NFCI)
Rather then querying first and then checking additional conditions,
check the conditions first. They are much cheaper than the alias
query.
2020-10-18 00:00:15 +02:00
Nikita Popov 3c6fe0fc77 [BasicAA] Fix stale comment (NFC)
DataLayout is always around...
2020-10-17 23:58:58 +02:00
Dávid Bolvanský 2a75e956e5 Revert "[InferAttrs] Add argmemonly attribute to string libcalls"
This reverts commit b77dd32a6f. Sanitizer tests are broken.
2020-10-17 23:29:02 +02:00
Dávid Bolvanský b77dd32a6f [InferAttrs] Add argmemonly attribute to string libcalls
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89602
2020-10-17 22:42:36 +02:00
Roman Lebedev ec54867df5
[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)`
It's not pretty, but probably better than modelling it
as an opaque SCEVUnknown, i guess.

It is relevant e.g. for the loop that was brought up in
https://bugs.llvm.org/show_bug.cgi?id=46786#c26
as an example of what we'd be able to better analyze
once SCEV handles `ptrtoint` (D89456).

But as it is evident, even if we deal with `ptrtoint` there,
we also fail to model such an `ashr`.
Also, modeling of mul-of-exact-shr/div could use improvement.

As per alive2:
https://alive2.llvm.org/ce/z/tnfZKd
```
define i8 @src(i8 %0) {
  %2 = ashr exact i8 %0, 4
  ret i8 %2
}

declare i8 @llvm.abs(i8, i1)
declare i8 @llvm.smin(i8, i8)
declare i8 @llvm.smax(i8, i8)

define i8 @tgt(i8 %x) {
  %abs_x = call i8 @llvm.abs(i8 %x, i1 false)
  %div = udiv exact i8 %abs_x, 16
  %t0 = call i8 @llvm.smax(i8 %x, i8 -1)
  %t1 = call i8 @llvm.smin(i8 %t0, i8 1)
  %r = mul nsw i8 %div, %t1
  ret i8 %r
}
```
Transformation seems to be correct!
2020-10-17 21:22:24 +03:00
Roman Lebedev 130cc662b5
[NFC][SCEV] Refactor getAbsExpr() out of createSCEV() 2020-10-17 21:21:02 +03:00
Roman Lebedev be1678bdb9
[NFC][SCEV] Add 'getMinusOne()' method 2020-10-17 21:20:58 +03:00
Sanjay Patel 53e92b4c0e [InstCombine] (~A & B) ^ A -> A | B
Differential Revision: https://reviews.llvm.org/D86395
2020-10-17 12:20:18 -04:00
Nikita Popov 50cc9a0e61 [MemCpyOpt] Extract common function for unwinding check
These two cases should be using the same logic. Not NFC, as this
resolves the TODO regarding use of the underlying object.
2020-10-17 15:30:39 +02:00
Pedro Tammela 60b19424bb [NFC] fix some typos in LoopUnrollPass
This patch fixes a couple of typos in the LoopUnrollPass.cpp comments

Differential Revision: https://reviews.llvm.org/D89603
2020-10-17 14:20:55 +01:00
David Green b93d74ac9c [ARM] Basic getArithmeticReductionCost reduction costs
This adds some basic costs for MVE reductions - currently just costing
the simple legal add vectors as a single MVE instruction. More complex
costing can be added in the future when the framework more readily
allows it.

Differential Revision: https://reviews.llvm.org/D88980
2020-10-17 10:29:00 +01:00
David Green d79ee3a807 [ARM] Add a very basic active_lane_mask cost
This adds a very basic cost for active_lane_mask under MVE - making the
assumption that they will be free and then apologizing for that in a
comment.

In reality they may either be free (by being nicely folded into a tail
predicated loop), cost the same as a VCTP or be expanded into vdup's,
adds and cmp's. It is difficult to detect the difference from a single
getIntrinsicInstrCost call, so makes the assumption that the vectorizer
is adding them, and only added them where it makes sense.

We may need to change this in the future to better model predicate costs
in the vectorizer, especially at -Os or non-tail predicated loops. The
vectorizer currently does not query the cost of these instructions but
that will change in the future and a zero cost there probably makes the
most sense at the moment.

Differential Revision: https://reviews.llvm.org/D88989
2020-10-17 10:09:42 +01:00
Juneyoung Lee 62a0ec1612 Add support for !noundef metatdata on loads
This patch adds metadata !noundef and makes load instructions can optionally have it.
A load with !noundef always return a well-defined value (has no undef bit or isn't poison).
If the loaded value isn't well defined, the behavior is undefined.

This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values.
It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise.

The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead.
The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D89050
2020-10-17 13:50:10 +09:00
Alok Kumar Sharma 0538353b3b [DebugInfo] Support for DWARF operator DW_OP_over
LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed
for Flang to support assumed rank array.

  Summary:
Currently LLVM rejects DWARF operator DW_OP_over. Below error is
produced when llvm finds this operator.
[..]
invalid expression
!DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6)
warning: ignoring invalid debug info in over.ll
[..]
There were some parts missing in support of this operator, which are
now completed.

  Testing
-added a unit testcase
-check-debuginfo
-check-llvm

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D89208
2020-10-17 08:42:28 +05:30
Craig Topper 278bd06891 [TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI
As requested in D89346. This allows us to add some early outs.

I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89494
2020-10-16 19:47:56 -07:00
Alina Sbirlea dc97138123 [MemorySSA] Verify clobbering within reachable blocks.
Resolves PR45976.
2020-10-16 17:46:28 -07:00
Amara Emerson 4ad459997e [AArch64][GlobalISel] Select csinc if a select has a 1 on RHS.
Differential Revision: https://reviews.llvm.org/D89513
2020-10-16 16:49:52 -07:00
Albion Fung d30155feaa [PowerPC] Implementation of 128-bit Binary Vector Rotate builtins
This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10.

Differential Revision: https://reviews.llvm.org/D86819
2020-10-16 18:03:22 -04:00
Jameson Nash 4242df1470 Revert "make the AsmPrinterHandler array public"
I messed up one of the tests.
2020-10-16 17:22:07 -04:00
Jameson Nash ac2def2d8d make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.

Differential Revision: https://reviews.llvm.org/D74158
2020-10-16 16:27:31 -04:00
Artem Belevich c36c0fabd1 [VectorCombine] Avoid crossing address space boundaries.
We can not bitcast pointers across different address spaces, and VectorCombine
should be careful when it attempts to find the original source of the loaded
data.

Differential Revision: https://reviews.llvm.org/D89577
2020-10-16 13:19:31 -07:00
Stanislav Mekhanoshin 874524ab88 [AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs
Differential Revision: https://reviews.llvm.org/D89568
2020-10-16 12:37:22 -07:00
Nikita Popov 74c8c2d903 Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs""
This reverts commit 32b72c3165.

While better than before, this change still introduces a large
compile-time regression (>3% on mafft):
https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions

Additionally, the logic here doesn't look quite right to me,
I will comment in more detail on the differential revision.
2020-10-16 21:36:33 +02:00
Arthur Eubanks faf5210420 [CGSCC] Add -abort-on-max-devirt-iterations-reached option
Aborts if we hit the max devirtualization iteration.
Will be useful for testing that changes to devirtualization don't cause
devirtualization to repeat passes more times than necessary.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D89519
2020-10-16 12:34:52 -07:00
Austin Kerbow 978fbd8268 [AMDGPU] Run hazard recognizer pass later
If instructions were removed in peephole passes after the hazard recognizer was
run it is possible that new hazards could be introduced.

Fixes: SWDEV-253090

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D89077
2020-10-16 12:15:51 -07:00
Amara Emerson 39c05a1a71 [AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD.
We'll need legalizer lower() support for the other types to work.

Differential Revision: https://reviews.llvm.org/D89159
2020-10-16 11:41:57 -07:00
Benjamin Kramer b740899c50 [Indvars][NFCI] Simplify assertion.
This should be semantically identical. Also avoids unused variable
warnings in Release builds.
2020-10-16 19:58:55 +02:00
Amara Emerson 32f77eea2d [AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars.
Differential Revision: https://reviews.llvm.org/D89075
2020-10-16 10:42:15 -07:00
Amara Emerson 9190411fcf [AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions.
NEON is pretty limited in it's reduction support. As a first step add some
basic rules for the legal types we can select.

Differential Revision: https://reviews.llvm.org/D89070
2020-10-16 10:35:46 -07:00
Amara Emerson 6042c25b0a [GlobalISel] Add translation support for vector reduction intrinsics.
In order to prevent the ExpandReductions pass from expanding some intrinsics
before they get to codegen, I had to add a -disable-expand-reductions flag
for testing purposes.

Differential Revision: https://reviews.llvm.org/D89028
2020-10-16 10:17:53 -07:00
Jay Foad 1417abe54c [AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic
Differential Revision: https://reviews.llvm.org/D89558
2020-10-16 17:10:21 +01:00
Jay Foad 0c1381d795 [llc] Use -filetype=null to disable MIR printing
If you use -stop-after or similar options, llc will normally print MIR.
This patch checks for -filetype=null as a special case to disable MIR
printing. As the comment says, "The Null output is intended for use for
performance analysis ...", and I found this useful for timing a subset
of the passes that llc runs without the significant overhead of printing
MIR just to send it to /dev/null.

Differential Revision: https://reviews.llvm.org/D89476
2020-10-16 16:51:56 +01:00
Matt Arsenault 0a7cd99a70 Reapply "OpaquePtr: Add type to sret attribute"
This reverts commit eb9f7c28e5.

Previously this was incorrectly handling linking of the contained
type, so this merges the fixes from D88973.
2020-10-16 11:05:02 -04:00
Krzysztof Parzyszek 97533b10b2 [Hexagon] Fix license headers in some .td files, NFC 2020-10-16 10:03:05 -05:00
Simon Pilgrim 83ae625f0c [InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI. 2020-10-16 15:43:11 +01:00
Simon Pilgrim 253f24cf4c [InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold
This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.
2020-10-16 15:43:10 +01:00
Matt Arsenault ce16b6835b AMDGPU: Don't kill super-register with overlapping copy
This would end up killing part of the result super-register, resulting
in a verifier error on a later use of the overlapping registers.  We
could add kills of any non-aliasing registers, but we should be moving
away from relying on kill flags.
2020-10-16 09:34:35 -04:00