Commit Graph

138936 Commits

Author SHA1 Message Date
Arthur Eubanks c2590de30d [docs][NewPM] Add docs for writing NPM passes
As to not conflict with the legacy PM example passes under
llvm/lib/Transforms/Hello, this is under HelloNew. This makes the
CMakeLists.txt and general directory structure less confusing for people
following the example.

Much of the doc structure was taken from WritinAnLLVMPass.rst.

This adds a HelloWorld pass which simply prints out each function name.

More will follow after this, e.g. passes over different units of IR, analyses.
https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86979
2020-09-14 13:26:03 -07:00
Teresa Johnson 226d80ebe2 [MemProf] Rename HeapProfiler to MemProfiler for consistency
This is consistent with the clang option added in
7ed8124d46, and the comments on the
runtime patch in D87120.

Differential Revision: https://reviews.llvm.org/D87622
2020-09-14 13:14:57 -07:00
Craig Topper 4208ea3e19 [FastISel] Bail out of selectGetElementPtr for vector GEPs.
The code that decomposes the GEP into ADD/MUL doesn't work properly
for vector GEPs. It can create bad COPY instructions or possibly
assert.

For now just bail out to SelectionDAG.

Fixes PR45906
2020-09-14 12:53:06 -07:00
Kamau Bridgeman c0f199e566 [PowerPC] Implement Thread Local Storage Support for Local Exec
This patch is the initial support for the Local Exec Thread Local
Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.

Patch by: Kamau Bridgeman

Differential Revision: https://reviews.llvm.org/D83404
2020-09-14 14:16:28 -05:00
Nikita Popov 53f36f06af [Legalize][ARM][X86] Add float legalization for VECREDUCE
This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes
legalizations for VECREDUCE, to fill the remaining hole in the SDAG
legalization. These legalizations simply expand the reduction and
let it be recursively legalized. For the PromoteFloatRes case at
least it is possible to do better than that, but it's pretty tricky
(because we need to consider the interaction of three different
vector legalizations and the type promotion) and probably not
really worthwhile.

I haven't added ExpandFloatRes support, as I am not familiar with
ppc_fp128.

Differential Revision: https://reviews.llvm.org/D87569
2020-09-14 20:42:09 +02:00
Eric Astor 23a2b03221 [ms] [llvm-ml] Add basic support for SEH, including PROC FRAME
Add basic support for SEH, including PROC FRAME

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86948
2020-09-14 14:32:55 -04:00
Eric Astor 20201dc76a [ms] [llvm-ml] Add support for size queries in MASM
Add support for size inference, sizeof, typeof, and lengthof.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86947
2020-09-14 14:27:06 -04:00
Eric Astor 7c44ee8e19 [ms] [llvm-ml] Fix struct padding logic
MASM structs are end-padded to have size a multiple of the smaller of the requested alignment and the size of their largest field (taken recursively, if they have a field of STRUCT type).

This matches the behavior of ml.exe and ml64.exe. Our original implementation followed the MASM 6.0 documentation, which instead specified that MASM structs were padded to a multiple of their requested alignment.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D87248
2020-09-14 14:12:20 -04:00
Eric Astor da17e0d5c1 [ms] [llvm-ml] Add missing built-in type aliases
Add signed aliases for integral types, as well as the "DF" abbreviation for the FWORD type.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D87246
2020-09-14 14:09:24 -04:00
Nikita Popov cfff88c03c [InstCombine] Simplify select operand based on equality condition
For selects of the type X == Y ? A : B, check if we can simplify A
by using the X == Y equality and replace the operand if that's
possible. We already try to do this in InstSimplify, but will only
fold if the result of the simplification is the same as B, in which
case the select can be dropped entirely. Here the select will be
retained, just one operand simplified.

As we are performing an actual replacement here, we don't have
problems with refinement / poison values.

Differential Revision: https://reviews.llvm.org/D87480
2020-09-14 20:07:06 +02:00
Nikita Popov 8e69c3cde8 [DAGCombiner] Fold fmin/fmax with INF / FLT_MAX
Similar to D87415, this folds the various float min/max opcodes
with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand
if the ninf flag is set. Some of the folds are only possible under
nnan.

The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases
are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86,
the rest is here for the sake of completeness.

Differential Revision: https://reviews.llvm.org/D87571
2020-09-14 19:59:33 +02:00
Simon Pilgrim 4ff4708d39 collectBitParts - use const references. NFCI.
Fixes clang-tidy warnings first noticed on D87452.
2020-09-14 18:23:00 +01:00
Rahman Lavaee 7841e21c98 Let -basic-block-sections=labels emit basicblock metadata in a new .bb_addr_map section, instead of emitting special unary-encoded symbols.
This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section.
The format of the emitted data is represented as follows. It includes a header for every function:

|  Address of the function                      |  -> 8 bytes (pointer size)
|  Number of basic blocks in this function (>0) |  -> ULEB128

The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows:

|  Offset of the basic block relative to function begin |  -> ULEB128
|  Binary size of the basic block                       |  -> ULEB128
|  BB metadata                                          |  -> ULEB128  [ MBB.isReturn() OR MBB.hasTailCall() << 1  OR  MBB.isEHPad() << 2 ]

The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels.
The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers.
Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%.

For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html

Reviewed By: MaskRay, snehasish

Differential Revision: https://reviews.llvm.org/D85408
2020-09-14 10:16:44 -07:00
Sanjay Patel 55d371abd7 [InstSimplify] add folds for fmin/fmax with 'nnan'
maximum(nnan X, +INF) --> +INF
minimum(nnan X, -INF) --> -INF

This is based on the similar codegen transform proposed in:
D87571
2020-09-14 11:46:11 -04:00
Sanjay Patel 7526376164 [InstSimplify] allow folds for fmin/fmax with 'ninf'
maxnum(ninf X, +FLT_MAX) --> +FLT_MAX
minnum(ninf X, -FLT_MAX) --> -FLT_MAX

This is based on the similar codegen transform proposed in:
D87571
2020-09-14 11:18:08 -04:00
Florian Hahn c4f1b31441 [MemorySSA] Make sure PerformedPhiTrans is updated for each visited def.
1ce82015f6 added a fix to restrict phi optimizations after phi
translations. But the current use of performedPhiTranslation only
checked whether phi translation happened for the first iterator and
missed cases where phi translations happens at subsequent
iterators/upwards defs.

This patch changes upward_defs_iteartor to take a pointer to a bool, so
we can easily ensure the final value includes all visited defs, while
still being able to conveniently use it with make_range & co.
2020-09-14 16:11:56 +01:00
Sanjay Patel 22c583c3d0 [InstSimplify] reduce code duplication for fmin/fmax folds; NFC
We use the same code structure for folding integer min/max.
2020-09-14 10:32:11 -04:00
jasonliu 9868ea764f [XCOFF][AIX] Handle TOC entries that could not be reached by positive range in small code model
Summary:
In small code model, AIX assembler could not deal with labels that
could not be reached within the [-0x8000, 0x8000) range from TOC base.
So when generating the assembly, we would need to help the assembler
by subtracting an offset from the label to keep the actual value
within [-0x8000, 0x8000).

Reviewed By: hubert.reinterpretcast, Xiangling_L

Differential Revision: https://reviews.llvm.org/D86879
2020-09-14 13:41:34 +00:00
Sanjay Patel 7bb9a2f996 [InstSimplify] fix miscompiles with maximum/minimum intrinsics
As discussed in the sibling codegen functionality patch D87571,
this transform was created with D52766, but it is not correct.

The incorrect test diffs were missed during review, but the
'TODO' comment about this functionality was still in the code -
we need 'nnan' to enable this fold.
2020-09-14 09:06:41 -04:00
Jay Foad c799f873cb [AMDGPU] Don't cluster stores
Clustering loads has caching benefits, but as far as I know there is no
advantage to clustering stores on any AMDGPU subtargets.

The disadvantage is that it tends to increase register pressure and
restricts scheduling freedom.

Differential Revision: https://reviews.llvm.org/D85530
2020-09-14 13:40:17 +01:00
Simon Pilgrim 98eaacd73d Assert we've found both vector types. NFCI.
Fixes clang static analyzer warning about potential null dereferences.
2020-09-14 13:24:17 +01:00
Simon Pilgrim 7109fc9e42 Don't dereference from a dyn_cast<>. NFCI.
Use cast<> instead which will assert if it fails and not just return null.

Fixes clang static analyzer warning.
2020-09-14 13:05:17 +01:00
Max Kazantsev 412b417bfa [NFC] Add missing `const` statements in SCEV 2020-09-14 18:43:24 +07:00
David Green 06fb4e9064 [CGP] Limit converting phi types to simple loads and stores
Instcombine limits converting phi types to simple loads and stores. This
does the same in codegenprepare, not processing phis that are not
simple.

Note that volatile loads/store ISel will happily convert between float
and int. Atomics are more likely to always be integer. This just keeps
things simple and doesn't process either.

Differential Revision: https://reviews.llvm.org/D83770
2020-09-14 12:08:34 +01:00
Florian Hahn f715d81c9d [DSE] Only eliminate candidates that always store the same loc.
AliasAnalysis/MemoryLocation does not account for loops. Two
MemoryLocation can be must-overwrite, even if the first one writes
multiple locations in a loop.

This patch prevents removing such stores, by only considering candidates
that are known to be loop invariant, or executed in the same BB.

Currently the invariant check is quite conservative and only considers
Alloca and Alloca-like instructions and arguments as invariant base pointers.
It also considers GEPs with all constant indices and invariant bases as
invariant.

This can be improved in the future, but the current implementation has
only minor impact on the total number of stores eliminated (25903 vs
26047 for the baseline). There are some 2-10% swings for some individual
benchmarks. In roughly half of the cases, the number of stores removed
increases actually, because we skip candidates that are unlikely to be
valid candidates early.
2020-09-14 12:06:58 +01:00
Meera Nakrani dd519bf0b0 [ARM] Selects SSAT/USAT from correct LLVM IR
LLVM will canonicalize conditional selectors to a different pattern than the old code that was used.
This is updating the function to match the new expected patterns and select SSAT or USAT when successful.
Tests have also been updated to use the new patterns.

Differential Review: https://reviews.llvm.org/D87379
2020-09-14 10:58:21 +00:00
Sjoerd Meijer 676febc044 [ARM][MVE] Tail-predication: check get.active.lane.mask's TC value
This adds additional checks for the original scalar loop tripcount value, i.e.
get.active.lane.mask second argument, and perform several sanity checks to see
if it is of the form that we expect similarly like we already do for the IV
which is the first argument of get.active.lane.

Differential Revision: https://reviews.llvm.org/D86074
2020-09-14 11:32:15 +01:00
David Sherwood 816663adb5 [SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors
The function LoopIdiomRecognize::isLegalStore looks for stores in loops
that could be transformed into memset or memcpy. However, the algorithm
currently requires that we know how big the store is at runtime, i.e.
that the store size will not overflow an unsigned integer. For scalable
vectors we cannot guarantee this so I have changed the code to bail out
for now. In addition, even if we add a way to query the maximum value of
vscale in future we will still need to update the algorithm to cope with
non-constant strides. The additional cost associated with calculating
the memset and memcpy arguments will need to be taken into account as
well.

This patch also fixes up an implicit TypeSize -> uint64_t cast,
thereby removing a warning. I've added tests here showing a fixed
width vector loop being transformed into memcpy, and a scalable
vector loop remaining unchanged:

  Transforms/LoopIdiom/memcpy-vectors.ll

Differential Revision: https://reviews.llvm.org/D87439
2020-09-14 11:28:31 +01:00
Petar Avramovic 6e2a86ed5a AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN
Check for NoNaNsFPMath function attribute in isKnownNeverSNaN.
Function attributes are in held in 'TargetMachine.Options'.
Among other things, this allows selection of some patterns imported
in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN
returns true in lowerFMinNumMaxNum.

However we notice some incorrect results since function attributes are
not correctly written in TargetMachine.Options when next function is
processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0,
it has "no-nans-fp-math"="false" but TargetMachine.Options still has it
set to true since first function in test file had this attribute set to
true. This will be fixed in D87511.

Differential Revision: https://reviews.llvm.org/D87456
2020-09-14 12:11:00 +02:00
Simon Pilgrim 00e5676cf6 [LegalizeDAG] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. 2020-09-14 11:09:43 +01:00
Jeremy Morse d3af441dfe [DebugInstrRef][1/9] Add fields for instr-ref variable locations
Add a DBG_INSTR_REF instruction and a "debug instruction number" field to
MachineInstr. The two allow variable values to be specified by
identifying where the value is computed, rather than the register it lies
in, like so:

  %0 = fooinst, debug-instr-number 1
  [...]
  DBG_INSTR_REF 1, 0

See the original RFC for motivation:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html

This patch is NFCI; it only adds fields and other boiler plate.

Differential Revision: https://reviews.llvm.org/D85741
2020-09-14 10:06:52 +01:00
Petar Avramovic 09b8871f8d AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands
Predicates with 'let PredicateCodeUsesOperands = 1' want to examine
matched operands. When we encounter predicate code that uses operands,
analyze its named operand arguments and create a map between argument
index and name. Later, when leaf node with name is encountered, emit
GIM_RecordNamedOperand that will store that operand at its argument
index in operand list. This operand list will be an argument to c++
code of the predicate.

Differential Revision: https://reviews.llvm.org/D87285
2020-09-14 10:39:56 +02:00
David Stenberg bfcb824ba5 [JumpThreading] Fix an incorrect Modified status
This fixes PR47297.

When ProcessBlock() was able to constant fold the terminator's
condition, but not do any more transformations, the function would
return false, which would lead to the JumpThreading pass returning an
incorrect modified status. This patch makes so that ProcessBlock()
returns true in such cases. This will trigger an unnecessary invocation
of ProcessBlock() in such cases, but this should be rare to occur.

This was caught using the check introduced by D80916.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87392
2020-09-14 10:36:13 +02:00
Jay Foad 9a4476072e [UnifyLoopExits] Fix non-deterministic iteration order
This was causing random minor codegen differences in shaders compiled
with the AMDGPU backend.

Differential Revision: https://reviews.llvm.org/D87548
2020-09-14 09:09:58 +01:00
Simon Wallis 4946802c5f [ARM] Fix so immediates and pc relative checks
Treating an SoImm offset as a multiple of 4 between -1020 and 1020
mis-handles the second of a pair of 16-bit constants where the offset is a multiple of 2 but not a multiple of 4,
leading to an LLVM ERROR: out of range pc-relative fixup value

For 32-bit and larger (64-bit) constants, continue to treat an SoImm offset as a multiple of 4 between -1020 and 1020.
For smaller (16-bit) constants, treat an SoImm offset as a multiple of 1 between -255 and 255.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86949
2020-09-14 08:52:59 +01:00
David Sherwood 15bff4dec4 [CodeGen] Fix bug in IncrementPointer
In an earlier patch I meant to add the correct flags to the ADD
node when incrementing the pointer, but forgot to pass them to
SelectionDAG::getNode.

Differential Revision: https://reviews.llvm.org/D87496
2020-09-14 08:03:55 +01:00
Fangrui Song 4d7b194543 [llvm-cov gcov] Refactor counting and reporting
The current organization of FileInfo and its referenced utility functions of
(GCOVFile, GCOVFunction, GCOVBlock) is messy. Some members of FileInfo are just
copied from GCOVFile. FileInfo::print (.gcov output and --intermediate output)
is interleaved with branch statistics and computation of line execution counts.
--intermediate has to do redundant .gcov output to gather branch statistics.

This patch deletes lots of code and introduces a clearer work flow:

```
fn collectFunction
  for each block b
    for each line lineNum
      let line be LineInfo of the file on lineNum
      line.exists = 1
      increment function's lines & linesExec if necessary
      increment line.count
      line.blocks.push_back(&b)

fn collectSourceLine
  compute cycle counts
  count = incoming_counts + cycle_counts
  if line.exists
    ++summary->lines
    if line.count
      ++summary->linesExec

fn collectSource
  for each line
    call collectSourceLine

fn main
  for each function
    call collectFunction
    print function summary
  for each source file
    call collectSource
    print file summary
    annotate the source file with line execution counts
  if -i
    print intermediate file
```

The output order of functions and files now follows the original order in
.gcno files.
2020-09-13 23:00:59 -07:00
Yevgeny Rouban 88690a9658 [CodeGenPrepare] Fix zapping dead operands of assume
This patch fixes a problem of the commit 52cc97a0.
A test case is created to demonstrate the crash caused by
the instruction iterator invalidated by the recursive
removal of dead operands of assume. The solution restarts
from the blocks's first instruction in case CurInstIterator
is invalidated by RecursivelyDeleteTriviallyDeadInstructions().

Reviewed By: bkramer

Differential Revision: https://reviews.llvm.org/D87434
2020-09-14 11:46:34 +07:00
Craig Topper 56b33391d3 [SelectionDAG] Move ISD:PARITY formation from DAGCombine to SimplifyDemandedBits.
Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1)
but the AND might be separated from the ctpop. For example if the
parity result is multiplied by 2, we'll pull the AND through the
shift.

So to handle more cases, move to SimplifyDemandedBits where we
can handle more cases that result in only the LSB of the CTPOP
being used.
2020-09-13 21:04:13 -07:00
Lang Hames 783ba64a89 [JITLink] Improve formatting for Edge, Block and Symbol debugging output. 2020-09-13 15:44:07 -07:00
Fangrui Song b2c32c90ba [llvm-cov gcov] Add -r (--relative-only) && -s (--source-prefix)
gcov 4.7 introduced the two options.
https://sourceware.org/pipermail/gcc-patches/2011-November/328782.html

-r only dumps files with relative paths or absolute paths with the prefix
specified by -s. The two options are useful filtering out system header files.
2020-09-13 14:54:20 -07:00
David Blaikie ce89eeee16 PPCInstrInfo: Fix readability-inconsistent-declaration-parameter-name clang-tidy warning
Reduces the chance of confusion when calling the function with
autocomplete (will show the more accurate/informative variable name),
etc.
2020-09-13 13:08:17 -07:00
David Blaikie 6e06f1cd08 GCOVProfiling: Avoid use-after-move
Turns out this was use-after-move of function_ref, which is trivially
copyable and movable, so the move did nothing and use after move was
safe.

But since this function_ref is being copied into a std::function, change
the function_ref to be std::function to avoid extra layers of type
erasure indirection - and then it's a real use after move, and fix that
by referring to the moved-to member variable rather than the moved-from
parameter.
2020-09-13 12:54:36 -07:00
Qiu Chaofan a4c5351986 [DAGCombiner] Propagate FMF flags in FMA folding
DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't
propagated into new fadd. This patch fixes that.

Some code in visitFMA is redundant and such support for vector constants
is missing. Need follow-up patch to clean.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D87037
2020-09-14 00:19:06 +08:00
David Green 9237fde481 [CGP] Prevent optimizePhiType from iterating forever
The recently added optimizePhiType algorithm had no checks to make sure
it didn't continually iterate backward and forth between float and int
types. This means that given an input like store(phi(bitcast(load))), we
could convert that back and forth to store(bitcast(phi(load))). This
particular case would usually have been simplified to a different load
type (folding the bitcast into the load) before CGP, but other cases can
occur. The one that came up was phi(bitcast(phi)), where the two phi's
of different types were bitcast between. That was not helped by a dead
bitcast being kept around which could make conversion look profitable.

This adds an extra check of the bitcast Uses or Defs, to make sure that
at least one is grounded and will not end up being converted back. It
also makes sure that dead bitcasts are removed, and there is a minor
change to include newly created Phi nodes in the Visited set so that
they do not need to be revisited.

Differential Revision: https://reviews.llvm.org/D82676
2020-09-13 16:11:01 +01:00
Qiu Chaofan bec81dc67d Reland "[PowerPC] Implement instruction clustering for stores"
Commit 3c0b3250 introduced store fusion for PowerPC target, but it
brought failure under UB sanitizer and was reverted. This patch fixes
them.
2020-09-13 19:51:01 +08:00
Fangrui Song 5f4e9bf641 [gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass
This is weird.
2020-09-13 00:44:32 -07:00
Fangrui Song 63182c2ac0 [gcov] Add spanning tree optimization
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).

The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
2020-09-13 00:07:31 -07:00
Fangrui Song f086e85eea [gcov] Assign names to some types and loaded values used in @__llvm_internal*
This makes the generated IR much more readable.
2020-09-12 22:42:37 -07:00
Fangrui Song 8cf1ac97ce [llvm-cov gcov] Improve accuracy when some edges are not measured
Also guard against infinite recursion if GCOV_ARC_ON_TREE edges contain a cycle.
2020-09-12 22:33:41 -07:00