Commit Graph

127407 Commits

Author SHA1 Message Date
Hans Wennborg 1e1e3ba252 Unify the two CRC implementations
David added the JamCRC implementation in r246590. More recently, Eugene
added a CRC-32 implementation in r357901, which falls back to zlib's
crc32 function if present.

These checksums are essentially the same, so having multiple
implementations seems unnecessary. This replaces the CRC-32
implementation with the simpler one from JamCRC, and implements the
JamCRC interface in terms of CRC-32 since this means it can use zlib's
implementation when available, saving a few bytes and potentially making
it faster.

JamCRC took an ArrayRef<char> argument, and CRC-32 took a StringRef.
This patch changes it to ArrayRef<uint8_t> which I think is the best
choice, and simplifies a few of the callers nicely.

Differential revision: https://reviews.llvm.org/D68570

llvm-svn: 374148
2019-10-09 09:06:30 +00:00
David Blaikie 5841e9af1d DebugInfo: Move LLE enum handling to .def to match RLE handling
llvm-svn: 374122
2019-10-08 21:48:46 +00:00
Roman Lebedev 354ba6985c [CVP} Replace SExt with ZExt if the input is known-non-negative
Summary:
zero-extension is far more friendly for further analysis.
While this doesn't directly help with the shift-by-signext problem, this is not unrelated.

This has the following effect on test-suite (numbers collected after the finish of middle-end module pass manager):
| Statistic                            |     old |     new | delta | percent change |
| correlated-value-propagation.NumSExt |       0 |    6026 |  6026 |   +100.00%     |
| instcount.NumAddInst                 |  272860 |  271283 | -1577 |     -0.58%     |
| instcount.NumAllocaInst              |   27227 |   27226 | -1    |      0.00%     |
| instcount.NumAndInst                 |   63502 |   63320 | -182  |     -0.29%     |
| instcount.NumAShrInst                |   13498 |   13407 | -91   |     -0.67%     |
| instcount.NumAtomicCmpXchgInst       |    1159 |    1159 |  0    |      0.00%     |
| instcount.NumAtomicRMWInst           |    5036 |    5036 |  0    |      0.00%     |
| instcount.NumBitCastInst             |  672482 |  672353 | -129  |     -0.02%     |
| instcount.NumBrInst                  |  702768 |  702195 | -573  |     -0.08%     |
| instcount.NumCallInst                |  518285 |  518205 | -80   |     -0.02%     |
| instcount.NumExtractElementInst      |   18481 |   18482 |  1    |      0.01%     |
| instcount.NumExtractValueInst        |   18290 |   18288 | -2    |     -0.01%     |
| instcount.NumFAddInst                |  139035 |  138963 | -72   |     -0.05%     |
| instcount.NumFCmpInst                |   10358 |   10348 | -10   |     -0.10%     |
| instcount.NumFDivInst                |   30310 |   30302 | -8    |     -0.03%     |
| instcount.NumFenceInst               |     387 |     387 |  0    |      0.00%     |
| instcount.NumFMulInst                |   93873 |   93806 | -67   |     -0.07%     |
| instcount.NumFPExtInst               |    7148 |    7144 | -4    |     -0.06%     |
| instcount.NumFPToSIInst              |    2823 |    2838 |  15   |      0.53%     |
| instcount.NumFPToUIInst              |    1251 |    1251 |  0    |      0.00%     |
| instcount.NumFPTruncInst             |    2195 |    2191 | -4    |     -0.18%     |
| instcount.NumFSubInst                |   92109 |   92103 | -6    |     -0.01%     |
| instcount.NumGetElementPtrInst       | 1221423 | 1219157 | -2266 |     -0.19%     |
| instcount.NumICmpInst                |  479140 |  478929 | -211  |     -0.04%     |
| instcount.NumIndirectBrInst          |       2 |       2 |  0    |      0.00%     |
| instcount.NumInsertElementInst       |   66089 |   66094 |  5    |      0.01%     |
| instcount.NumInsertValueInst         |    2032 |    2030 | -2    |     -0.10%     |
| instcount.NumIntToPtrInst            |   19641 |   19641 |  0    |      0.00%     |
| instcount.NumInvokeInst              |   21789 |   21788 | -1    |      0.00%     |
| instcount.NumLandingPadInst          |   12051 |   12051 |  0    |      0.00%     |
| instcount.NumLoadInst                |  880079 |  878673 | -1406 |     -0.16%     |
| instcount.NumLShrInst                |   25919 |   25921 |  2    |      0.01%     |
| instcount.NumMulInst                 |   42416 |   42417 |  1    |      0.00%     |
| instcount.NumOrInst                  |  100826 |  100576 | -250  |     -0.25%     |
| instcount.NumPHIInst                 |  315118 |  314092 | -1026 |     -0.33%     |
| instcount.NumPtrToIntInst            |   15933 |   15939 |  6    |      0.04%     |
| instcount.NumResumeInst              |    2156 |    2156 |  0    |      0.00%     |
| instcount.NumRetInst                 |   84485 |   84484 | -1    |      0.00%     |
| instcount.NumSDivInst                |    8599 |    8597 | -2    |     -0.02%     |
| instcount.NumSelectInst              |   45577 |   45913 |  336  |      0.74%     |
| instcount.NumSExtInst                |   84026 |   78278 | -5748 |     -6.84%     |
| instcount.NumShlInst                 |   39796 |   39726 | -70   |     -0.18%     |
| instcount.NumShuffleVectorInst       |  100272 |  100292 |  20   |      0.02%     |
| instcount.NumSIToFPInst              |   29131 |   29113 | -18   |     -0.06%     |
| instcount.NumSRemInst                |    1543 |    1543 |  0    |      0.00%     |
| instcount.NumStoreInst               |  805394 |  804351 | -1043 |     -0.13%     |
| instcount.NumSubInst                 |   61337 |   61414 |  77   |      0.13%     |
| instcount.NumSwitchInst              |    8527 |    8524 | -3    |     -0.04%     |
| instcount.NumTruncInst               |   60523 |   60484 | -39   |     -0.06%     |
| instcount.NumUDivInst                |    2381 |    2381 |  0    |      0.00%     |
| instcount.NumUIToFPInst              |    5549 |    5549 |  0    |      0.00%     |
| instcount.NumUnreachableInst         |    9855 |    9855 |  0    |      0.00%     |
| instcount.NumURemInst                |    1305 |    1305 |  0    |      0.00%     |
| instcount.NumXorInst                 |   10230 |   10081 | -149  |     -1.46%     |
| instcount.NumZExtInst                |   60353 |   66840 |  6487 |     10.75%     |
| instcount.TotalBlocks                |  829582 |  829004 | -578  |     -0.07%     |
| instcount.TotalFuncs                 |   83818 |   83817 | -1    |      0.00%     |
| instcount.TotalInsts                 | 7316574 | 7308483 | -8091 |     -0.11%     |

TLDR: we produce -0.11% less instructions, -6.84% less `sext`, +10.75% more `zext`.
To be noted, clearly, not all new `zext`'s are produced by this fold.

(And now i guess it might have been interesting to measure this for D68103 :S)

Reviewers: nikic, spatel, reames, dberlin

Reviewed By: nikic

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68654

llvm-svn: 374112
2019-10-08 20:29:48 +00:00
Daniel Sanders 4b7cabf1e1 [tblgen] Add getOperatorAsDef() to Record
Summary:
While working with DagInit's, it's often the case that you expect the
operator to be a reference to a def. This patch adds a wrapper for this
common case to reduce the amount of boilerplate callers need to duplicate
repeatedly.

getOperatorAsDef() returns the record if the DagInit has an operator that is
a DefInit. Otherwise, it prints a fatal error.

There's only a few pre-existing examples in LLVM at the moment and I've
left a few instances of the code this simplifies as they had more specific
error messages than the generic one this produces. I'm going to be using
this a fair bit in my subsequent patches.

Reviewers: bogner, volkan, nhaehnle

Reviewed By: nhaehnle

Subscribers: nhaehnle, hiraditya, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68424

llvm-svn: 374101
2019-10-08 18:41:32 +00:00
Yonghong Song 05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering verifier is able to do limited constant
propogation following branches. The following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For native load case, the load size is calculated to be the
same as the size of load width LLVM otherwise used to load
the value which is then used to extract the bitfield value.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Matt Arsenault 190a17bbd1 AMDGPU: Fix i16 arithmetic pattern redundancy
There were 2 problems here. First, these patterns were duplicated to
handle the inverted shift operands instead of using the commuted
PatFrags.

Second, the point of the zext folding patterns don't apply to the
non-0ing high subtargets. They should be skipped instead of inserting
the extension. The zeroing high code would be emitted when necessary
anyway. This was also emitting unnecessary zexts in cases where the
high bits were undefined.

llvm-svn: 374092
2019-10-08 17:36:38 +00:00
Jinsong Ji 9912232b46 Revert "[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize"
Also Revert "[LoopVectorize] Fix non-debug builds after rL374017"

This reverts commit 9f41deccc0.
This reverts commit 18b6fe07bc.

The patch is breaking PowerPC internal build, checked with author, reverting
on behalf of him for now due to timezone.

llvm-svn: 374091
2019-10-08 17:32:56 +00:00
Vedant Kumar 9852699dcb [CodeExtractor] Factor out and reuse shrinkwrap analysis
Factor out CodeExtractor's analysis of allocas (for shrinkwrapping
purposes), and allow the analysis to be reused.

This resolves a quadratic compile-time bug observed when compiling
AMDGPUDisassembler.cpp.o.

Pre-patch (Release + LTO clang):

```
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  176.5278 ( 57.8%)   0.4915 ( 18.5%)  177.0192 ( 57.4%)  177.4112 ( 57.3%)  Hot Cold Splitting
```

Post-patch (ReleaseAsserts clang):

```
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  1.4051 (  3.3%)   0.0079 (  0.3%)   1.4129 (  3.2%)   1.4129 (  3.2%)  Hot Cold Splitting
```

Testing: check-llvm, and comparing the AMDGPUDisassembler.cpp.o binary
pre- vs. post-patch.

An alternate approach is to hide CodeExtractorAnalysisCache from clients
of CodeExtractor, and to recompute the analysis from scratch inside of
CodeExtractor::extractCodeRegion(). This eliminates some redundant work
in the shrinkwrapping legality check. However, some clients continue to
exhibit O(n^2) compile time behavior as computing the analysis is O(n).

rdar://55912966

Differential Revision: https://reviews.llvm.org/D68616

llvm-svn: 374089
2019-10-08 17:17:51 +00:00
Tom Stellard 3a8d80944b AMDGPU: Add offsets to MMO when lowering buffer intrinsics
Summary:
Without offsets on the MachineMemOperands (MMOs),
MachineInstr::mayAlias() will return true for all reads and writes to the
same resource descriptor.  This leads to O(N^2) complexity in the MachineScheduler
when analyzing dependencies of buffer loads and stores.  It also limits
the SILoadStoreOptimizer from merging more instructions.

This patch reduces the compile time of one pathological compute shader
from 12 seconds to 1 second.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65097

llvm-svn: 374087
2019-10-08 17:04:51 +00:00
Simon Pilgrim e746380f6a CodeGenPrepare - silence static analyzer dyn_cast<> null dereference warnings. NFCI.
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use cast<> directly and if not assert will fire for us.

llvm-svn: 374085
2019-10-08 17:00:01 +00:00
Stanislav Mekhanoshin 8f002193bf [AMDGPU] Disable unused gfx10 dpp instructions
Inhibit generation of unused real dpp instructions on gfx10 just
like it is done on other subtargets. This does not change anything
because these are illegal anyway and not accepted, but it does
reduce the number of instruction definitions generated.

Differential Revision: https://reviews.llvm.org/D68607

llvm-svn: 374083
2019-10-08 16:56:01 +00:00
Heejin Ahn 6a37c5d6fc [WebAssembly] Fix a bug in 'try' placement
Summary:
When searching for local expression tree created by stackified
registers, for 'block' placement, we start the search from the previous
instruction of a BB's terminator. But in 'try''s case, we should start
from the previous instruction of a call that can throw, or a EH_LABEL
that precedes the call, because the return values of the call's previous
instructions can be stackified and consumed by the throwing call.

For example,
```
  i32.call @foo
  call @bar         ; may throw
  br $label0
```
In this case, if we start the search from the previous instruction of
the terminator (`br` here), we end up stopping at `call @bar` and place
a 'try' between `i32.call @foo` and `call @bar`, because `call @bar`
does not have a return value so it is not a local expression tree of
`br`.

But in this case, unlike when placing 'block's, we should start the
search from `call @bar`, because the return value of `i32.call @foo` is
stackified and used by `call @bar`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68619

llvm-svn: 374073
2019-10-08 16:15:39 +00:00
Nikola Prica 98603a8153 [DebugInfo][If-Converter] Update call site info during the optimization
During the If-Converter optimization pay attention when copying or
deleting call instructions in order to keep call site information in
valid state.

Reviewers: aprantl, vsk, efriedma

Reviewed By: vsk, efriedma

Differential Revision: https://reviews.llvm.org/D66955

llvm-svn: 374068
2019-10-08 15:43:12 +00:00
Hideto Ueno 96e6ce4cd3 [Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer
Summary:
In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as `dereferenceable` or `nonnull` in these instructions, we can mark `dereferenceable` or `nonnull` in the argument definition:

1. Memory instruction (similar to D64258)
Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.
```
define i64* @f(i64* %a) {
entry:
  %add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
  store i64 1, i64* %add.ptr, align 8
  ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
```

2. Propagation from callsite (similar to D27855)
If `deref` or `nonnull` are known in call site parameter attributes we can also say that argument also that attribute.

```
declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);

define void @parent1(i8* %a, i8* %b, i8* %c) {
  call void @use3nonnull(i8* %b, i8* %c, i8* %a)
; Above instruction is always executed so we can say that@parent1(i8* nonnnull %a, i8* nonnull %b, i8* nonnull %c)
  call void @use3(i8* %c, i8* %a, i8* %b)
  ret void
}
```

Reviewers: jdoerfert, sstefan1, spatel, reames

Reviewed By: jdoerfert

Subscribers: xbolva00, hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65402

llvm-svn: 374063
2019-10-08 15:25:56 +00:00
Cyndy Ishida fb92ef1e55 Revert [TextAPI] Introduce TBDv4
This reverts r374058 (git commit 5d566c5a46)

llvm-svn: 374062
2019-10-08 15:24:37 +00:00
Hideto Ueno 08daf8cf0a [Attributor] Add helper class to compose two structured deduction.
Summary: This patch introduces a generic way to compose two structured deductions.  This will be used for composing generic deduction with `MustBeExecutedExplorer` and other existing generic deduction.

Reviewers: jdoerfert, sstefan1

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66645

llvm-svn: 374060
2019-10-08 15:20:19 +00:00
Cyndy Ishida 5d566c5a46 [TextAPI] Introduce TBDv4
Summary:
This format introduces new features and platforms
The motivation for this format is to support more than 1 platform since previous versions only supported additional architectures and 1 platform,
for example ios + ios-simulator and macCatalyst.

Reviewers: ributzka, steven_wu

Reviewed By: ributzka

Subscribers: mgorny, hiraditya, mgrang, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67529

llvm-svn: 374058
2019-10-08 15:07:36 +00:00
Mirko Brkusanin 45e0f24373 [Mips] Emit proper ABI for _mcount calls
When -pg option is present than a call to _mcount is inserted into every
function. However since the proper ABI was not followed then the generated
gmon.out did not give proper results. By inserting needed instructions 
before every _mcount we can fix this.

Differential Revision: https://reviews.llvm.org/D68390

llvm-svn: 374055
2019-10-08 14:32:03 +00:00
Pavel Labath 6e0b1ce48e Object/minidump: Add support for the MemoryInfoList stream
Summary:
This patch adds the definitions of the constants and structures
necessary to interpret the MemoryInfoList minidump stream, as well as
the object::MinidumpFile interface to access the stream.

While the code is fairly simple, there is one important deviation from
the other minidump streams, which is worth calling out explicitly.
Unlike other "List" streams, the size of the records inside
MemoryInfoList stream is not known statically. Instead it is described
in the stream header. This makes it impossible to return
ArrayRef<MemoryInfo> from the accessor method, as it is done with other
streams. Instead, I create an iterator class, which can be parameterized
by the runtime size of the structure, and return
iterator_range<iterator> instead.

Reviewers: amccarth, jhenderson, clayborg

Subscribers: JosephTremoulet, zturner, markmentovai, lldb-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68210

llvm-svn: 374051
2019-10-08 14:15:32 +00:00
Sebastian Pop d0d52edae9 fix fmls fp16
Tim Northover remarked that the added patterns for fmls fp16
produce wrong code in case the fsub instruction has a
multiplication as its first operand, i.e., all the patterns FMLSv*_OP1:

> define <8 x half> @test_FMLSv8f16_OP1(<8 x half> %a, <8 x half> %b, <8 x half> %c) {
> ; CHECK-LABEL: test_FMLSv8f16_OP1:
> ; CHECK: fmls {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, {{v[0-9]+}}.8h
> entry:
>
>   %mul = fmul fast <8 x half> %c, %b
>   %sub = fsub fast <8 x half> %mul, %a
>   ret <8 x half> %sub
> }
>
> This doesn't look right to me. The exact instruction produced is "fmls
> v0.8h, v2.8h, v1.8h", which I think calculates "v0 - v2*v1", but the
> IR is calculating "v2*v1-v0". The equivalent <4 x float> code also
> doesn't emit an fmls.

This patch generates an fmla and negates the value of the operand2 of the fsub.

Inspecting the pattern match, I found that there was another mistake in the
opcode to be selected: matching FMULv4*16 should generate FMLSv4*16
and not FMLSv2*32.

Tested on aarch64-linux with make check-all.

Differential Revision: https://reviews.llvm.org/D67990

llvm-svn: 374044
2019-10-08 13:23:57 +00:00
Graham Hunter b302561b76 [SVE][IR] Scalable Vector size queries and IR instruction support
* Adds a TypeSize struct to represent the known minimum size of a type
  along with a flag to indicate that the runtime size is a integer multiple
  of that size
* Converts existing size query functions from Type.h and DataLayout.h to
  return a TypeSize result
* Adds convenience methods (including a transparent conversion operator
  to uint64_t) so that most existing code 'just works' as if the return
  values were still scalars.
* Uses the new size queries along with ElementCount to ensure that all
  supported instructions used with scalable vectors can be constructed
  in IR.

Reviewers: hfinkel, lattner, rkruppe, greened, rovka, rengolin, sdesmalen

Reviewed By: rovka, sdesmalen

Differential Revision: https://reviews.llvm.org/D53137

llvm-svn: 374042
2019-10-08 12:53:54 +00:00
Nicolai Haehnle df6e67697b AMDGPU: Propagate undef flag during pre-RA exec mask optimizations
Summary: Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68184

llvm-svn: 374041
2019-10-08 12:46:32 +00:00
Nicolai Haehnle 7febdb7f27 MachineSSAUpdater: insert IMPLICIT_DEF at top of basic block
Summary:
When getValueInMiddleOfBlock happens to be called for a basic block
that has no incoming value at all, an IMPLICIT_DEF is inserted in that
block via GetValueAtEndOfBlockInternal. This IMPLICIT_DEF must be at
the top of its basic block or it will likely not reach the use that
the caller intends to insert.

Issue: https://github.com/GPUOpen-Drivers/llpc/issues/204

Reviewers: arsenm, rampitec

Subscribers: jvesely, wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68183

llvm-svn: 374040
2019-10-08 12:46:20 +00:00
Florian Hahn 537225a6a3 [LoopRotate] Unconditionally get DomTree.
LoopRotate is a loop pass and the DomTree should always be available.

Similar to a70c526143

llvm-svn: 374036
2019-10-08 11:54:42 +00:00
Andrea Di Biagio 8d6651f7b1 [MCA][LSUnit] Track loads and stores until retirement.
Before this patch, loads and stores were only tracked by their corresponding
queues in the LSUnit from dispatch until execute stage. In practice we should be
more conservative and assume that memory opcodes leave their queues at
retirement stage.

Basically, loads should leave the load queue only when they have completed and
delivered their data. We conservatively assume that a load is completed when it
is retired. Stores should be tracked by the store queue from dispatch until
retirement. In practice, stores can only leave the store queue if their data can
be written to the data cache.

This is mostly a mechanical change. With this patch, the retire stage notifies
the LSUnit when a memory instruction is retired. That would triggers the release
of LDQ/STQ entries.  The only visible change is in memory tests for the bdver2
model. That is because bdver2 is the only model that defines the load/store
queue size.

This patch partially addresses PR39830.

Differential Revision: https://reviews.llvm.org/D68266

llvm-svn: 374034
2019-10-08 10:46:01 +00:00
Nikola Prica 02682498b8 [ISEL][ARM][AARCH64] Tracking simple parameter forwarding registers
Support for tracking registers that forward function parameters into the
following function frame. For now we only support cases when parameter
is forwarded through single register.

Reviewers: aprantl, vsk, t.p.northover

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D66953

llvm-svn: 374033
2019-10-08 09:43:05 +00:00
Florian Hahn a70c526143 [LoopRotate] Unconditionally get ScalarEvolution.
Summary: LoopRotate is a loop pass and SE should always be available.

Reviewers: anemet, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D68573

llvm-svn: 374026
2019-10-08 08:46:38 +00:00
Kristof Beyls 78bfe3ab94 [ARM] Generate vcmp instead of vcmpe
Based on the discussion in
http://lists.llvm.org/pipermail/llvm-dev/2019-October/135574.html, the
conclusion was reached that the ARM backend should produce vcmp instead
of vcmpe instructions by default, i.e. not be producing an Invalid
Operation exception when either arguments in a floating point compare
are quiet NaNs.

In the future, after constrained floating point intrinsics for floating
point compare have been introduced, vcmpe instructions probably should
be produced for those intrinsics - depending on the exact semantics
they'll be defined to have.

This patch logically consists of the following parts:
- Revert http://llvm.org/viewvc/llvm-project?rev=294945&view=rev and
  http://llvm.org/viewvc/llvm-project?rev=294968&view=rev, which
  implemented fine-tuning for when to produce vcmpe (i.e. not do it for
  equality comparisons). The complexity introduced by those patches
  isn't needed anymore if we just always produce vcmp instead. Maybe
  these patches need to be reintroduced again once support is needed to
  map potential LLVM-IR constrained floating point compare intrinsics to
  the ARM instruction set.
- Simply select vcmp, instead of vcmpe, see simple changes in
  lib/Target/ARM/ARMInstrVFP.td
- Adapt lots of tests that tested for vcmpe (instead of vcmp). For all
  of these test, the intent of what is tested for isn't related to
  whether the vcmp should produce an Invalid Operation exception or not.

Fixes PR43374.

Differential Revision: https://reviews.llvm.org/D68463

llvm-svn: 374025
2019-10-08 08:25:42 +00:00
Kai Nacke c9ddda8405 [Tools] Mark output of tools as text if it is text
Several LLVM tools write text files/streams without using OF_Text.
This can cause problems on platforms which distinguish between
text and binary output. This PR adds the OF_Text flag for the
following tools:

- llvm-dis
- llvm-dwarfdump
- llvm-mca
- llvm-mc (assembler files only)
- opt (assembler files only)
- RemarkStreamer (used e.g. by opt)

Reviewers: rnk, vivekvpandya, Bigcheese, andreadb

Differential Revision: https://reviews.llvm.org/D67696

llvm-svn: 374024
2019-10-08 08:21:20 +00:00
Kadir Cetinkaya 18b6fe07bc [LoopVectorize] Fix non-debug builds after rL374017
llvm-svn: 374021
2019-10-08 07:39:50 +00:00
Bill Wendling 411f1885b6 [IA] Recognize hexadecimal escape sequences
Summary:
Implement support for hexadecimal escape sequences to match how GNU 'as'
handles them. I.e., read all hexadecimal characters and truncate to the
lower 16 bits.

Reviewers: nickdesaulniers, jcai19

Subscribers: llvm-commits, hiraditya

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68598

llvm-svn: 374018
2019-10-08 04:39:52 +00:00
Zi Xuan Wu 9f41deccc0 [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
In loop-vectorize, interleave count and vector factor depend on target register number. Currently, it does not
estimate different register pressure for different register class separately(especially for scalar type,
float type should not be on the same position with int type), so it's not accurate. Specifically,
it causes too many times interleaving/unrolling, result in too many register spills in loop body and hurting performance.

So we need classify the register classes in IR level, and importantly these are abstract register classes,
and are not the target register class of backend provided in td file. It's used to establish the mapping between
the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For example, POWER target, register num is special when VSX is enabled. When VSX is enabled, the number of int scalar register is 32(GPR),
float is 64(VSR), but for int and float vector register both are 64(VSR). So there should be 2 kinds of register class when vsx is enabled,
and 3 kinds of register class when VSX is NOT enabled.

It runs on POWER target, it makes big(+~30%) performance improvement in one specific bmk(503.bwaves_r) of spec2017 and no other obvious degressions.

Differential revision: https://reviews.llvm.org/D67148

llvm-svn: 374017
2019-10-08 03:28:33 +00:00
Chen Zheng 9806a1d5f9 [ConstantRange] [NFC] replace addWithNoSignedWrap with addWithNoWrap.
llvm-svn: 374016
2019-10-08 03:00:31 +00:00
Matt Arsenault c8a6df7130 AMDGPU/GlobalISel: Clamp G_SITOFP/G_UITOFP sources
llvm-svn: 373989
2019-10-07 23:33:08 +00:00
Johannes Doerfert 748538e166 [Attributor][NFC] Add debug output
llvm-svn: 373988
2019-10-07 23:30:04 +00:00
Johannes Doerfert d4bea8830c [Attributor][FIX] Remove initialize calls and add undefs
The initialization logic has become part of the Attributor but the
patches that introduced these calls here were in development when the
transition happened.

We also now clean up (undefine) the macros used to create attributes.

llvm-svn: 373987
2019-10-07 23:28:54 +00:00
Johannes Doerfert 766f2cc1a4 [Attributor] Use local linkage instead of internal
Local linkage is internal or private, and private is a specialization of
internal, so either is fine for all our "local linkage" queries.

llvm-svn: 373986
2019-10-07 23:21:52 +00:00
Johannes Doerfert 661db04b98 [Attributor] Use abstract call sites for call site callback
Summary:
When we iterate over uses of functions and expect them to be call sites,
we now use abstract call sites to allow callback calls.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, hfinkel, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67871

llvm-svn: 373985
2019-10-07 23:14:58 +00:00
Craig Topper be7f81ece9 [X86] Shrink zero extends of gather indices from type less than i32 to types larger than i32.
Gather instructions can use i32 or i64 elements for indices. If
the index is zero extended from a type smaller than i32 to i64, we
can shrink the extend to just extend to i32.

llvm-svn: 373982
2019-10-07 23:03:12 +00:00
Reid Kleckner f9b67b810e [X86] Add new calling convention that guarantees tail call optimization
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.

Patch by Dwight Guth <dwight.guth@runtimeverification.com>!

Reviewed By: lebedev.ri, paquette, rnk

Differential Revision: https://reviews.llvm.org/D67855

llvm-svn: 373976
2019-10-07 22:28:58 +00:00
Heejin Ahn daeead4b02 [WebAssembly] Fix unwind mismatch stat computation
Summary:
There was a bug when computing the number of unwind destination
mismatches in CFGStackify. When there are many mismatched calls that
share the same (original) destination BB, they have to be counted
separately.

This also fixes a typo and runs `fixUnwindMismatches` only when the wasm
exception handling is enabled. This is to prevent unnecessary
computations and does not change behavior.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68552

llvm-svn: 373975
2019-10-07 22:19:40 +00:00
Heejin Ahn 58af5be28d [WebAssembly] Add memory intrinsics handling to mayThrow()
Summary:
Previously, `WebAssembly::mayThrow()` assumed all inputs are global
addresses. But when intrinsics, such as `memcpy`, `memmove`, or `memset`
are lowered to external symbols in instruction selection and later
emitted as library calls. And these functions don't throw.

This patch adds handling to those memory intrinsics to `mayThrow`
function. But while most of libcalls don't throw, we can't guarantee all
of them don't throw, so currently we conservatively return true for all
other external symbols.

I think a better way to solve this problem is to embed 'nounwind' info
in `TargetLowering::CallLoweringInfo`, so that we can access the info
from the backend. This will also enable transferring 'nounwind'
properties of LLVM IR instructions. Currently we don't transfer that
info and we can only access properties of callee functions, if the
callees are within the module. Other targets don't need this info in the
backend because they do all the processing before isel, but it will help
us because that info will reduce code size increase in fixing unwind
destination mismatches in CFGStackify.

But for now we return false for these memory intrinsics and true for all
other libcalls conservatively.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68553

llvm-svn: 373967
2019-10-07 21:14:45 +00:00
Johannes Doerfert 1097fab1cf [Attributor] Deduce memory behavior of functions and arguments
Deduce the memory behavior, aka "read-none", "read-only", or
"write-only", for functions and arguments.

Reviewers: sstefan1, uenoku

Subscribers: hiraditya, bollu, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67384

llvm-svn: 373965
2019-10-07 21:07:57 +00:00
Roman Lebedev 7cdeac43e5 [InstCombine] Fold conditional sign-extend of high-bit-extract into high-bit-extract-with-signext (PR42389)
This can come up in Bit Stream abstractions.

The pattern looks big/scary, but it can't be simplified any further.
It only is so simple because a number of my preparatory folds had
happened already (shift amount reassociation / shift amount
reassociation in bit test, sign bit test detection).

Highlights:
* There are two main flavors: https://rise4fun.com/Alive/zWi
  The difference is add vs. sub, and left-shift of -1 vs. 1
* Since we only change the shift opcode,
  we can preserve the exact-ness: https://rise4fun.com/Alive/4u4
* There can be truncation after high-bit-extraction:
  https://rise4fun.com/Alive/slHc1   (the main pattern i'm after!)
  Which means that we need to ignore zext of shift amounts and of NBits.
* The sign-extending magic can be extended itself (in add pattern
  via sext, in sub pattern via zext. not the other way around!)
  https://rise4fun.com/Alive/NhG
  (or those sext/zext can be sinked into `select`!)
  Which again means we should pay attention when matching NBits.
* We can have both truncation of extraction and widening of magic:
  https://rise4fun.com/Alive/XTw
  In other words, i don't believe we need to have any checks on
  bitwidths of any of these constructs.

This is worsened in general by the fact that we may have `sext` instead
of `zext` for shift amounts, and we don't yet canonicalize to `zext`,
although we should. I have not done anything about that here.

Also, we really should have something to weed out `sub` like these,
by folding them into `add` variant.

https://bugs.llvm.org/show_bug.cgi?id=42389

llvm-svn: 373964
2019-10-07 20:53:27 +00:00
Roman Lebedev 0c73be590e [InstCombine] Move isSignBitCheck(), handle rest of the predicates
True, no test coverage is being added here. But those non-canonical
predicates that are already handled here already have no test coverage
as far as i can tell. I tried to add tests for them, but all the patterns
already get handled elsewhere.

llvm-svn: 373962
2019-10-07 20:53:08 +00:00
Roman Lebedev cb6d851bb6 [InstCombine][NFC] dropRedundantMaskingOfLeftShiftInput(): change how we deal with mask
Summary:
Currently, we pre-check whether we need to produce a mask or not.
This involves some rather magical constants.
I'd like to extend this fold to also handle the situation
when there's also a `trunc` before outer shift.
That will require another set of magical constants.
It's ugly.

Instead, we can just compute the mask, and check
whether mask is a pass-through (all-ones) or not.
This way we don't need to have any magical numbers.

This change is NFC other than the fact that we now compute
the mask and then check if we need (and can!) apply it.

Reviewers: spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68470

llvm-svn: 373961
2019-10-07 20:53:00 +00:00
Roman Lebedev c3b394ffba [InstCombine] dropRedundantMaskingOfLeftShiftInput(): propagate undef shift amounts
Summary:
When we do `ConstantExpr::getZExt()`, that "extends" `undef` to `0`,
which means that for patterns a/b we'd assume that we must not produce
any bits for that channel, while in reality we simply didn't care
about that channel - i.e. we don't need to mask it.

Reviewers: spatel

Reviewed By: spatel

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68239

llvm-svn: 373960
2019-10-07 20:52:52 +00:00
Cameron McInally 46d317fad4 [Bitcode] Update naming of UNOP_NEG to UNOP_FNEG
Differential Revision: https://reviews.llvm.org/D68588

llvm-svn: 373958
2019-10-07 20:41:25 +00:00
Matt Arsenault 538b73b797 AMDGPU/GlobalISel: Handle more G_INSERT cases
Start manually writing a table to get the subreg index. TableGen
should probably generate this, but I'm not sure what it looks like in
the arbitrary case where subregisters are allowed to not fully cover
the super-registers.

llvm-svn: 373947
2019-10-07 19:16:26 +00:00
Matt Arsenault 4bcdcad91b GlobalISel: Partially implement lower for G_INSERT
llvm-svn: 373946
2019-10-07 19:13:27 +00:00