Commit Graph

136235 Commits

Author SHA1 Message Date
David Sherwood 7a834a0a4e [SVE] Fix scalable vector bug in DataLayout::getIntPtrType
Fixed an issue in DataLayout::getIntPtrType where we were assuming
the input type was always a fixed vector type, which isn't true.

Added a test that exposed the problem to:

  Transforms/InstCombine/vector_gep1.ll

Differential Revision: https://reviews.llvm.org/D82294
2020-06-26 07:58:45 +01:00
Sjoerd Meijer 243a5329d4 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Sjoerd Meijer 1319d9bb84 [ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass
Don't revert intrinsic get.active.lane.mask here, this is moved to isel
legalization in D82292.

Differential Revision: https://reviews.llvm.org/D82105
2020-06-26 07:42:39 +01:00
Igor Kudrin 70165bb7e9 [DebugInfo] Fix emitting offsets to CUs with -dwarf-sections-as-references=Enable.
The size of the field depends on the DWARF format, not the address size
of the target.

Differential Revision: https://reviews.llvm.org/D82311
2020-06-26 12:12:26 +07:00
Amy Kwan e0c02dc980 [PowerPC][Power10] Implement centrifuge, vector gather every nth bit, vector evaluate Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cfuged (unsigned long long, unsigned long long);
vector unsigned long long vec_cfuge (vector unsigned long long, vector unsigned long long);
unsigned long long vec_gnb (vector unsigned __int128, const unsigned int);
vector unsigned char vec_ternarylogic (vector unsigned char, vector unsigned char, vector unsigned char, const unsigned int);
vector unsigned short vec_ternarylogic (vector unsigned short, vector unsigned short, vector unsigned short, const unsigned int);
vector unsigned int vec_ternarylogic (vector unsigned int, vector unsigned int, vector unsigned int, const unsigned int);
vector unsigned long long vec_ternarylogic (vector unsigned long long, vector unsigned long long, vector unsigned long long, const unsigned int);
vector unsigned __int128 vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, vector unsigned __int128, const unsigned int);

Differential Revision: https://reviews.llvm.org/D80970
2020-06-25 21:34:41 -05:00
Arthur Eubanks 0c6bf90b56 [NewPM][BasicAA] Rename basicaa -> basic-aa, add alias
Summary:
BasicAA under the new pass manager is called "basic-aa", which fits more
with the other AA names which almost always contain a dash.

Keep an alias from basicaa -> basic-aa.

Will change all references of "basicaa" to "basic-aa", then remove the
alias.

Makes check-llvm failures under NPM go from 2307 to 1867.

Reviewers: asbirlea, ychen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82607
2020-06-25 18:08:34 -07:00
Michael Liao dccfaacf93 [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`.
Summary:
- `ptrtoint` and `inttoptr` are defined as no-op casts if the integer
  value as the same size as the pointer value. The pair of
  `ptrtoint`/`inttoptr` is in fact a no-op cast sequence between
  different address spaces. Teach `infer-address-spaces` to handle them
  like a `bitcast`.

Reviewers: arsenm, chandlerc

Subscribers: jvesely, wdng, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81938
2020-06-25 20:46:56 -04:00
Amara Emerson 97a34b5f8d [AArch64][GlobalISel] Fix extended shift addressing mode selection not handling sxth.
The complex pattern for extended shift offsets only allow sxtw as the extend,
not sxth. Our equivalent function to do this was not rejecting SXTH so we
were miscompiling. This was exposed by D81992.
2020-06-25 17:24:32 -07:00
Mehdi Amini 4abf024336 Remove references to the 4.0 release as a major breaking (NFC)
This is cleaning up comments (mostly in the bitcode handling) about
removing some backward compatibility aspect in the 4.0 release.
Historically, "4.0" was used during the development of the 3.x
versions as "this future major breaking change version". At the time
the major number was used to indicate the compatibility. When we
reached 3.9 we decided to change the numbering, instead of going to
3.10 we went to 4.0 but after changing the meaning of the major
number to not mean anything anymore with respect to bitcode backward
compatibility.

The current policy
(https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility)
indicates only now:

  The current LLVM version supports loading any bitcode since version 3.0.

Differential Revision: https://reviews.llvm.org/D82514
2020-06-25 23:49:07 +00:00
Wouter van Oortmerssen b9a539c010 [WebAssembly] Adding 64-bit versions of __stack_pointer and other globals
We have 6 globals, all of which except for __table_base are 64-bit under wasm64.

Differential Revision: https://reviews.llvm.org/D82130
2020-06-25 15:52:44 -07:00
Jessica Paquette 7fb84dff69 [AArch64][GlobalISel] Port buildvector -> dup pattern from AArch64ISelLowering
Given this:

```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```

We can produce:

```
%y:_(<n x sK) = G_DUP %lane(sK)
```

Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.

Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.

Differential Revision: https://reviews.llvm.org/D81979
2020-06-25 14:19:06 -07:00
Paul Walker 2c09e91054 [MVT] Add missing floating point types for 1024/2048-bit vectors.
Summary:
This patch adds entries for:
  v64f16
  v128f16
  v64bf16
  v128bf16
  v32f64

Subscribers: dschuff, hiraditya, aheejin, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82466
2020-06-25 21:13:31 +00:00
David Green d79b57b8bb [ARM] Split FPExt loads
This extends PerformSplittingToWideningLoad to also handle FP_Ext, as
well as sign and zero extends. It uses an integer extending load
followed by a VCVTL on the bottom lanes to efficiently perform an fpext
on a smaller than legal type.

The existing code had to be rewritten a little to not just split the
node in two and let legalization handle it from there, but to actually
split into legal chunks.

Differential Revision: https://reviews.llvm.org/D81340
2020-06-25 21:55:13 +01:00
David Green 8532b2ee89 [ARM] MVE VCVT lowering for f16->f32 extends
This adds code to lower f16 to f32 fp_exts's using an MVE VCVT
instructions, similar to a recent similar patch for fp_trunc. Again it
goes through the lowering of a BUILD_VECTOR, but is slightly simpler
only having to deal with interleaved indices. It adds a VCVTL node to
lower to, similar to VCVTN.

Differential Revision: https://reviews.llvm.org/D81339
2020-06-25 20:54:26 +01:00
Craig Topper 6673d69226 [X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature.
The PREFETCHW instruction was originally part of the 3DNow. But
it was given its own CPUID bit on later CPUs just before 3DNow
was deprecated.

We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.

So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.

Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
2020-06-25 12:46:52 -07:00
Hubert Tong 0d0dbd6170 [NFC][Support] Make Unix/Program.inc separately compilable
To improve CI checks, make `Unix/Program.inc` separately compilable.
2020-06-25 15:41:17 -04:00
Hubert Tong 7b2eb7a621 [Support][AIX] Add declaration of wait4 to fix build
While `wait4` is not documented for AIX, it is available; however, even
on systems where it is available, the system headers do not always
provide a declaration of the function. This patch provides a declaration
of `wait4` for AIX.

Reviewed By: daltenty

Differential Revision: https://reviews.llvm.org/D82282
2020-06-25 15:40:07 -04:00
Craig Topper 01c18f9199 Revert "[X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature."
This is failing on the bots.

This reverts commit 636d31a5c3.
2020-06-25 11:43:02 -07:00
David Green 0bfb4c2506 [ARM] Add FP_ROUND handling to splitting MVE stores
This splits MVE vector stores of a fp_trunc in the same way that we do
for standard trunc's. It extends PerformSplittingToNarrowingStores to
handle fp_round, splitting the store into pieces and adding a VCVTNb to
perform the actual fp_round. The actual store is then converted to an
integer store so that it can truncate bottom lanes of the result.

Differential Revision: https://reviews.llvm.org/D81141
2020-06-25 19:37:15 +01:00
Craig Topper 636d31a5c3 [X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature.
The PREFETCHW instruction was originally part of the 3DNow. But
it was given its own CPUID bit on later CPUs just before 3DNow
was deprecated.

We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.

So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.

Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
2020-06-25 11:25:35 -07:00
Hiroshi Yamauchi 9878996c70 Revert "[PGO] Extend the value profile buckets for mem op sizes."
This reverts commit 63a89693f0.

Due to a build failure like http://lab.llvm.org:8011/builders/sanitizer-windows/builds/65386/steps/annotate/logs/stdio
2020-06-25 11:13:49 -07:00
Kirill Naumov d48c7859fb [InlineCost] GetElementPtr with constant operands
If the GEP instruction contanins only constants as its arguments,
then it should be recognized as a constant. For now, there was
also added a flag to turn off this simplification if it causes
any regressions ("disable-gep-const-evaluation") which is off
by default. Once I gather needed data of the effectiveness of
this simplification, the flag will be deleted.

Reviewers: apilipenko, davidxl, mtrofin

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81026
2020-06-25 18:09:51 +00:00
Hiroshi Yamauchi 63a89693f0 [PGO] Extend the value profile buckets for mem op sizes.
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).

Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.

Differential Revision: https://reviews.llvm.org/D81682
2020-06-25 10:22:56 -07:00
Yuanfang Chen c4b1daed1d [NewPM] Move debugging log printing after PassInstrumentation before-pass-callbacks
For passes got skipped, this is confusing because the log said it is `running pass`
but it is skipped later.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D82511
2020-06-25 10:03:25 -07:00
Francesco Petrogalli 7200fa38a9 [sve][acle] Add some C intrinsics for brain float types.
Summary:
The following intrinsics has been added:

svuint16_t svcnt[_bf16]_m(svuint16_t inactive, svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_x(svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_z(svbool_t pg, svbfloat16_t op)

svbfloat16_t svtbl[_bf16](svbfloat16_t data, svuint16_t indices)

svbfloat16_t svtbl2[_bf16](svbfloat16x2_t data, svuint16_t indices)

svbfloat16_t svtbx[_bf16](svbfloat16_t fallback, svbfloat16_t data, svuint16_t indices)

Reviewers: c-rhodes, kmclaughlin, efriedma, sdesmalen, ctetreau

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82429
2020-06-25 16:31:01 +00:00
Simon Pilgrim f6329a6875 GVN.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Cleanup MemoryDependenceAnalysis.h as well - GVN.h was also implicitly including AliasAnalysis.h via this.

Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-25 16:59:35 +01:00
Arthur Eubanks 85ff5b524e [NewPM] Separate out alias analysis passes in opt
Summary:
This somewhat matches the --aa-pipeline option, which separates out any
AA analyses to make sure they run before other passes.

Makes check-llvm failures under new PM go from 2356 -> 2303.

AA passes are not handled by PassBuilder::parsePassPipeline() but rather
PassBuilder::parseAAPipeline(), which is why this fixes some failures.

Reviewers: asbirlea, hans, ychen, leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82488
2020-06-25 08:53:57 -07:00
Sanjay Patel c9e8c9e3ea [InstCombine] fold fmul/fdiv with fabs operands
fabs(X) * fabs(Y) --> fabs(X * Y)
fabs(X) / fabs(Y) --> fabs(X / Y)

If both operands of fmul/fdiv are positive, then the result must be positive.

There's a NAN corner-case that prevents removing the more specific fold just
above this one:
fabs(X) * fabs(X) -> X * X
That fold works even with NAN because the sign-bit result of the multiply is
not specified if X is NAN.

We can't remove that and use the more general fold that is proposed here
because once we convert to this:
fabs (X * X)
...it is not legal to simplify the 'fabs' out of that expression when X is NAN.
That's because fabs() guarantees that the sign-bit is always cleared - even
for NAN values.

So this patch has the potential to lose information, but it seems unlikely if
we do the more specific fold ahead of this one.

Differential Revision: https://reviews.llvm.org/D82277
2020-06-25 11:35:38 -04:00
David Green b044a82270 [ARM] Fixup for signed comparison warning. NFC 2020-06-25 16:29:44 +01:00
Simon Pilgrim 1472e2a792 Remove orphan AMDGPUAAResult::Aliases and AMDGPUAAResult::PathAliases declarations. NFC. 2020-06-25 16:00:44 +01:00
Simon Pilgrim 8c2082e1dc GlobalsModRef.h - reduce CallGraph.h include to forward declarations. NFC.
Fix implicit include dependencies in source files.
2020-06-25 16:00:43 +01:00
Simon Pilgrim db69b17409 LoopAccessAnalysis.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-25 16:00:42 +01:00
David Green 3cb2190b0b [ARM] MVE VCVT lowering for f32->f16 truncs
This adds code to lower f32 to f16 fp_trunc's using a pair of MVE VCVT
instructions. Due to v4f16 not being legal, fp_round are often split up
fairly early. So this reconstructs the vcvt's from a buildvector of
fp_rounds from two vector inputs. Something like:

BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0),
            FP_ROUND(EXTRACT_ELT(Y, 0),
            FP_ROUND(EXTRACT_ELT(X, 1),
            FP_ROUND(EXTRACT_ELT(Y, 1), ...)

It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers
into the top/bottom lanes of an MVE instruction.

Differential Revision: https://reviews.llvm.org/D81139
2020-06-25 15:59:36 +01:00
Victor Campos da852b03b0 [AArch64] Emit warning when disassembling unpredictable LDRAA and LDRAB
Summary:
LDRAA and LDRAB in their writeback variant should softfail when the same
register is used as result and base.

This patch adds a custom decoder that catches such case and emits a
warning when it occurs.

Differential Revision: https://reviews.llvm.org/D82541
2020-06-25 15:56:36 +01:00
Thomas Preud'homme 6c67ee0f58 [MC] Fix PR45805: infinite recursion in assembler
Give up folding an expression if the fragment of one of the operands
would require laying out a fragment already being laid out. This
prevents hitting an infinite recursion when a fill size expression
refers to a later fragment since computing the offset of that fragment
would require laying out the fill fragment and thus computing its size
expression.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D79570
2020-06-25 15:42:36 +01:00
Xing GUO 17326ebbd6 [ObjectYAML][DWARF] Format codes. NFC. 2020-06-25 21:53:06 +08:00
Zarko Todorovski e504a23b63 [NFC][PPC][AIX] Add stack frame layout diagram to PPCISelLowering.cpp
Summary:
This NFC patch adds a diagram of the AIX ABI stack frame layout.

Based on https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/assembler/idalangref_runtime_process.html

Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L

Reviewed By: sfertile

Subscribers: wuzish, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #powerpc, #llvm

Differential Revision: https://reviews.llvm.org/D82408
2020-06-25 09:41:42 -04:00
Guillaume Chatelet 324cda2073 [Alignment][NFC] Conform X86, ARM and AArch64 TargetTransformInfo backends to the public API
The main interface has been migrated to Align already but a few backends where broadening the type from Align to MaybeAlign.
This patch makes sure all implementations conform to the public API.

Differential Revision: https://reviews.llvm.org/D82465
2020-06-25 13:23:13 +00:00
Simon Pilgrim 1815b77c3e LiveIntervals.h.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-25 14:22:21 +01:00
Simon Pilgrim 1020a661e5 Attributes.cpp - fix include sorting order. NFC. 2020-06-25 14:22:20 +01:00
Simon Pilgrim c941b643e6 IRBuilder.cpp - fix include sorting order. NFC. 2020-06-25 14:22:20 +01:00
Simon Pilgrim 792e4a8c97 CodeGenPrepare.cpp - remove unused IntrinsicsX86.h header. NFC. 2020-06-25 14:22:19 +01:00
Simon Pilgrim 172c36a100 Fix typos in CodeGenPrepare::splitLargeGEPOffsets comments. 2020-06-25 14:22:19 +01:00
Guillaume Chatelet 2e7bba693e [Alignment][NFC] Use Align for TargetCallingConv::OrigAlign
This patch replaces D69249.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82307
2020-06-25 13:21:22 +00:00
Florian Hahn 4837daf883 [DSE,MSSA] Check if Def is removable only wen we try to remove it.
Non-removable MemoryDefs can still eliminate other defs. Update the
isRemovable checks to only candidates for removal.
2020-06-25 14:01:10 +01:00
Tyker c95ffadb24 [AssumeBundles] Use operand bundles to encode alignment assumptions
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Complemantary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignemnt
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as a "assumption use".

As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.

Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1

Reviewed By: jdoerfert

Subscribers: yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71739
2020-06-25 12:59:44 +02:00
Sam Tebbs 187f627a50 [ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics
This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication.

Differential revision: https://reviews.llvm.org/D82377
2020-06-25 11:54:29 +01:00
Simon Pilgrim e367c0081c FPEnv.h - reduce includes to forward declarations. NFC.
Ensure FPEnv.cpp includes FPEnv.h first to check for hidden dependencies.
2020-06-25 11:40:45 +01:00
Piotr Sobczak 0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
David Sherwood ee26a31e7b [SVE] Make ConstantFoldGetElementPtr work for scalable vectors of indices
This patch fixes a compiler crash that was hit when trying to simplify
the following code:

getelementptr [2 x i64], [2 x i64]* null, i64 0, <vscale x 2 x i64> zeroinitializer

For the case where we have a null pointer value like above, we just
need to ensure we don't assume the indices are always fixed width.

Differential Revision: https://reviews.llvm.org/D82183
2020-06-25 07:28:19 +01:00
Max Kazantsev 1eeb714787 [InstCombine] Combine select & Phi by same condition
This patch transforms
```
p = phi [x, y]
s = select cond, z, p
```
with
```
s = phi[x, z]
```
if we can prove that the Phi node takes values basing on select's condition.

Differential Revision: https://reviews.llvm.org/D82072
Reviewed By: nikic
2020-06-25 10:44:10 +07:00
Craig Topper a5041987ed [X86] Emit a reg-reg copy for fast isel of vector bitcasts.
Previously we just updated a map and moved on. But it possible
we cached known bits information with the vreg that can be used by
another basic block. If the other basic block has a different view
of the VT these known bits won't make sense.

By emitting a copy we ensure we have different vregs before and
after the bitcast. This prevents the known bits from being used
with the wrong type.

Differential Revision: https://reviews.llvm.org/D82517
2020-06-24 20:15:21 -07:00
Wang, Pengfei b2eb1c5793 [X86] Fix a typo error.
Summary: This will result opcode MULX32Hrm been emitted to MULX32Hrr.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D82472
2020-06-25 10:06:27 +08:00
Amara Emerson 090c108d04 Don't inline dynamic allocas that simplify to huge static allocas.
Some sequences of optimizations can generate call sites which may never be
executed during runtime, and through constant propagation result in dynamic
allocas being converted to static allocas with very large allocation amounts.

The inliner tries to move these to the caller's entry block, resulting in the
stack limits being reached/bypassed. Avoid inlining functions if this would
result.

The threshold of 64k currently doesn't get triggered on the test suite with an
-Os LTO build on arm64, care should be taken in changing this in future to avoid
needlessly pessimising inlining behaviour.

Differential Revision: https://reviews.llvm.org/D81765
2020-06-24 17:39:03 -07:00
Xing GUO 93bc571d47 [DWARFYAML][debug_gnu_*] 'Descriptor' field should be 1-byte. NFC.
The 'Descriptor' field of .debug_gnu_pubnames and .debug_gnu_pubtypes
section should be 1-byte rather than 4-byte. This patch helps resolve
this issue.
2020-06-25 08:21:13 +08:00
Kirill Naumov 7f094f7f9d [InlineCost] PrinterPass prints constants to which instructions are simplified
This patch enables printing of constants to see which instructions were
constant-folded. Needed for tests and better visiual analysis of
inliner's work.

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D81024
2020-06-24 22:52:31 +00:00
Scott Linder 4d81aec40c [MIR] Fix CFI_INSTRUCTION escape printing
Summary:
The printer seems to intend to not print the trailing comma but has a
copy-paste error for the last value in the escape, and the parser
enforces having no trailing comma, but somehow a test was never included
to actually confirm it.

Reviewers: thegameg, arsenm

Reviewed By: thegameg, arsenm

Subscribers: wdng, arsenm, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82478
2020-06-24 18:15:28 -04:00
Roman Lebedev 8911a35180
[SROA] convertValue(): we can have <N x iK*> to <M x iQ> cast
Provided test case crashes otherwise.
Much like to the opposite case.
2020-06-25 00:58:54 +03:00
Roman Lebedev 07a23c06dd
[SROA] convertValue(): we can have <N x iK> to <M x iQ*> cast
Provided test case crashes otherwise.

If NewTy is already DL.getIntPtrType(NewTy),
CreateBitCast() won't actually create any bitcast,
so we are better off just doing the general thing.
2020-06-25 00:58:53 +03:00
Roman Lebedev 2b8d706b19
[IR] GetUnderlyingObject(), stripPointerCastsAndOffsets(): don't crash on `bitcast <1 x i8*> to i8*`
I'm not sure how to write standalone tests for each of two changes here.
If either one of these two fixes is missing, the test fill crash.
2020-06-25 00:58:53 +03:00
Roman Lebedev 381054a989
[InstCombine] visitBitCast(): do not crash on weird `bitcast <1 x i8*> to i8*`
Even if we know that RHS of a bitcast is a pointer,
we can't assume LHS is, because it might be
a single-element vector of pointer.
2020-06-25 00:58:53 +03:00
Roman Lebedev 1e2691fe23
[NFCI] SCEV: promote ScalarEvolutionDivision into an publicly usable class
This makes it usable from outside of SCEV,
while previously it was internal to the ScalarEvolution.cpp

In particular, i want to use it in an WIP alloca promotion helper pass,
to analyze if some SCEV is a multiple of some other SCEV.
2020-06-25 00:58:53 +03:00
Yuanfang Chen ebc88811b5 Remove Passes dependency on CodeGen
The dependency was introduced in
5134020ea6. The only functional change
from this removal would be the new PM interface for the two codegen
passes. This is not necessary since we don't have codegen pipeline using
new PM yet. This removal is to break the potential circular dependency between
Passes and CodeGen once the codegen begins to gain new PM support.
2020-06-24 14:52:46 -07:00
Fangrui Song c6d01ed046 [TextAPI/MachO] Fix style issues. NFC
See https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions
2020-06-24 14:43:45 -07:00
Mitch Phillips 10045cbe01 Revert "[BitcodeReader] Fix DelayedShuffle handling for ConstantExpr shuffles."
Patch has a memory leak bug that broke the ASan buildbots. More info
available at: https://reviews.llvm.org/D80330

This reverts commit b5740105d2.
2020-06-24 14:40:45 -07:00
Stefan Agner b7d41a11cd [ARM] Make cp10 and cp11 usage a warning
The ARM ARM considers p10/p11 valid arguments for MCR/MRC instructions.
MRC instructions with p10 arguments are also used in kernel code which
is shared for different architectures. Turn usage of p10/p11 to warnings
for ARMv7/ARMv8-M.

Reviewers: rengolin, olista01, t.p.northover, efriedma, psmith, simon_tatham

Reviewed By: simon_tatham

Subscribers: hiraditya, danielkiss, jcai19, tpimh, nickdesaulniers, peter.smith, javed.absar, kristof.beyls, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59733
2020-06-24 23:37:54 +02:00
Kirill Naumov 6a5d7d498c [InlineCost] InlineCostAnnotationWriterPass introduced
This class allows to see the inliner's decisions for better
optimization verifications and tests. To use, use flag
"-passes="print<inline-cost>"".

This is the second attempt to integrate the patch.
The problem from the first try has been discussed and
fixed in D82205.

Reviewers: apilipenko, mtrofin, davidxl, fedor.sergeev

Reviewed By: mtrofin

Differential revision: https://reviews.llvm.org/D81743
2020-06-24 21:27:07 +00:00
Amy Kwan d82f26cc4b [PowerPC][Power10] Implement Count Leading/Trailing Zeroes Builtins under bit Mask in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)

Differential Revision: https://reviews.llvm.org/D80941
2020-06-24 16:03:45 -05:00
Jinsong Ji 81b2d1d112 [NFC][PowerPC] Fix some typos in MachineCombiner comments 2020-06-24 20:40:57 +00:00
Christopher Tetreault 3d123e17d8 [SVE] Remove calls to VectorType::getNumElements from IPO
Reviewers: efriedma, jdoerfert, sdesmalen, kmclaughlin

Reviewed By: efriedma, jdoerfert

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82219
2020-06-24 13:38:51 -07:00
dfukalov 7ddee0922f [NFCI][CostModel] Add const to Value*.
Summary:
Get back `const` partially lost in one of recent changes.
Additionally specify explicit qualifiers in few places.

Reviewers: samparker

Reviewed By: samparker

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82383
2020-06-24 23:16:08 +03:00
Kirill Naumov ca899bf90a [InlineCost] Added InlineCostCallAnalyzer::print()
For the upcoming changes, we need to have an ability to dump
InlineCostCallAnalyzer info in non-debug builds as well.

Reviewed-By: mtrofin
Differential Revision: https://reviews.llvm.org/D82205
2020-06-24 20:07:27 +00:00
Florian Hahn 35bb9bfbb0 [SLP] Limit GEP lists based on width of index computation.
D68667 introduced a tighter limit to the number of GEPs to simplify
together. The limit was based on the vector element size of the pointer,
but the pointers themselves are not actually put in vectors.

IIUC we try to vectorize the index computations here, so we should base
the limit on the vector element size of the computation of the index.

This restores the test regression on AArch64 and also restores the
vectorization for a important pattern in SPEC2006/464.h264ref on
AArch64 (@test_i16_extend). We get a large benefit from doing a single
load up front and then processing the index computations in vectors.

Note that we could probably even further improve the AArch64 codegen, if
we would do zexts to i32 instead of i64 for the sub operands and then do
a single vector sext on the result of the subtractions. AArch64 provides
dedicated vector instructions to do so. Sketch of proof in Alive:
https://alive2.llvm.org/ce/z/A4xYAB

Reviewers: craig.topper, RKSimon, xbolva00, ABataev, spatel

Reviewed By: ABataev, spatel

Differential Revision: https://reviews.llvm.org/D82418
2020-06-24 19:56:53 +01:00
Simon Pilgrim 6c6adde84f InstCombineInternal.h - reduce AliasAnalysis.h include to forward declaration. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 19:27:38 +01:00
Simon Pilgrim a53dddb3e9 Local.h - reduce includes to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 19:27:37 +01:00
tatz.j@northeastern.edu af5e61bf4f [NVPTX] Fix for NVPTX module asm regression
Currently module asm ends up emitted twice and at the wrong place in the PTX.
This patch moves module asm generation into emitStartOfAsmFile() which puts at
the correct location in the generated PTX.

Differential Revision: https://reviews.llvm.org/D82280
2020-06-24 11:17:09 -07:00
Teresa Johnson d291bd510e [WPD] Allow virtual calls to be analyzed with multiple type tests
Summary:
In D52514 I had fixed a bug with WPD after indirect call promotion, by
checking that a type test being analyzed dominates potential virtual
calls. With that fix I included a small effiency enhancement to avoid
processing a devirt candidate multiple times (when there are multiple
type tests). This latter change wasn't in response to any measured
efficiency issues, it was merely theoretical. Unfortuantely, it turns
out to limit optimization opportunities after inlining.

Specifically, consider code that looks like:

class A {
  virtual void foo();
};
class B : public A {
  void foo();
}
void callee(A *a) {
  a->foo(); // Call 1
}
void caller(B *b) {
  b->foo(); // Call 2
  callee(b);
}

After inlining callee into caller, because of the existing call to
b->foo() in caller there will be 2 type tests in caller for the vtable
pointer of b: the original type test against B from Call 2, and the
inlined type test against A from Call 1. If the code was compiled with
-fstrict-vtable-pointers, then after optimization WPD will see that
both type tests are associated with the inlined virtual Call 1.
With my earlier change to only process a virtual call against one type
test, we may only consider virtual Call 1 against the base class A type
test, which can't be devirtualized. With my change here to remove this
restriction, it also gets considered for the type test against the
derived class B type test, where it can be devirtualized.

Note that if caller didn't include it's own earlier virtual call
b->foo() we will not be able to devirtualize after inlining callee even
after this fix, since there would not be a type test against B in the
IR. As a future enhancement we can consider inserting type tests at call
sites that pass pointers to classes with virtual calls, to enable
context-sensitive devirtualization after inlining.

Reviewers: pcc, vitalybuka, evgeny777

Subscribers: Prazek, hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79235
2020-06-24 10:51:24 -07:00
Craig Topper 8dc92142e3 [X86] Replace PROC macros with an enum and a lookup table of processor information.
This patch removes the PROC macro in favor of CPUKind enum and a
table that contains information about CPUs.

The current information in the table is the CPU name, CPUKind enum
value, key feature for target multiversioning, and Is64Bit capable.
For the strings that are aliases, I've duplicated the information
in the table. This means there are more rows in the table than
CPUKind enums.

This replaces multiple StringSwitch's with loops through the table.
They are linear searches due to the table being more logically
ordered than alphabetical. The StringSwitch's would have also been
linear. I've used StringLiteral on the strings in the table so we
can quickly check the length while searching.

I contemplated having a CPUKind for each string so there was a 1:1
mapping, but didn't want to spread more names to the places that
use the enum.

My ultimate goal here is to store the features for each CPU as a
bitset within the table. Hoping to use constexpr to make this
composable so we can group features and inherit them. After the
table lookup we can turn the bitset into a list of strings for the
frontend. The current switch we have for selecting features for
CPUs has become difficult to maintain while trying to express
inheritance relationships.

Differential Revision: https://reviews.llvm.org/D82414
2020-06-24 10:46:25 -07:00
Simon Pilgrim c18b753686 LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC.
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-24 17:58:38 +01:00
dstuttar e8775c8d81 [AMDGPU] Make sure to fix implicit operands on insertBranch
Summary:
Without fixImplicitOperands we may end up creating default implicit operands
that are the wrong wave size

Includes simple test that provokes insertBranch in the correct way to expose the
issue being fixed.

Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82459
2020-06-24 16:50:48 +01:00
Matt Arsenault a448670752 AMDGPU/GlobalISel: Legalize 64-bit G_SDIV/G_SREM
Now all the divisions should be complete, although we should fix
emitting the entire common part for div/rem when you use both.
2020-06-24 11:39:45 -04:00
Matt Arsenault b5c4e6c148 AMDGPU/GlobalISel: Invert parameter for div/rem lowering function 2020-06-24 11:39:45 -04:00
Ikhlas Ajbar 085701b8b0 [Hexagon] Reducing minimum alignment requirement
This patch reduces minimum alignment requirement to 1 byte for arguments
passed by value on stack.
2020-06-24 10:28:37 -05:00
Matt Arsenault 778351df77 Revert "[AMDGPU] Enable compare operations to be selected by divergence"
This reverts commit 521ac0b5ce.

Reported to break thousands of piglit tests.
2020-06-24 11:21:30 -04:00
Arthur Eubanks b5979a383a [NewPM] Add SimpleLoopUnswitchPass to PassRegistry.def
Summary:
Seems to just be missing from PassRegistry.def.

Makes the number of check-llvm failures under new PM go from 2619 to 2581.

Reviewers: hans, ychen, asbirlea, leonardchan

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82422
2020-06-24 08:20:34 -07:00
Arthur Eubanks fcf0741262 [NewPM] Handle -simplifycfg in opt
Summary:
-simplifycfg is the legacy pass name for SimplifyCFGPass.

There is already -simplify-cfg in FUNCTION_PASS_WITH_PARAMS which
handles options for SimplifyCFGPass. Maybe that should be renamed to
-simplifycfg as well?

This reduces the number of check-llvm failures under NewPM from 2619 to 2392.

Reviewers: hans, leonardchan, asbirlea, ychen

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82421
2020-06-24 08:20:08 -07:00
Mircea Trofin bdceefe95b [llvm] Release-mode ML InlineAdvisor
Summary:
This implementation uses a pre-trained model which is statically
compiled into a native function.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Reviewers: davidxl, jdoerfert, dblaikie

Subscribers: mgorny, eraman, hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81515
2020-06-24 08:18:42 -07:00
Sanjay Patel a0f967418f [VectorCombine] give invalid index value a name; NFC 2020-06-24 11:10:36 -04:00
Matt Arsenault c5d240093b WebAssembly: Don't store MachineFunction in MachineFunctionInfo
Soon it will be disallowed to depend on MachineFunction state in the
constructor. This was only being used to get the MachineRegisterInfo
for an assert, which I'm not sure is necessarily worth it. I would
think any missing defs would be caught by the verifier later instead.
2020-06-24 10:52:58 -04:00
Tim Corringham c3b3b999ec [AMDGPU] Avoid redundant mode register writes
Summary:
The SIModeRegister pass attempts to generate the minimal number of
writes to the mode register. However it was failing to correctly
deal with some loops, resulting in some redundant setreg instructions
being inserted.

This change amends the pass to avoid generating these redundant
instructions.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82215
2020-06-24 14:11:29 +01:00
Simon Pilgrim bf77c7ef2d Loads.h - reduce AliasAnalysis.h include to forward declarations. NFC.
Fix implicit include dependencies in source files.
2020-06-24 13:49:04 +01:00
Florian Hahn 4e62c6359c [DSE] Eliminate stores at the end of the function.
This patch add support for eliminating MemoryDefs that do not have any
aliasing users, which indicates that there are no reads/writes to the
memory location until the end of the function.

To eliminate such defs, we have to ensure that the underlying object is
not visible in the caller and does not escape via returning. We need a
separate check for that, as InvisibleToCaller does not consider returns.

Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker, george.burgess.iv

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D72631
2020-06-24 12:58:20 +01:00
sstefan1 0f426935bb [OpenMPOpt] ICV macro definitions
Summary:
This defines some basic information about ICVs in `OMPKinds.def`.
We also emit remarks with initial values for each function (which are default for now)
as a way to test this.

Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6

Subscribers: yaxunl, hiraditya, guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82193
2020-06-24 13:43:35 +02:00
Simon Pilgrim 90ad37646f ObjCARC.h - remove unnecessary includes. NFC.
Add implicit InstIterator.h dependency in ObjCARCContract.cpp
2020-06-24 12:30:59 +01:00
Cullen Rhodes 26502ad609 [AArch64][SVE] Add bfloat16 support to perm and select intrinsics
Summary:
Added for following intrinsics:

  * zip1, zip2, zip1q, zip2q
  * trn1, trn2, trn1q, trn2q
  * uzp1, uzp2, uzp1q, uzp2q
  * splice
  * rev
  * sel

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D82182
2020-06-24 10:04:51 +00:00
Kerry McLaughlin 3d6cab271c [AArch64][SVE] Add bfloat16 support to load intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - LD1
 - LD1RQ
 - LDNT1
 - LDNF1
 - LDFF1

Reviewers: sdesmalen, c-rhodes, efriedma, stuij, fpetrogalli, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82298
2020-06-24 10:32:19 +01:00
alex-t 521ac0b5ce [AMDGPU] Enable compare operations to be selected by divergence
Summary: Details: This patch enables SETCC to be selected to S_CMP_* if uniform and V_CMP_* if divergent.

Reviewers: rampitec, arsenm

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82194
2020-06-24 11:50:40 +03:00
Simon Tatham b769eb02b5 [ARM][BFloat] Legalize bf16 type even without fullfp16.
Summary:
This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.

Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.

The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on //not// having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).

For function arguments and returns, instead of writing isel patterns
to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions `MoveToHPR` and `MoveFromHPR` which have a
fallback for non-fullfp16 subtargets.

The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous
situation.

Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea

Reviewed By: dmgreen, labrinea

Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82372
2020-06-24 09:36:26 +01:00
Craig Topper 8172ed91f8 [X86] Speculatively fix to X86AvoidStoreForwardingBlocks not deference a machine mem operand if there isn't one present.
Eric Christopher informed me that FastISel memcpy handling creates
load/store instructions without mem operands. We should fix that,
but I doubt that's the only case of missed mem operands so seems
better to be defensive here.

I don't have a test case yet, but I'll try to add one if i get a
test from Eric.
2020-06-24 00:13:58 -07:00
Craig Topper 31c40f2d6b [X86] Add mayLoad/mayStore flags to some X87 instructions that don't have isel patterns to infer them from.
Should remove part of the differences in D81833 due to some
some of these getting isel patterns.
2020-06-23 23:40:30 -07:00
Eli Friedman b5740105d2 [BitcodeReader] Fix DelayedShuffle handling for ConstantExpr shuffles.
The indexing was messed up, so the result was completely broken.

Shuffle constant exprs are rare in practice; without vscale types,
constant folding generally elminates them. So sort of hard to trip over.

Fixes regression from D72467.

Differential Revision: https://reviews.llvm.org/D80330
2020-06-23 19:50:30 -07:00
Amara Emerson fceadbcb33 [AArch64][GlobalISel] Improve codegen for some constant vectors by using constant pool loads.
There's more smarts in AArch64ISelLowering that we don't have yet, but this
change incrementally improves some of the more common patterns. I think future
iterations will want to use some combination of PostLegalizerCombiner and the
selector to catch the other cases.

Differential Revision: https://reviews.llvm.org/D82340
2020-06-23 19:23:47 -07:00
Eli Friedman a2caa3b614 Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Vedant Kumar f8bd6a75ed [SimplifyCFG] Drop debug loc in SpeculativelyExecuteBB
Summary:
According to HowToUpdateDebugInfo.rst:

```
Preserving the debug locations of speculated instructions can make
it seem like a condition is true when it's not (or vice versa), which
leads to a confusing single-stepping experience
```

This patch follows the recommendation to drop debug locations on
speculated instructions.

Reviewers: aprantl, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82420
2020-06-23 18:25:52 -07:00
Matt Arsenault a162048a47 AMDGPU/GlobalISel: Fix fixed ABI special VGPR function arguments
I forgot to copy the new fixed function ABI into GlobalISel, so this
was mismatched with the DAG compiled calling function. This was
allocating part of the argument list to v31, which was supposed to be
reserved for the workitem IDs.
2020-06-23 21:21:35 -04:00
Eli Friedman e9d4e34ab8 [AArch64][SVE] Add legalization support for i32/i64 vector srem/urem
Implement them on top of sdiv/udiv, similar to what we do for integer
types.

Potential future work: implementing i8/i16 srem/urem, optimizations for
constant divisors, optimizing the mul+sub to mls.

Differential Revision: https://reviews.llvm.org/D81511
2020-06-23 16:27:52 -07:00
Eli Friedman 90ad786947 [IR] Prefer scalar type for struct indexes in GEP constant expressions.
This has two advantages: one, it's simpler, and two, it doesn't require
heroic pattern matching with scalable vectors.

Also includes a small fix to DataLayout to allow the scalable vector
testcase to work correctly.

Differential Revision: https://reviews.llvm.org/D82061
2020-06-23 16:14:36 -07:00
Sam Clegg e49584a34a [WebAssembly] Fix for use of uninitialized member in WasmObjectWriter.cpp
Currently, section indices may be passed uninitialized by value if
writing the section fails. Removes section indices form class
initialization and returns them from the write{Code,Data}Section
function calls instead.

Patch by Gui Andrade!

Differential Revision: https://reviews.llvm.org/D81702
2020-06-23 15:26:18 -07:00
David Green d604cc6e9a [ARM] Mark more integer instructions as not having side effects.
LDRD and STRD along with UBFX and SBFX are selected from DAGToDAG
transforms, so do not have tblgen patterns. They don't get marked as
having side effects so cannot be scheduled as efficiently as you would
like.

This specifically marks then as not having side effects.

Differential Revision: https://reviews.llvm.org/D82358
2020-06-23 22:45:51 +01:00
Christopher Tetreault 433c9adf7b [SVE] Remove calls to VectorType::getNumElements from AsmParser
Reviewers: efriedma, RKSimon, c-rhodes, fpetrogalli

Reviewed By: fpetrogalli

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82208
2020-06-23 14:31:49 -07:00
Zequan Wu 6a822e20ce [ASan][MSan] Remove EmptyAsm and set the CallInst to nomerge to avoid from merging.
Summary: `nomerge` attribute was added at D78659. So, we can remove the EmptyAsm workaround in ASan the MSan and use this attribute.

Reviewers: vitalybuka

Reviewed By: vitalybuka

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82322
2020-06-23 14:22:53 -07:00
Ryan Santhiraraja f64dc4e686 Preserve GlobalsAA analysis result in InjectTLIMappings
InjectTLIMappings fails to preserve the analysis result of GlobalsAA. Not preserving the analysis might affect benchmark performance. This change fixes this issue.

Patch by: Ryan Santhiraraja <rsanthir@quicinc.com>

Reviewers: fpetrogalli, joerg, fhahn

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D82343
2020-06-23 22:05:42 +01:00
Nikita Popov 6904c7129b [IR] Remove MSVC warning workaround (NFC)
While LLVM does fold this to x+1, GCC does not. As this is hot
code, let's try to avoid that.

According to
https://developercommunity.visualstudio.com/content/problem/211134/unsigned-integer-overflows-in-constexpr-functionsa.html
this spurious warning in MSVC has been fixed in Visual Studio 2019
Version 16.4. Let's see if there are any build bots running old
MSVC versions with warnings treated as errors...
2020-06-23 22:33:57 +02:00
Christopher Tetreault e6d8636935 [SVE] Remove calls to VectorType::getNumElements from Bitcode
Reviewers: efriedma, evgeny777, tejohnson, david-arm, kmclaughlin

Reviewed By: david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82209
2020-06-23 13:21:40 -07:00
Nikita Popov 52e86797ba [IR] Remove unnecessary uint64_t casts (NFC)
As pointed out by foad, it's not necessary to work on uint64_t
here. The values used here fit uint8_t.
2020-06-23 22:20:15 +02:00
Florian Hahn ff4de8683a [DSE,MSSA] Treat `store 0` after calloc as noop stores.
This patch extends storeIsNoop to also detect stores of 0 to an calloced
object. This basically ports the logic from legacy DSE to the MemorySSA
backed version.

It triggers in a few cases on MultiSource, SPEC2000, SPEC2006 with -O3
LTO:

Same hash: 218 (filtered out)
Remaining: 19
Metric: dse.NumNoopStores

Program                                        base   patch2 diff
 test-suite...CFP2000/177.mesa/177.mesa.test     1.00  15.00 1400.0%
 test-suite...6/482.sphinx3/482.sphinx3.test     1.00  14.00 1300.0%
 test-suite...lications/ClamAV/clamscan.test     2.00  28.00 1300.0%
 test-suite...CFP2006/433.milc/433.milc.test     1.00   8.00 700.0%
 test-suite...pplications/oggenc/oggenc.test     2.00   9.00 350.0%
 test-suite.../CINT2000/176.gcc/176.gcc.test     6.00   6.00  0.0%
 test-suite.../CINT2006/403.gcc/403.gcc.test    NaN   137.00  nan%
 test-suite...libquantum/462.libquantum.test    NaN     3.00  nan%
 test-suite...6/464.h264ref/464.h264ref.test    NaN     7.00  nan%
 test-suite...decode/alacconvert-decode.test    NaN     2.00  nan%
 test-suite...encode/alacconvert-encode.test    NaN     2.00  nan%
 test-suite...ications/JM/ldecod/ldecod.test    NaN     9.00  nan%
 test-suite...ications/JM/lencod/lencod.test    NaN    39.00  nan%
 test-suite.../Applications/lemon/lemon.test    NaN     2.00  nan%
 test-suite...pplications/treecc/treecc.test    NaN     4.00  nan%
 test-suite...hmarks/McCat/08-main/main.test    NaN     4.00  nan%
 test-suite...nsumer-lame/consumer-lame.test    NaN     3.00  nan%
 test-suite.../Prolangs-C/bison/mybison.test    NaN     1.00  nan%
 test-suite...arks/mafft/pairlocalalign.test    NaN    30.00  nan%

Reviewers: efriedma, zoecarver, asbirlea

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D82204
2020-06-23 21:01:39 +01:00
Your Name cc9d693856 [AMDGPU/MemOpsCluster] Implement new heuristic for computing max mem ops cluster size
Summary:
Make use of both the - (1) clustered bytes and (2) cluster length, to decide on
the max number of mem ops that can be clustered. On an average, when loads
are dword or smaller, consider `5` as max threshold, otherwise `4`. This
heuristic is purely based on different experimentation conducted, and there is
no analytical logic here.

Reviewers: foad, rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: llvm-commits, kerbowa, hiraditya, t-tye, Anastasia, tpr, dstuttard, yaxunl, nhaehnle, wdng, jvesely, kzhuravl, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82393
2020-06-24 00:39:41 +05:30
Christopher Tetreault 4d1fd33561 [SVE] Remove calls to VectorType::getNumElements from FuzzMutate
Reviewers: efriedma, bkramer, kmclaughlin, sdesmalen

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82212
2020-06-23 11:02:20 -07:00
Simon Pilgrim e7e204a373 [X86][AVX] Attempt to lower v16i32/v16f32 shuffles with lowerShuffleAsRepeatedMaskAndLanePermute
Avoids prematurely creating permps/permd variable shuffles.

Fixes PR46249
2020-06-23 18:33:50 +01:00
Simon Pilgrim ddc6ec9470 WithColor.h - reduce CommandLine.h include to forward declaration. NFC.
WithColor.h is one of the most common headers, we can severely reduce its frontend impact (in ClangBuildAnalyzer reports) by removing the bulky CommandLine.h include, forward declaring llvm:🆑:OptionCategory and just including raw_ostream.h instead.
2020-06-23 17:07:53 +01:00
Xing GUO 45fa936855 [ObjectYAML][DWARF] Remove unused context. NFC.
The context is unused. This patch helps remove it.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82351
2020-06-24 00:02:51 +08:00
Xing GUO fad54c50e4 [ObjectYAML][ELF] Add support for emitting the .debug_pubtypes section.
This patch helps add support for emitting the .debug_pubtypes section.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82347
2020-06-24 00:01:07 +08:00
Momchil Velikov adf7973fd3 [ARM] Describe defs/uses of VLLDM and VLSTM
The VLLDM and VLSTM instructions are incompletely specified.  They
(potentially) write (or read, respectively) registers Q0-Q7, VPR, and
FPSCR, but the compiler is unaware of it.

In the new test case `cmse-vlldm-no-reorder.ll` case the compiler
missed an anti-dependency and reordered a `VLLDM` ahead of the
instruction, which stashed the return value from the non-secure call,
effectively clobbering said value.

This test case does not fail with upstream LLVM, because of scheduling
differences and I couldn't find a test case for the VLSTM either.

Differential Revision: https://reviews.llvm.org/D81586
2020-06-23 16:04:23 +01:00
Valentin Clement d90443b1d9 [openmp] Base of tablegen generated OpenMP common declaration
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.

Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp

Reviewed By: jdoerfert, jdenny

Subscribers: arphaman, martong, cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits

Tags: #llvm, #openmp, #clang

Differential Revision: https://reviews.llvm.org/D81736
2020-06-23 10:32:32 -04:00
Mikhail Maltsev 3f353a2e5a [BFloat] Add convert/copy instrinsic support
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

Specifically it adds intrinsic support in clang and llvm for Arm and AArch64.

The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
  - Alexandros Lamprineas
  - Luke Cheeseman
  - Mikhail Maltsev
  - Momchil Velikov
  - Luke Geeson

Differential Revision: https://reviews.llvm.org/D80928
2020-06-23 14:27:05 +00:00
Matt Arsenault db777eaea3 AMDGPU/GlobalISel: Fix asserts on non-s32 sitofp/uitofp sources
The combine to form cvt_f32_ubyte0 was assuming the source type was
always 32-bit, but this needs to tolerate any legal source type.
2020-06-23 10:00:35 -04:00
Xing GUO 8c7775e9a7 [ObjectYAML][ELF] Add support for emitting the .debug_pubnames section.
This patch helps add support for emitting the .debug_pubnames section to yaml2elf.

Known issues:
- Current implementation doesn't support emitting multiple sets of entries.
- Doesn't support DWARF64.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82296
2020-06-23 20:40:33 +08:00
Mikhail Maltsev 9c579540ff [ARM] BFloat MatMul Intrinsics&CodeGen
Summary:
This patch adds support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch32. This includes IR intrinsics. Tests are
provided as needed.

This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm
Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

 - Luke Geeson
 - Momchil Velikov
 - Mikhail Maltsev
 - Luke Cheeseman
 - Simon Tatham

Reviewers: stuij, t.p.northover, SjoerdMeijer, sdesmalen, fpetrogalli, LukeGeeson, simon_tatham, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: MarkMurrayARM, danielkiss, kristof.beyls, hiraditya, cfe-commits, llvm-commits, chill, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81740
2020-06-23 12:06:37 +00:00
hsmahesha 5832950adb [AMDGPU/MemOpsCluster] Compute `width` for `MIMG` instruction class.
Summary:
`width` computation is missing for newly added `MIMG`
instruction class. Add it.

Reviewers: foad, rampitec, arsenm

Reviewed By: foad

Subscribers: MatzeB, javed.absar, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81649
2020-06-23 17:32:17 +05:30
Georgii Rymar 1e820e82b1 [DebugInfo/DWARF] - Do not hang when CFI are truncated.
Currently when the .eh_frame section is truncated so that
CFI instructions can't be read, it is possible to enter
an infinite loop.

It happens because `CFIProgram::parse` does not handle errors properly.
This patch fixes the issue.

Differential revision: https://reviews.llvm.org/D82017
2020-06-23 14:39:24 +03:00
Simon Pilgrim cdceef4a4f [Analysis] Ensure we include CommandLine.h if we declare any cl::opt flags. NFC. 2020-06-23 12:29:51 +01:00
Sander de Smalen 121e585ec8 [AArch64][SVE] ACLE: Add bfloat16 to struct load/stores.
This patch contains:
- Support in LLVM CodeGen for bfloat16 types for ld2/3/4 and st2/3/4.
- New bfloat16 ACLE builtins for svld(2|3|4)[_vnum] and svst(2|3|4)[_vnum]

Reviewers: stuij, efriedma, c-rhodes, fpetrogalli

Reviewed By: fpetrogalli

Tags: #clang, #lldb, #llvm

Differential Revision: https://reviews.llvm.org/D82187
2020-06-23 12:12:35 +01:00
Simon Pilgrim 36bc10e74a [Transforms] Ensure we include CommandLine.h if we declare any cl::opt flags 2020-06-23 12:11:51 +01:00
Kerry McLaughlin 5080503174 [SVE][CodeGen] Legalisation of vsetcc with scalable types
Summary: Changes SplitVecOp_VSETCC to use getVectorElementCount()

Reviewers: sdesmalen, efriedma, dancgr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, huihuiz, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79167
2020-06-23 11:56:29 +01:00
Roman Lebedev d57e9aca01
[IndVarSimplify] Don't replace IV user with unsafe loop-invariant (PR45360)
Summary:
As [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]] reports,
with new cost-model we can sometimes end up being able to expand `udiv`/`urem` instructions.
And that exposes at least one instance of when we do that
regardless of whether or not it is safe to do.
In this particular case, it's `SimplifyIndvar::replaceIVUserWithLoopInvariant()`.

It seems to me, we simply need to check with `isSafeToExpandAt()` first.

The test isn't great. I'm not sure how to make it only run `-indvars`.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=45360 | PR45360 ]].

Reviewers: mkazantsev, reames, helloqirun

Reviewed By: mkazantsev

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82108
2020-06-23 13:53:15 +03:00
Chen Zheng 7ab05d9a60 [PowerPC] fold addi's imm operand to its imm form consumer's displacement
This patch adds a function to do following transformation:

%0:g8rc_and_g8rc_nox0 = ADDI8 %5:g8rc_and_g8rc_nox0, 144
STD killed %7:g8rc, 16, %0:g8rc_and_g8rc_nox0 :: (store 8 into %ir.8)

------>

STD killed %7:g8rc, 160, %5:g8rc_and_g8rc_nox0 :: (store 8 into %ir.8)

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81723
2020-06-23 06:28:18 -04:00
Simon Pilgrim 4c257bb44e [X86] truncateVectorWithPACK - fix outdated comment. NFC.
We perform PACKSS/PACKUS on AVX512 targets if the calling function wants to.
2020-06-23 10:45:27 +01:00
Simon Pilgrim bcc0dc3832 [DAG] visitSIGN_EXTEND_INREG - rename EVT variable. NFCI.
We had a EVT type variable called EVT, which isn't a good idea....
2020-06-23 10:45:27 +01:00
Paul Walker 499c63288f [SVE] Code generation for fixed length vector loads & stores.
Summary:
This patch adds base support for code generating fixed length
vector operations targeting a known SVE vector length. To achieve
this we lower fixed length vector operations to equivalent scalable
vector operations, whereby SVE predication is used to limit the
elements processed to those present within the fixed length vector.

Specifically this patch implements load and store operations, which
get lowered to their masked counterparts thusly:

  V = load(Addr) =>
    V = extract_fixed_vector(masked_load(make_pred(V.NumElts), Addr))

  store(V, (Addr)) =>
    masked_store(insert_fixed_vector(V), make_pred(V.NumElts), Addr))

Reviewers: rengolin, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80385
2020-06-23 09:39:03 +00:00
Dylan McKay b9c26a9cfe [AVR] Rewrite the function calling convention.
Summary:
The previous version relied on the standard calling convention using
std::reverse() to try to force the AVR ABI. But this only works for
simple cases, it fails for example with aggregate types.

This patch rewrites the calling convention with custom C++ code, that
implements the ABI defined in https://gcc.gnu.org/wiki/avr-gcc.

To do that it adds a few 16-bit pseudo registers for unaligned argument
passing, such as R24R23. For example this function:

    define void @fun({ i8, i16 } %a)

will pass %a.0 in R22 and %a.1 in R24R23.

There are no instructions that can use these pseudo registers, so a new
register class, DREGSMOVW, is defined to make them apart.

Also the ArgCC_AVR_BUILTIN_DIV is no longer necessary, as it is
identical to the C++ behavior (actually the clobber list is more strict
for __div* functions, but that is currently unimplemented).

Reviewers: dylanmckay

Subscribers: Gaelan, Sh4rK, indirect, jwagen, efriedma, dsprenkels, hiraditya, Jim, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68524

Patch by Rodrigo Rivas Costa.
2020-06-23 21:36:18 +12:00
James Henderson 9782c922cb [DebugInfo] Print line table extended opcode bytes if parsing fails
Previously, if there was an error whilst parsing the operands of an
extended opcode, the operands would be treated as zero and printed. This
could potentially be slightly confusing. This patch changes the
behaviour to print the raw bytes instead.

Reviewed by: ikudrin

Differential Revision: https://reviews.llvm.org/D81570
2020-06-23 10:04:02 +01:00
Simon Pilgrim 7a55d98497 ProfileSummary.cpp - fix implicit Format.h dependency. NFC.
ProfileSummary was depending on other headers (notably WithColor.h) to define format().
2020-06-23 09:43:40 +01:00
Simon Pilgrim 0acd22b8fb StatepointLowering.cpp - fix implicit CommandLine.h dependency. NFC.
StatepointLowering defines a cl::opt but don't include CommandLine.h.
2020-06-23 09:43:39 +01:00
Florian Hahn a822ec75cc [DSE,MSSA] Treat passed by value args as invisible to caller.
This updates the MemorySSA backed implementation to treat arguments
passed by value similar to allocas: in they are assumed to be invisible
in the caller. This is similar to how they are treated in legacy DSE.

Reviewers: efriedma, asbirlea, george.burgess.iv

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82222
2020-06-23 08:58:51 +01:00
Alex Lorenz 1c4a42a4d8 [Triple] support macOS 11 os version number
macOS goes to 11! This commit adds support for the new version number by ensuring
that existing version comparison routines, and the 'darwin' OS identifier
understands the new numbering scheme. It also adds a new utility method
'getCanonicalVersionForOS', which lets users translate some uses of
macOS 10.16 into macOS 11. This utility method will be used in upcoming
clang and swift commits.

Differential Revision: https://reviews.llvm.org/D82337
2020-06-22 23:03:47 -07:00
Michael Liao f95850ce9c [SROA] Teach SROA to perform no-op pointer conversion.
Summary:
- When promoting a pointer from memory to register, SROA skips pointers
  from different address spaces. However, as `ptrtoint` and `inttoptr`
  are defined as no-op casts if that integer type has the same as the
  pointer value, generate the pair of `ptrtoint`/`inttoptr` (no-op cast)
  sequence to convert pointers from different address spaces if they
  have the same size.

Reviewers: arsenm, chandlerc, lebedev.ri

Subscribers:

Differential Revision: https://reviews.llvm.org/D81943
2020-06-23 01:49:27 -04:00
Max Kazantsev 9bff376e5c [InstCombine] Replace selects with Phis
We can sometimes replace a select with a Phi node if all of its values
are available on respective incoming edges.

Differential Revision: https://reviews.llvm.org/D82005
Reviewed By: nikic
2020-06-23 12:12:59 +07:00
Michael Liao b1360caa82 [SDAG] Add new AssertAlign ISD node.
Summary:
- AssertAlign node records the guaranteed alignment on its source node,
  where these alignments are retrieved from alignment attributes in LLVM
  IR. These tracked alignments could help DAG combining and lowering
  generating efficient code.
- In this patch, the basic support of AssertAlign node is added. So far,
  we only generate AssertAlign nodes on return values from intrinsic
  calls.
- Addressing selection in AMDGPU is revised accordingly to capture the
  new (base + offset) patterns.

Reviewers: arsenm, bogner

Subscribers: jvesely, wdng, nhaehnle, tpr, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81711
2020-06-23 00:51:11 -04:00
Amy Kwan 19df9e2959 [PowerPC][Power10] Implement VSX PCV Generate Operations in LLVM/Clang
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:

vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);

Differential Revision: https://reviews.llvm.org/D81774
2020-06-22 21:09:34 -05:00
Sanjay Patel 8953ecf22b [InstCombine] reassociate diff of sums into sum of diffs
This is the integer sibling to D81491.

(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])

Removing the "experimental" from these intrinsics is likely
not too far away.
2020-06-22 20:47:09 -04:00
Sanjay Patel 54143e2bd5 [VectorCombine] do not use magic number for undef mask element; NFC 2020-06-22 20:47:09 -04:00
Ayke van Laethem eac4a60154
[AVR] Disassemble double register instructions
Add disassembly support for the movw, adiw, and sbiw instructions.

I had previously committed test cases for the adiw and sbiw
instructions, but had accidentally made them not runnable so they were
skipped all this time. Oops. This patch fixes that by adding support for
disassembling those instructions.

Differential Revision: https://reviews.llvm.org/D82093
2020-06-23 02:18:04 +02:00
Ayke van Laethem 9f09c29f01
[AVR] Disassemble instructions with fixed Z operand
Some instructions have a fixed Z register and don't have an explicit
register operand. This can be worked around by simply printing the
operand directly if the particular register class is detected.

The LPM and ELPM instructions also needed a custom decoder, which is
also included in this patch.

Differential Revision: https://reviews.llvm.org/D82088
2020-06-23 02:17:53 +02:00
Ayke van Laethem ec9efb856c
[AVR] Disassemble multiplication instructions
These can often only use a limited range of registers, and apparently
need special decoding support.

Differential Revision: https://reviews.llvm.org/D81971
2020-06-23 02:17:37 +02:00
Ayke van Laethem 01c2209d51
[AVR] Decode single register instructions
This is a set of instructions that take just a single register as an
operand, with no immediates. Because all instructions share the same
format, I haven't added exhaustive bit testing to all instructions but
just to the inc instruction.

Differential Revision: https://reviews.llvm.org/D81968
2020-06-23 02:17:15 +02:00
Ayke van Laethem ff4817ec2a
[AVR] Don't adjust for instruction size
I'm not entirely sure why this was ever needed, but when I remove both
adjustments all tests still pass.

This fixes a bug where a long branch (using the `jmp` instead of the
`rjmp` instruction) was incorrectly adjusted by 2 because it jumps to an
absolute address instead of a PC-relative address. I could have added
AVR::fixup_call to the list of exceptions, but it seemed more sensible
to me to just remove this code.

Differential Revision: https://reviews.llvm.org/D78459
2020-06-23 02:15:42 +02:00
Sam Clegg 79aad89d8d [WebAssembly] Add support for externalref to MC and wasm-ld
This allows code for handling externref values to be processed by the
assembler and linker.

Differential Revision: https://reviews.llvm.org/D81977
2020-06-22 15:57:24 -07:00
Christopher Tetreault cd6848f6e1 [SVE] Remove calls to VectorType::getNumElements from ARM
Reviewers: efriedma, greened, c-rhodes, david-arm, dmgreen

Reviewed By: dmgreen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82216
2020-06-22 15:18:58 -07:00
Arthur Eubanks d335c1317b Fix dynamic alloca detection in CloneBasicBlock
Summary:
Simply check AI->isStaticAlloca instead of reimplementing checks for
static/dynamic allocas.

Reviewers: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82328
2020-06-22 15:06:28 -07:00
Craig Topper 23654d9e7a Recommit "[X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum."
Hopefully this version will fix the previously buildbot failure
2020-06-22 13:32:03 -07:00
Greg Clayton ccf5a44917 Fix the verification of DIEs with DW_AT_ranges.
Summary: Previous code would try to verify DW_AT_ranges and if any ranges would overlap, it would stop attributing any ranges after this to the DIE which caused incorrect errors to be reported that a DIE's address ranges were not contained in the parent DIE's ranges. Added a fix and a test.

Reviewers: aprantl, labath, probinson, JDevlieghere, jhenderson

Subscribers: hiraditya, MaskRay, cmtice, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79962
2020-06-22 13:13:48 -07:00
Lei Zhang 315bd96437 Use std::make_tuple instead initializer list
Hopefully this pleases GCC-5 and fixes the build error:

LowerExpectIntrinsic.cpp:62:53: error: converting to
'std::tuple<unsigned int, unsigned int>' from initializer list would use
explicit constructor 'constexpr std::tuple<_T1, _T2>::tuple(_U1&&,
_U2&&) [with _U1 = llvm:🆑:opt<unsigned int>&; _U2 =
llvm:🆑:opt<unsigned int>&; <template-parameter-2-3> = void; _T1 =
unsigned int; _T2 = unsigned int]'
     return {LikelyBranchWeight, UnlikelyBranchWeight};

Differential Revision: https://reviews.llvm.org/D82325
2020-06-22 15:43:40 -04:00
Hans Wennborg 1357c06578 Revert "[X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions"
This caused a Chromium test to miscompile. See discussion on the Phabricator
review.

> This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero.
>
> Fixes PR45378
>
> Differential Revision: https://reviews.llvm.org/D81547

This reverts 057c9c7ee0
2020-06-22 21:27:11 +02:00
Christopher Tetreault 5e2c736395 [SVE] Remove calls to VectorType::getNumElements from WebASM
Summary:
The getNumElements in base VectorType is being deprecated.

See: http://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html

Reviewers: efriedma, tlively, fpetrogalli, c-rhodes, dschuff

Reviewed By: tlively, dschuff

Subscribers: dschuff, sbc100, tschuett, jgravelle-google, hiraditya, aheejin, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82217
2020-06-22 12:25:08 -07:00
Craig Topper bebea4221d Revert "[X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum."
Seems to breaking build.

This reverts commit 5ac144fe64.
2020-06-22 12:20:40 -07:00
Craig Topper 5ac144fe64 [X86] Calculate the needed size of the feature arrays in _cpu_indicator_init and getHostCPUName using the size of the feature enum.
Move 0 initialization up to the caller so we don't need to know
the size.
2020-06-22 11:46:20 -07:00
Zhi Zhuang 37fb860301 Add support of __builtin_expect_with_probability
Add a new builtin-function __builtin_expect_with_probability and
intrinsic llvm.expect.with.probability.
The interface is __builtin_expect_with_probability(long expr, long
expected, double probability).
It is mainly the same as __builtin_expect besides one more argument
indicating the probability of expression equal to expected value. The
probability should be a constant floating-point expression and be in
range [0.0, 1.0] inclusive.
It is similar to builtin-expect-with-probability function in GCC
built-in functions.

Differential Revision: https://reviews.llvm.org/D79830
2020-06-22 10:21:28 -07:00
Hiroshi Yamauchi 9e1decf743 [PGO][PGSO] Enable non-cold size opts under partial profile sample PGO.
Summary: Similar to D81020. Follow up D78949.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82053
2020-06-22 10:12:48 -07:00
Francesco Petrogalli ef597eda8e [sve][acle] Add SVE BFloat16 extensions.
Summary:
List of intrinsics:

svfloat32_t svbfdot[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfdot[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfdot_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmmla[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)

svfloat32_t svbfmlalb[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalb[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalb_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svfloat32_t svbfmlalt[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3)
svfloat32_t svbfmlalt[_n_f32](svfloat32_t op1, svbfloat16_t op2, bfloat16_t op3)
svfloat32_t svbfmlalt_lane[_f32](svfloat32_t op1, svbfloat16_t op2, svbfloat16_t op3, uint64_t imm_index)

svbfloat16_t svcvt_bf16[_f32]_m(svbfloat16_t inactive, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_x(svbool_t pg, svfloat32_t op)
svbfloat16_t svcvt_bf16[_f32]_z(svbool_t pg, svfloat32_t op)

svbfloat16_t svcvtnt_bf16[_f32]_m(svbfloat16_t even, svbool_t pg, svfloat32_t op)
svbfloat16_t svcvtnt_bf16[_f32]_x(svbfloat16_t even, svbool_t pg, svfloat32_t op)

For reference, see section 7.2 of "Arm C Language Extensions for SVE - Version 00bet4"

Reviewers: sdesmalen, ctetreau, efriedma, david-arm, rengolin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82141
2020-06-22 16:53:02 +00:00
Sanjay Patel 9934cc544c [VectorCombine] make helper function for shift-shuffle; NFC
This will probably be useful for other extract patterns.
2020-06-22 12:23:52 -04:00
Florian Hahn 328c8642e2 [DSE,MSSA] Reorder DSE blocking checks.
Currently we stop exploring candidates too early in some cases.

In particular, we can continue checking the defining accesses of
non-removable MemoryDefs and defs without analyzable write location
(read clobbers are already ruled out using MemorySSA at this point).
2020-06-22 17:16:34 +01:00
Fangrui Song c52bee61e9 [MCParser] Support quoted section name for COFF
This features matches ELFAsmParser and makes it possible to use `.section ".llvm.call-graph-profile","n"`

Reviewed By: zequanwu

Differential Revision: https://reviews.llvm.org/D82240
2020-06-22 09:11:44 -07:00
stozer 539381da26 [DebugInfo] Update MachineInstr to help support variadic DBG_VALUE instructions
Following on from this RFC[0] from a while back, this is the first patch towards
implementing variadic debug values.

This patch specifically adds a set of functions to MachineInstr for performing
operations specific to debug values, and replacing uses of the more general
functions where appropriate. The most prevalent of these is replacing
getOperand(0) with getDebugOperand(0) for debug-value-specific code, as the
operands corresponding to values will no longer be at index 0, but index 2 and
upwards: getDebugOperand(x) == getOperand(x+2). Similar replacements have been
added for the other operands, along with some helper functions to replace
oft-repeated code and operate on a variable number of value operands.

[0] http://lists.llvm.org/pipermail/llvm-dev/2020-February/139376.html<Paste>

Differential Revision: https://reviews.llvm.org/D81852
2020-06-22 16:01:12 +01:00
Guillaume Chatelet 597a9070b5 [ARC] Add missing return statement 2020-06-22 14:57:29 +00:00
Sanjay Patel 98c2f4eea5 [VectorCombine] add helper to replace uses and rename
The tests are regenerated to show a path that missed renaming,
but there should be no functional difference from this patch.
2020-06-22 09:58:49 -04:00
Valentin Clement 8383ac6197 Revert commit 9e52530 because of dependencies issue
This reverts commit 9e525309fb.
2020-06-22 09:56:14 -04:00
Valentin Clement 9e525309fb [openmp] Base of tablegen generated OpenMP common declaration
Summary:
As discussed previously when landing patch for OpenMP in Flang, the idea is
to share common part of the OpenMP declaration between the different Frontend.
While doing this it was thought that moving to tablegen instead of Macros will also
give a cleaner and more powerful way of generating these declaration.
This first part of a future series of patches is setting up the base .td file for
DirectiveLanguage as well as the OpenMP version of it. The base file is meant to
be used by other directive language such as OpenACC.
In this first patch, the Directive and Clause enums are generated with tablegen
instead of the macros on OMPConstants.h. The next pacth will extend this
to other enum and move the Flang frontend to use it.

Reviewers: jdoerfert, DavidTruby, fghanim, ABataev, jdenny, hfinkel, jhuber6, kiranchandramohan, kiranktp

Reviewed By: jdoerfert, jdenny

Subscribers: cfe-commits, mgorny, yaxunl, hiraditya, guansong, jfb, sstefan1, aaron.ballman, llvm-commits

Tags: #llvm, #openmp, #clang

Differential Revision: https://reviews.llvm.org/D81736
2020-06-22 09:34:53 -04:00
Xing GUO 03480c80d3 [DWARFYAML][debug_info] Add support for error handling.
This patch helps add support for error handling in `DWARFYAML::emitDebugInfo()`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82275
2020-06-22 21:36:13 +08:00
Xing GUO 3a48a632d0 [DWARFYAML][debug_info] Use 'AbbrCode' to index the abbreviation.
Before this patch, we use `(uint32_t)AbbrCode - (uint32_t)FirstAbbrCode` to index the abbreviation. It's impossible for we to use the preceeding abbreviation of the previous one (e.g., if the previous DIE's `AbbrCode` is 2, we are unable to use the abbreviation with index 1). In this patch, we use `AbbrCode` to index the abbreviation directly.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82173
2020-06-22 21:34:02 +08:00
Simon Pilgrim 48d1a2d6d0 [DAG] Add SimplifyMultipleUseDemandedVectorElts helper for SimplifyMultipleUseDemandedBits. NFCI.
We have many cases where we call SimplifyMultipleUseDemandedBits and demand specific vector elements, but all the bits from them - this adds a helper wrapper to handle this.
2020-06-22 14:24:39 +01:00
Sanjay Patel de65b356dc [VectorCombine] add/use pass-level IRBuilder
This saves creating/destroying a builder every time we
perform some transform.

The tests show instruction ordering diffs resulting from
always inserting at the root instruction now, but those
should be benign.
2020-06-22 09:01:29 -04:00
Jay Foad 9761d3cf9c [AMDGPU] Update more live intervals in SIWholeQuadMode
This fixes various assertion failures that would otherwise be triggered
by a later patch to move SIWholeQuadMode later in the pass pipeline.

Differential Revision: https://reviews.llvm.org/D82190
2020-06-22 13:50:15 +01:00
Sanjay Patel cce625f73d [VectorCombine] improve IR debugging by providing/salvaging value names
The tests are regenerated to show the diffs, but there should be no
functional change from this patch.
2020-06-22 08:35:47 -04:00
Tim Corringham 96ecead5a2 [AMDGPU] clang-format of SIModeRegister.cpp
Ran clang-format just to ease future reviews. No functional changes.
2020-06-22 13:31:52 +01:00
Simon Pilgrim ecc5d7ee0d [DAG] SimplifyMultipleUseDemandedBits - drop unnecessary *_EXTEND_VECTOR_INREG cases
For little endian targets, if we only need the lowest element and none of the extended bits then we can just use the (bitcasted) source vector directly.

We already do this in SimplifyDemandedBits, this adds the SimplifyMultipleUseDemandedBits equivalent.
2020-06-22 12:35:32 +01:00
Tres Popp 09d72ad399 Revert "[CGP] Enable CodeGenPrepares phi type convertion."
This reverts commit 67121d7b82.

This is causing compile times to be 2x slower on some large binaries.
2020-06-22 13:06:18 +02:00
Serguei Katkov eae0d2e9b2 Revert "[Peeling] Extend the scope of peeling a bit"
This reverts commit 29b2c1ca72.

The patch causes the DT verifier failure like:
DominatorTree is different than a freshly computed one!

Not sure the patch itself it wrong but revert to investigate the failure.
2020-06-22 17:48:29 +07:00
Vitaly Buka 5d964e262f [StackSafety] Check variable lifetime
We can't consider variable safe if out-of-lifetime access is possible.
So if StackLifetime can't prove that the instruction always uses
the variable when it's still alive, we consider it unsafe.
2020-06-22 03:45:29 -07:00
Vitaly Buka 8f592ed333 [StackSafety] Ignore unreachable instructions
Usually DominatorTree provides this info, but here we use
StackLifetime. The reason is that in the next patch StackLifetime
will be used for actual lifetime checks and we can avoid
forwarding the DominatorTree into this code.
2020-06-22 03:45:29 -07:00
Anton Korobeynikov 6cb80fbe40 Revert "[MSP430] Update register names"
This reverts commit 8f6620f663.
2020-06-22 13:37:22 +03:00
Anatoly Trosinenko 8f6620f663 [MSP430] Update register names
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for seeming inability to clobber r4.

The problem was that MSP430 code generator of LLVM used an obsolete name FP for that register. Things were worse because when `llc` read an unknown register name, it silently ignored it.

Differential Revision: https://reviews.llvm.org/D82184
2020-06-22 13:24:03 +03:00
Momchil Velikov 75b0bbca1d [LTO] Use StringRef instead of C-style strings in setCodeGenDebugOptions
Fixes an issue with missing nul-terminators and saves us some string
copying, compared to a version which would insert nul-terminators.

Differential Revision: https://reviews.llvm.org/D82033
2020-06-22 11:22:18 +01:00
Anatoly Trosinenko a5bd75aab8 [MSP430] Enable some basic support for debug information
This commit technically permits LLVM to emit the debug information for ELF files for MSP430 architecture. Aside from this, it only defines the register numbers as defined by part 10.1 of MSP430 EABI specification (assuming the 1-byte subregisters share the register numbers with corresponding full-size registers).

This commit was basically tested by me with TI-provided GCC 8.3.1 toolchain by compiling an example program with `clang` (please note manual linking may be required due to upstream `clang` not yet handling the `-msim` option necessary to run binaries on the GDB-provided simulator) and then running it and single-stepping with `msp430-elf-gdb` like this:

```
$sysroot/bin/msp430-elf-gdb ./test -ex "target sim" -ex "load ./test"
(gdb) ... traditional GDB commands follow ...
```

While this implementation is most probably far from completeness and is considered experimental, it can already help with debugging MSP430 programs as well as finding issues in LLVM debug info support for MSP430 itself.

One of the use cases includes trying to find a point where UBSan check in a trap-on-error mode was triggered.

The expected debug information format is described in the [MSP430 Embedded Application Binary Interface](http://www.ti.com/lit/an/slaa534/slaa534.pdf) specification, part 10.

Differential Revision: https://reviews.llvm.org/D81488
2020-06-22 13:14:07 +03:00
Anatoly Trosinenko 359fae6eb0 [DebugInfo] Explicitly permit addr_size = 0x02 when parsing DWARF data
Current LLVM implementation uses `MCAsmInfo::CodePointerSize` as addr_size when emitting the DWARF data. llvm-dwarfdump, on the other hand, handles `addr_size`s of 4 and 8 properly and considers all other sizes as an error. This works for most of mainline targets except for MSP430 and AVR.

msp430-gcc v8.3.1 emits DWARF32 with addr_size = 4 (DWARF32 does not imply addr_size = 4, 32 refers to internal offset width of 4 bytes) that is handled by llvm-dwarfdump already. Still, emitting 2-byte target pointers on MSP430 seems correct as well (but not for MSP430X that is supported by msp430-gcc but not by LLVM and has 20-bit address space).

This patch make it possible for MSP430 debug info support to be tested with llvm-dwarfdump.

Differential Revision: https://reviews.llvm.org/D82055
2020-06-22 13:11:55 +03:00
Florian Hahn 0e19ff02d8 [DSE,MSSA] Remove unused arguments for isDSEBarrier (NFC). 2020-06-22 10:58:53 +01:00
Djordje Todorovic 792786e34d [CSInfo][MIPS] Don't describe parameters loaded by sub/super reg copy
When describing parameter value loaded by a COPY instruction, consider
case where needed Reg value is a sub- or super- register of the COPY
instruction's destination register. Without this patch, compile process
will crash with the assertion "TargetInstrInfo::describeLoadedValue
can't describe super- or sub-regs for copy instructions".

Patch by Nikola Tesic

Differential revision: https://reviews.llvm.org/D82000
2020-06-22 10:49:02 +02:00
Serguei Katkov 29b2c1ca72 [Peeling] Extend the scope of peeling a bit
Currently we allow peeling of the loops if there is a exiting latch block
and all other exits are blocks ending with deopt.

Actually we want that exit would end up with deopt unconditionally but
it is not required that exit itself ends with deopt.

Reviewers: reames, ashlykov, fhahn, apilipenko, fedor.sergeev
Reviewed By: apilipenko
Subscribers: hiraditya, zzheng, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D81140
2020-06-22 12:17:44 +07:00
Michael Liao 20a1700293 [amdgpu] Fix REL32 relocations with negative offsets.
Summary: - The offset should be treated as a signed one.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82234
2020-06-21 23:09:03 -04:00
Sanjay Patel 6bdd531af5 [VectorCombine] create class for pass to hold analyses, etc; NFC
This doesn't change anything currently, but it would make sense
to create a class-level IRBuilder instead of recreating that
everywhere. As we expand to more optimizations, we will probably
also want to hold things like the DataLayout or other constant
refs in here too.
2020-06-21 16:07:33 -04:00
David Green 67121d7b82 [CGP] Enable CodeGenPrepares phi type convertion. 2020-06-21 16:46:16 +01:00
Florian Hahn 40569db7b3 [DSE,MSSA] Move reachability check to main loop.
As we traverse the CFG backwards, we could end up reaching unreachable
blocks. For unreachable blocks, we won't have computed post order
numbers and because DomAccess is reachable, unreachable blocks cannot be
on any path from it.

This fixes a crash with unreachable blocks.
2020-06-21 16:38:10 +01:00
David Green 730ecb63ec [CGP] Convert phi types
If a collection of interconnected phi nodes is only ever loaded, stored
or bitcast then we can convert the whole set to the bitcast type,
potentially helping to reduce the number of register moves needed as the
phi's are passed across basic block boundaries. This has to be done in
CodegenPrepare as it naturally straddles basic blocks.

The alorithm just looks from phi nodes, looking at uses and operands for
a collection of nodes that all together are bitcast between float and
integer types. We record visited phi nodes to not have to process them
more than once. The whole subgraph is then replaced with a new type.
Loads and Stores are bitcast to the correct type, which should then be
folded into the load/store, changing it's type.

This comes up in the biquad testcase due to the way MVE needs to keep
values in integer registers. I have also seen it come up from aarch64
partner example code, where a complicated set of sroa/inlining produced
integer phis, where float would have been a better choice.

I also added undef and extract element handling which increased the
potency in some cases.

This adds it with an option that defaults to off, and disabled for 32bit
X86 due to potential issues around canonicalizing NaNs.

Differential Revision: https://reviews.llvm.org/D81827
2020-06-21 15:54:17 +01:00
Nikita Popov 37d3030711 [ValueTracking, BasicAA] Don't simplify instructions
GetUnderlyingObject() (and by required symmetry
DecomposeGEPExpression()) will call SimplifyInstruction() on the
passed value if other checks fail. This simplification is very
expensive, but has little effect in practice. This patch removes
the SimplifyInstruction call(), and replaces it with a check for
single-argument phis (which can occur in canonical IR in LCSSA
form), which is the only useful simplification case I was able to
identify.

At O3 the geomean CTMark improvement is -1.7%. The largest
improvement is SPASS with ThinLTO at -6%.

In test-suite, I see only two tests with a hash difference and
no code size difference (PAQ8p, Ptrdist), which indicates that
the simplification only ends up being useful very rarely. (I would
have liked to figure out which simplification is responsible here,
but wasn't able to spot it looking at transformation logs.)

The AMDGPU test case that is update was using two selects with
undef condition, in which case GetUnderlyingObject will return
the first select operand as the underlying object. This will of
course not happen with non-undef conditions, so this was not
testing anything realistic. Additionally this illustrates potential
unsoundness: While GetUnderlyingObject will pick the first operand,
the select might be later replaced by the second operand, resulting
in inconsistent assumptions about the undef value.

Differential Revision: https://reviews.llvm.org/D82261
2020-06-21 16:31:07 +02:00
Sanjay Patel 2ad42c2653 [ValueTracking] improve analysis for fdiv with same operands
(The 'nnan' variant of this pattern is already tested to produce '1.0'.)

https://alive2.llvm.org/ce/z/D4hPBy

define i1 @src(float %x, i32 %y) {
%0:
  %d = fdiv float %x, %x
  %uge = fcmp uge float %d, 0.000000
  ret i1 %uge
}
=>
define i1 @tgt(float %x, i32 %y) {
%0:
  ret i1 1
}
Transformation seems to be correct!
2020-06-21 09:07:59 -04:00
Simon Pilgrim fb9f9dc318 [X86][SSE] Add SimplifyDemandedVectorEltsForTargetShuffle to handle target shuffle variable masks
Pulled out from the ongoing work on D66004, currently we don't do a good job of simplifying variable shuffle masks that have already lowered to constant pool entries.

This patch adds SimplifyDemandedVectorEltsForTargetShuffle (a custom x86 helper) to first try SimplifyDemandedVectorElts (which we already do) and then constant pool simplification to help mark undefined elements.

To prevent lowering/combines infinite loops, we only handle basic constant pool loads instead of creating new BUILD_VECTOR nodes for lowering - e.g. we don't try to convert them to broadcast/vzext_load - there might be some benefit to this but if so I'd rather we come up with some way to reuse existing code than reimplement a lot of BUILD_VECTOR code.

Differential Revision: https://reviews.llvm.org/D81791
2020-06-21 11:16:07 +01:00
clfbbn 10b0539772 [Attributor][NFC] Fix indentation
Summary: The patch D81022 seems to break the indentation of the `cleanupIR()` function. This patch fixes this problem

Reviewers: jdoerfert, sstefan1, uenoku

Reviewed By: jdoerfert

Subscribers: hiraditya, uenoku, kuter, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82260
2020-06-21 15:43:32 +08:00
Wenlei He 7c8a6936bf [Remarks] Add callsite locations to inline remarks
Summary:
Add call site location info into inline remarks so we can differentiate inline sites.
This can be useful for inliner tuning. We can also reconstruct full hierarchical inline
tree from parsing such remarks. The messege of inline remark is also tweaked so we can
differentiate SampleProfileLoader inline from CGSCC inline.

Reviewers: wmi, davidxl, hoy

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82213
2020-06-20 23:32:10 -07:00
Amy Kwan cc95635b1b [PowerPC][Power10] Implement Vector Clear Left/Rightmost Bytes Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```

Differential Revision: https://reviews.llvm.org/D81707
2020-06-20 18:29:16 -05:00
Eric Christopher dc20419351 Rename function to more accurately reflect what it does. 2020-06-20 14:37:29 -07:00
Eric Christopher 8116d01905 Typos around a -> an. 2020-06-20 14:04:48 -07:00
Sanjay Patel 741e20f3d6 [VectorCombine] fix assert for type of compare operand
As shown in the post-commit comment for D81661 - we need to
loosen the type assertion to allow scalarization of a compare
for vectors of pointers.
2020-06-20 15:20:17 -04:00
Sanjay Patel 7b201bfcac [InstCombine] remove unused parameter and add assert; NFC 2020-06-20 11:47:00 -04:00
Sanjay Patel d84cdb81ed [InstCombine] fabs(X) / fabs(X) -> X / X
Also, consolidate related folds so we don't miss/repeat these.
2020-06-20 10:20:21 -04:00
Simon Pilgrim 89dcbdfcfd [X86] combineSetCCMOVMSK - consistently use CmpBits variable. NFCI.
The comparison value should be the same size - I've added an assert to be absolutely certain.
2020-06-20 12:35:24 +01:00
Simon Pilgrim 56a9332328 [X86][SSE] Fold MOVMSK(PCMPEQ(X,0)) != -1 -> !PTESTZ(X,X) allof patterns 2020-06-20 12:17:32 +01:00
Nikita Popov d3d4e4bcb7 [LVI] Extract addValueHandle() method (NFC)
There will be more places registering value handles.
2020-06-20 13:05:42 +02:00
Nikita Popov 64ecf85f63 [LVI] Use find_as() where possible (NFC)
This prevents us from creating temporary PoisoningVHs and
AssertingVHs while performing hashmap lookups. As such, it only
matters in assertion-enabled builds.
2020-06-20 13:05:42 +02:00
Florian Hahn 9a7d80a32c Revert "[BasicAA] Use known lower bounds for index values for size based check."
This potentially related to https://bugs.llvm.org/show_bug.cgi?id=46335
and causes a slight compile-time regression. Revert while investigating.

This reverts commit d99a1848c4.
2020-06-20 10:06:05 +01:00
Eric Christopher 10563e16aa [Analysis/Transforms/Sanitizers] As part of using inclusive language
within the llvm project, migrate away from the use of blacklist and
whitelist.
2020-06-20 00:42:26 -07:00
Eric Christopher 858d385578 As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.
2020-06-20 00:24:57 -07:00
Eric Christopher cf23852587 [Target] As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.

This change affects an internal llvm command line option.
2020-06-20 00:06:39 -07:00
Xing GUO 6770349592 [DWARFYAML][debug_info] Fix array index out of bounds error
This patch is trying to fix the array index out of bounds error. I observed it in (https://reviews.llvm.org/harbormaster/unit/view/99638/).

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D82139
2020-06-20 15:08:24 +08:00
Craig Topper c721bc081e [X86] Correct the implementation of ud1(a.k.a. ud2b) instruction.
We were missing the modrm byte this instruction has according
to current Intel SDM. Experiments with gcc indicate that different
modrm values are chosen based on 2 operands so I've added those
as well.

I think our previous implementation was based on an older behavior of
binutils that has since been changed.
2020-06-19 23:57:48 -07:00
Craig Topper 0dda5e4ce2 [X86] Ignore bits 2:0 of the modrm byte when disassembling lfence, mfence, and sfence.
These are documented as using modrm byte of 0xe8, 0xf0, and 0xf8
respectively. But hardware ignore bits 2:0. So 0xe9-0xef is treated
the same as 0xe8. Similar for the other two.

Fixing this required adding 8 new formats to the X86 instructions
to convey this information. Could have gotten away with 3, but
adding all 8 made for a more logical conversion from format to
modrm encoding.

I renumbered the format encodings to keep the register modrm
formats grouped together.
2020-06-19 22:24:24 -07:00
Fangrui Song 2a4317bfb3 [SanitizeCoverage] Rename -fsanitize-coverage-{white,black}list to -fsanitize-coverage-{allow,block}list
Keep deprecated -fsanitize-coverage-{white,black}list as aliases for compatibility for now.

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D82244
2020-06-19 22:22:47 -07:00
Yevgeny Rouban 6429471e8b [IR] Convert profile metadata in createCallMatchingInvoke()
When an invoke instruction is converted to a call its
profile metadata is dropped because it has incompatible
format (see commit 16ad6eeb94).
This patch adds an attempt to convert profile data to
format of the call instruction. This used to work well
before the commit dcfa78a4cc.

Reviewers: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82071
2020-06-20 12:10:31 +07:00
Wang Rui dd48c57da3 [Mips] Error if a non-immediate operand is used while an immediate is expected
The 32-bit type relocation (R_MIPS_32) cannot be used for instructions below:

ori $4, $4, start
ori $4, $4, (start - .)

We should print an error instead.

Reviewed By: atanasyan, MaskRay

Differential Revision: https://reviews.llvm.org/D81908
2020-06-19 22:08:59 -07:00
Vitaly Buka 3d8149db3c [StackSafety,NFC] Don't rerun on LiveIn change 2020-06-19 21:29:31 -07:00
Xing GUO 1cfdda57fa [ObjectYAML][ELF] Add support for emitting the .debug_info section.
This patch helps add support for emitting the .debug_info section to yaml2elf.

Reviewed By: jhenderson, grimar, MaskRay

Differential Revision: https://reviews.llvm.org/D82073
2020-06-20 12:13:01 +08:00
Carl Ritson 4a7de36afc [AMDGPU] Avoid use of V_READLANE into EXEC in SGPR spills
Always prefer to clobber input SGPRs and restore them after the
spill.  This applies to both spills to VGPRs and scratch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D81914
2020-06-20 12:10:47 +09:00
romanova-ekaterina d7fad626e9 Error related to ThinLTO caching needs to be downgraded to a remark
This is a fix for PR #46392 (Diagnostic message (error) related to
ThinLTO caching needs to be downgraded to a remark).

There are diagnostic messages related to ThinLTO caching that contain
the word "error", but they are really just notices/remarks for users,
and they don't cause a build failure. The word "error" appearing can be
confusing to users, and may even cause deeper problems.

User's build system might be designed to interpret any error messages
(even a benign error message as the one above) reported by the compiler
as a build failure, thus causing the build to fail "needlessly". In
short, the term "error" in this diagnostic is misleading at best, and
may be causing build systems to fail at worst.

Differential Revision: https://reviews.llvm.org/D82138
2020-06-19 16:03:29 -07:00
Eric Christopher b6536e549d As part of using inclusive language within the llvm project,
migrate away from the use of blacklist and whitelist.
2020-06-19 15:12:18 -07:00
Heejin Ahn 83c26eae23 [WebAssembly] Remove TEEs when dests are unstackified
When created in RegStackify pass, `TEE` has two destinations, where
op0 is stackified and op1 is not. But it is possible that
op0 becomes unstackified in `fixUnwindMismatches` function in
CFGStackify pass when a nested try-catch-end is introduced, violating
the invariant of `TEE`s destinations.

In this case we convert the `TEE` into two `COPY`s, which will
eventually be resolved in ExplicitLocals.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D81851
2020-06-19 14:55:21 -07:00
Martin Storsjö cdbd299800 [Support] Fix building for mingw on a case sensitive file system
This fixes cross building on a case sensitive file system after
2e613d2ded. (The official Windows
SDKs don't have self-consistent casing and can't be used as such on
case sentisive file systems without case fixups, while mingw headers
consistently use lower case.)
2020-06-20 00:39:22 +03:00
Amara Emerson 1feeecf224 [AArch64][GlobalISel] Make G_SEXT_INREG legal and add selection support.
We were defaulting to the lower action for this, resulting in SHL+ASHR
sequences. On AArch64 we can do this in one instruction for an arbitrary
extension using SBFM as we do for G_SEXT.

Differential Revision: https://reviews.llvm.org/D81992
2020-06-19 13:20:41 -07:00
Sanjay Patel 216a37bb46 [VectorCombine] refactor extract-extract logic; NFCI 2020-06-19 14:52:27 -04:00
Lang Hames bf783a6aa8 [JITLink] Display host -> target address mapping in debugging output.
This can be helpful for sanity checking JITLink memory manager behavior.
2020-06-19 10:05:02 -07:00
Sanjay Patel 6d864097a2 [VectorCombine] fix crash while transforming constants
This is a variation of the proposal in D82049 with an extra test.
2020-06-19 12:30:32 -04:00
Stanislav Mekhanoshin 2b87a44c49 [AMDGPU] Some formatting fixes. NFC. 2020-06-19 09:02:59 -07:00
Piotr Sobczak 6d9565d6d5 Revert "[AMDGPU] Select s_cselect"
This caused some failures detected by the buildbot with
expensive checks enabled.

This reverts commit 4067de569f.
2020-06-19 16:41:04 +02:00
dfukalov 129388ddc4 [AMDGPU][CostModel] Add fneg cost estimation
Summary: The estimation uses AMDGPUTargetLowering::isFNegFree()

Reviewers: rampitec

Reviewed By: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82065
2020-06-19 17:31:35 +03:00
Piotr Sobczak 4067de569f [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81925
2020-06-19 16:17:46 +02:00
Sjoerd Meijer 4aa893b8f2 [ARM][MVE] tail-predication: renamed internal option.
Renamed -force-tail-predication to -force-mve-tail-predication because
that's more descriptive and consistent.
2020-06-19 15:07:06 +01:00
Mikhail Maltsev 490f78c038 [ARM][BFloat] Implement lowering of bf16 load/store intrinsics
Reviewers: labrinea, dmgreen, pratlucas, LukeGeeson

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81486
2020-06-19 14:02:35 +00:00
Mikhail Maltsev 7526881246 [ARM][BFloat] Lowering of create/get/set/dup intrinsics
This patch adds codegen for the following BFloat
operations to the ARM backend:
* concatenation of bf16 vectors
* bf16 vector element extraction
* bf16 vector element insertion
* duplication of a bf16 value into each lane of a vector
* duplication of a bf16 vector lane into each lane

Differential Revision: https://reviews.llvm.org/D81411
2020-06-19 12:52:40 +00:00
Simon Pilgrim c143db3b10 [X86][SSE] combineHorizontalPredicateResult - improve all_of(X == 0) for vXi64 on pre-SSE41 targets
Without SSE41 we don't have the PCMPEQQ instruction, making cmp-with-zero reductions more complicated than necessary. We can compare as vXi32 (PCMPEQD) and tweak the MOVMSK comparison to test upper/lower DWORD comparisons.

This pre-fixes something that occurs with null tests for vectors of (64-bit) pointers such as in PR35129.
2020-06-19 11:43:25 +01:00
Vitaly Buka 0e1bdeafc9 [StackSafety,NFC] Fix comment 2020-06-19 03:11:13 -07:00
Tyker 67448a8ccc try to fix build bot after b7338fb1a6 2020-06-19 12:02:09 +02:00
Simon Pilgrim cad2038700 [X86][SSE] combineSetCCMOVMSK - fold MOVMSK(SHUFFLE(X,u)) -> MOVMSK(X)
If we're permuting ALL the elements of a single vector, then for allof/anyof MOVMSK tests we can avoid the shuffle entirely.
2020-06-19 10:57:52 +01:00
David Sherwood 584d0d5c17 [SVE] Fall back on DAG ISel at -O0 when encountering scalable types
At the moment we use Global ISel by default at -O0, however it is
currently not capable of dealing with scalable vectors for two
reasons:

1. The register banks know nothing about SVE registers.
2. The LLT (Low Level Type) class knows nothing about scalable
   vectors.

For now, the easiest way to avoid users hitting issues when using
the SVE ACLE is to fall back on normal DAG ISel when encountering
instructions that operate on scalable vector types.

I've added a couple of RUN lines to existing SVE tests to ensure
we can compile at -O0. I've also added some new tests to

  CodeGen/AArch64/GlobalISel/arm64-fallback.ll

that demonstrate we correctly fallback to DAG ISel at -O0 when
lowering formal arguments or translating instructions that involve
scalable vector types.

Differential Revision: https://reviews.llvm.org/D81557
2020-06-19 10:57:00 +01:00