Commit Graph

136235 Commits

Author SHA1 Message Date
Matt Arsenault d9f0c3663f PPC: Don't store function in PPCFunctionInfo
Continue migrating targets away from depending on the MachineFunction
during initial construction.
2020-06-30 16:08:51 -04:00
Matt Arsenault 669bb3111f Mips: Don't store MachineFunction in MipsFunctionInfo
It will soon be disallowed to depend on MachineFunction state on
construction.
2020-06-30 16:08:51 -04:00
Eli Friedman 15440191b5 [IR] Delete llvm::Constants using the correct type.
In most cases, this doesn't have much impact: the destructors just call
the base class destructor anyway.  A few subclasses of ConstantExpr
actually store non-trivial data, though. Make sure we clean up
appropriately.

This is sort of ugly, but I don't see a good alternative given the
constraints.

Issue found by asan buildbots running the testcase for D80330.

Differential Revision: https://reviews.llvm.org/D82509
2020-06-30 12:37:53 -07:00
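To illustrate the general C++ hazard being avoided, here is a hypothetical Base/Derived pair (not the actual llvm::Constant hierarchy): when the base destructor is non-virtual, destruction has to go through the most-derived type or non-trivial members are never cleaned up.
```cpp
#include <cstdio>
#include <vector>

struct Base {
  ~Base() { std::puts("~Base"); } // non-virtual: deleting through Base* skips ~Derived()
};

struct Derived : Base {
  std::vector<int> Payload{1, 2, 3}; // non-trivial data that must be freed
  ~Derived() { std::puts("~Derived"); }
};

int main() {
  Base *B = new Derived();
  // Correct: destroy through the most-derived type so ~Derived() runs.
  delete static_cast<Derived *>(B);
  // `delete B;` would only run ~Base() (undefined behavior, Payload leaked).
}
```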
Florian Hahn 1ccc49924a [AArch64] Add getCFInstrCost, treat branches as free for throughput.
D79164/2596da31740f changed getCFInstrCost to return 1 by default.
AArch64 did not have its own implementation, so the throughput cost
of CFI instructions was overestimated. On most cores, most branches should
be predicted and are essentially free throughput-wise.

This fixes a 9% performance regression on a SPEC2006 benchmark on
AArch64 with -O3 LTO & PGO.

This patch effectively restores the pre-2596da3174 behavior for AArch64
and undoes the AArch64 test changes of that patch.

Reviewers: samparker, dmgreen, anemet

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D82755
2020-06-30 20:34:04 +01:00
Craig Topper 3537939cda [X86] Move frontend CPU feature initialization to a look up table based implementation. NFCI
This replaces the switch statement implementation in clang's
X86.cpp with a lookup table in X86TargetParser.cpp.

I've used constexpr and a copy of the FeatureBitset from
SubtargetFeature.h to store the features in a lookup table.
After the lookup, the bitset is translated into strings for use
by the rest of the frontend code.

I had to modify the implementation of the FeatureBitset to avoid
bugs in gcc 5.5's constexpr handling. It seems to not like the
same array entry being used on both the left-hand and right-hand side
of an assignment or &= or |=. I've also used uint32_t instead of
uint64_t and sized the storage based on X86::CPU_FEATURE_MAX.

I've initialized the features for different CPUs outside of the
table so that we can express inheritance in an ad hoc way. This
was one of the big limitations of the switch, where we had resorted
to labels and gotos.

Differential Revision: https://reviews.llvm.org/D82731
2020-06-30 12:04:58 -07:00
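A minimal sketch of the lookup-table shape described above, with made-up feature names and CPU entries rather than the real X86TargetParser tables:
```cpp
#include <array>
#include <cstdint>

constexpr unsigned CPU_FEATURE_MAX = 96; // hypothetical feature count

struct FeatureBitset {
  // uint32_t words, sized from the feature count, as the commit describes.
  std::array<uint32_t, (CPU_FEATURE_MAX + 31) / 32> Bits{};
  constexpr FeatureBitset set(unsigned I) const {
    FeatureBitset R = *this;
    R.Bits[I / 32] |= uint32_t(1) << (I % 32);
    return R;
  }
  constexpr bool test(unsigned I) const {
    return (Bits[I / 32] >> (I % 32)) & 1;
  }
};

enum Feature : unsigned { FeatureSSE2, FeatureAVX, FeatureAVX2 };

// "Inheritance" expressed ad hoc: later feature sets build on earlier ones.
constexpr FeatureBitset FeaturesNehalem = FeatureBitset().set(FeatureSSE2);
constexpr FeatureBitset FeaturesHaswell =
    FeaturesNehalem.set(FeatureAVX).set(FeatureAVX2);

struct CPUInfo {
  const char *Name;
  FeatureBitset Features;
};

constexpr CPUInfo CPUTable[] = {
    {"nehalem", FeaturesNehalem},
    {"haswell", FeaturesHaswell},
};

static_assert(CPUTable[1].Features.test(FeatureAVX2), "lookup works at compile time");
```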
David Green 9e49d1d9b8 [InstCombine] fma x, y, 0 -> fmul x, y
If the addend of the fma is zero, common sense suggests that we can
convert fma x, y, 0.0 to fmul x, y. This comes up in some user code
that was expecting the first fma in an unrolled loop to simplify to
an fmul.

Floating point often does not follow naive common sense, though. Alive
suggests that this should be guarded by nsz (as fadd -0.0, 0.0 = 0.0).
The fold for fma x, y, -0.0 is always valid.

Differential Revision: https://reviews.llvm.org/D82778
2020-06-30 19:56:37 +01:00
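A small standalone C++ check of the signed-zero corner case mentioned above (plain arithmetic with std::fma, not the InstCombine code): with x = -1.0 and y = 0.0 the product is -0.0, so fma(x, y, +0.0) yields +0.0 while the fmul would yield -0.0, which is why the +0.0 case needs nsz; a -0.0 addend never changes the product's value or sign.
```cpp
#include <cmath>
#include <cstdio>

int main() {
  double x = -1.0, y = 0.0;
  std::printf("x * y           = %+g\n", x * y);                // -0
  std::printf("fma(x, y, +0.0) = %+g\n", std::fma(x, y, 0.0));  // +0, differs from the fmul
  std::printf("fma(x, y, -0.0) = %+g\n", std::fma(x, y, -0.0)); // -0, same as the fmul
}
```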
Valentin Clement 1a70077b5a [openmp] Move Directive and Clause helper function to tablegen
Summary:
Follow up to D81736. Move getOpenMPDirectiveKind, getOpenMPClauseKind, getOpenMPDirectiveName and
getOpenMPClauseName to the new tablegen code generation. The code is generated in a new file named OMP.cpp.inc

Reviewers: jdoerfert, jdenny, thakis

Reviewed By: jdoerfert, jdenny

Subscribers: mgorny, yaxunl, hiraditya, guansong, sstefan1, llvm-commits, thakis

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82405
2020-06-30 14:51:59 -04:00
Alex Lorenz 24a1447b02 [macho] emit LC_BUILD_VERSION load command for supported OSes and platforms
This change lets LLVM use the LC_BUILD_VERSION command when building for macOS 10.14, iOS 12, tvOS 12, and watchOS 5.
Additionally, this change ensures that new platforms like Apple Silicon macOS / Mac Catalyst,
and simulators running on Apple Silicon always use LC_BUILD_VERSION with the OS version set to the
minimum supported OS version if the deployment target version is older.

Differential Revision: https://reviews.llvm.org/D82836
2020-06-30 11:48:17 -07:00
Reid Kleckner b7402edce3 [PDB] Defer public serialization until PDB writing
This reduces peak memory on my test case from 1960.14MB to 1700.63MB
(-260MB, -13.2%) with no measurable impact on CPU time. I'm currently
working with a publics stream that is about 277MB. Before this change,
we would allocate 277MB of heap memory, serialize publics into it,
hold onto that heap memory, open the PDB, and commit into it.  After
this change, we defer the serialization until commit time.

In the last change I made to public writing, I re-sorted the list of
publics multiple times in place to avoid allocating new temporary data
structures. Deferring serialization until later requires that we don't
reorder the publics. Instead of sorting the publics, I partially
construct the hash table data structures, store a publics index in them,
and then sort the hash table data structures. Later, I replace the index
with the symbol record offset.

This change also addresses a FIXME and moves the list of global and
public records from GSIHashStreamBuilder to GSIStreamBuilder. Now that
publics aren't being serialized, it makes even less sense to store them
as a list of CVSymbol records. The hash table used to deduplicate
globals is moved as well, since that is specific to globals, and not
publics.

Reviewed By: aganea, hans

Differential Revision: https://reviews.llvm.org/D81296
2020-06-30 11:28:04 -07:00
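A rough sketch of the "store an index now, patch in the offset at commit time" idea described above, using hypothetical types rather than the real GSIStreamBuilder structures:
```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

struct Public { std::string Name; };
struct HashRecord { uint32_t SymIndex; uint32_t SymOffset; };

void buildHashRecords(const std::vector<Public> &Publics,
                      const std::vector<uint32_t> &FinalOffsets,
                      std::vector<HashRecord> &Recs) {
  // Refer to publics by index so the publics list itself is never reordered
  // or serialized early.
  Recs.clear();
  for (uint32_t I = 0; I != (uint32_t)Publics.size(); ++I)
    Recs.push_back({I, 0});
  // Sort the lightweight hash records instead of the symbol records.
  std::sort(Recs.begin(), Recs.end(),
            [&](const HashRecord &A, const HashRecord &B) {
              return Publics[A.SymIndex].Name < Publics[B.SymIndex].Name;
            });
  // Later, at PDB commit time, replace each index with the record offset.
  for (HashRecord &R : Recs)
    R.SymOffset = FinalOffsets[R.SymIndex];
}
```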
Christopher Tetreault ab35ba5742 [SVE] Remove calls to VectorType::getNumElements from AArch64
Reviewers: efriedma, paquette, david-arm, kmclaughlin

Reviewed By: david-arm

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82214
2020-06-30 11:17:50 -07:00
Christopher Tetreault 9b500e564a [SVE] Remove calls to VectorType::getNumElements from ExecutionEngine
Reviewers: efriedma, lhames, sdesmalen, fpetrogalli

Reviewed By: lhames, sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82211
2020-06-30 11:05:38 -07:00
Hsiangkai Wang a7b0f39185 [MVT] Add new MVT types for RISC-V vector.
In the RISC-V vector extension, users can group multiple vector registers
as one pseudo register. In mixed-width operations, users can use
partial vector registers to reduce register pressure. The parameter
that controls register grouping and partial use is called LMUL. LMUL is a
part of the type, so we have a bunch of vector types. In order to
support all these types, we need new MVT types in LLVM. In this patch, I
added several MVT types that are used in the RISC-V vector implementation.
This is a standalone patch for MVT types without RISC-V related implementation.

Differential revision: https://reviews.llvm.org/D81724
2020-07-01 01:07:50 +08:00
Samuel Tebbs 3324e3a6ee [ARM] Allow the fabs intrinsic to be tail predicated
This patch stops the fabs intrinsic from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82570
2020-06-30 17:27:28 +01:00
Simon Pilgrim 32f8cd9a6a Pass MDFieldPrinter::printAPInt APInt arg by reference not value.
Noticed by clang-tidy performance-unnecessary-value-param warning.
2020-06-30 17:18:20 +01:00
Samuel Tebbs 66fa313999 [ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated
This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82571
2020-06-30 17:16:58 +01:00
Matt Arsenault b7f6ecf0c7 RegAlloc: Start using Register 2020-06-30 12:13:08 -04:00
Matt Arsenault af1eeaf380 BranchFolding: Use Register 2020-06-30 12:13:08 -04:00
Matt Arsenault edb4a5cb36 TailDuplicator: Use Register 2020-06-30 12:13:08 -04:00
Matt Arsenault cac655f233 AMDGPU: Use Register 2020-06-30 12:13:08 -04:00
Matt Arsenault 249933f254 X86: Use Register 2020-06-30 12:13:08 -04:00
Sjoerd Meijer af45907653 [ARM][MVE] Tail-predication: clean-up of unused code
After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP
intrinsic can be cloned to exit blocks if there are instructions present in them
that perform the same operation, but this wasn't triggering anymore. However,
it turns out that for handling reductions (see D75533) it's actually easier
not to have the VCTP in exit blocks, so this removes that code.

This was possible because some other code that depended on this
(rematerialization of the trip count, enabling more dead code removal
later) wasn't doing much anymore, due to the more aggressive dead code removal
that was added to the low-overhead loops pass.

Differential Revision: https://reviews.llvm.org/D82773
2020-06-30 17:09:36 +01:00
Samuel Tebbs d9cb811cbf [ARM] Allow rounding intrinsics to be tail predicated
This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication.

Differential Revision: https://reviews.llvm.org/D82553
2020-06-30 16:52:25 +01:00
Guillaume Chatelet 423458ec09 [Alignment][NFC] TargetLowering::allowsMemoryAccessForAlignment
First patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D81372
2020-06-30 15:31:24 +00:00
Simon Pilgrim 82de018954 [X86][SSE] LowerVectorAllZero - add support for masked OR-reductions
If we're masking the result of an OR-reduction before comparing against zero, we can fold this into the PTEST() / MOVMSK(CMPEQ()) codegen by pre-masking the source value.

This works particularly well on PTEST which performs the AND as part of its operation, but the MOVMSK variant also benefits for non-V2I64 cases.

Fixes PR44781
2020-06-30 14:38:52 +01:00
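For reference, a C++ rendering of the kind of source pattern the fold targets (illustrative only; the actual transform happens on the SelectionDAG, not on C code): an OR-reduction whose result is masked before the compare against zero is equivalent to pre-masking each element.
```cpp
#include <cstdint>

bool allMaskedBitsZero(const uint64_t v[4], uint64_t mask) {
  uint64_t r = v[0] | v[1] | v[2] | v[3];
  // Masking the reduction result is the same as masking each source element,
  // which lets the reduction itself lower to PTEST / MOVMSK(CMPEQ()).
  return (r & mask) == 0;
}
```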
Guillaume Chatelet c1cd61e02a [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemcpy to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82849
2020-06-30 13:12:31 +00:00
Guillaume Chatelet 306d7c6929 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemmove to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82850
2020-06-30 12:46:59 +00:00
Guillaume Chatelet 6a6af30d43 [Alignment][NFC] Migrate SelectionDAGTargetInfo::EmitTargetCodeForMemset to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82851
2020-06-30 12:46:26 +00:00
dfukalov 1a6cebb4d1 [PM] Fix new PM to perform SpeculativeExecution as in old PM
Summary:
The old PM runs SpeculativeExecutionPass for targets that have divergent branches.
It uses `createSpeculativeExecutionIfHasBranchDivergencePass`, which creates
the pass with `OnlyIfDivergentTarget=true`, whereas the new PM just created the
pass with the default `OnlyIfDivergentTarget=false`, so it unexpectedly ran and
caused buildbot test failures.

Reviewers: chandlerc, arsenm

Reviewed By: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82735
2020-06-30 15:21:04 +03:00
Ilya Leoshkevich 6764869548 [SystemZ] Add NoMerge MIFlag
Summary:
This fixes ASan and MSan tests on SystemZ after
commit 6a822e20ce ("[ASan][MSan] Remove EmptyAsm and set the CallInst
to nomerge to avoid from merging.").

Based on commit 80e107ccd0 ("Add NoMerge MIFlag to avoid MIR branch
folding").

Reviewers: uweigand, jonpa

Reviewed By: uweigand

Subscribers: hiraditya, llvm-commits, Andreas-Krebbel

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82794
2020-06-30 12:44:45 +02:00
Balazs Benics 815a8100e0 [llvm][Z3][NFC] Improve mkBitvector performance
We convert `APSInt`s to Z3 bitvectors in an inefficient way in most cases.
We should not serialize to std::string just to pass a 64-bit integer.

For the vast majority of cases, we use at most 64-bit-wide integers (at least
in the Clang Static Analyzer). We should simply call `Z3_mk_unsigned_int64`
and `Z3_mk_int64` instead of `Z3_mk_numeral`, as stated in the Z3 docs,
which say:
> It (`Z3_mk_unsigned_int64`, etc.) is slightly faster than `Z3_mk_numeral` since
> it is not necessary to parse a string.

If the `APSInt` is wider than 64 bits, we will use the `Z3_mk_numeral` with a
`SmallString` instead of a heap-allocated `std::string`.

Differential Revision: https://reviews.llvm.org/D78453
2020-06-30 12:26:50 +02:00
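A simplified sketch of the width-based dispatch described above, assuming the standard Z3 C API signatures; the real code operates on llvm::APSInt and a SmallString rather than these parameters:
```cpp
#include <z3.h>
#include <cstdint>
#include <string>

Z3_ast mkBitvector(Z3_context Ctx, Z3_sort Sort, uint64_t Value,
                   unsigned BitWidth, bool IsUnsigned,
                   const std::string &WideDecimal) {
  // Fast path: 64 bits or fewer never needs a string round-trip.
  if (BitWidth <= 64)
    return IsUnsigned ? Z3_mk_unsigned_int64(Ctx, Value, Sort)
                      : Z3_mk_int64(Ctx, static_cast<int64_t>(Value), Sort);
  // Wider values fall back to the string-based numeral constructor.
  return Z3_mk_numeral(Ctx, WideDecimal.c_str(), Sort);
}
```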
Guillaume Chatelet 2c5ff48e61 [Alignment][NFC] Migrate AtomicExpandPass to Align
This is a followup on D78403.
I'm unsure about `getAtomicOpAlign` overloads that take `AtomicRMWInst` and `AtomicCmpXchgInst`, shouldn't `getAlign` provide the correct answer already?

Differential Revision: https://reviews.llvm.org/D81369
2020-06-30 09:54:45 +00:00
Georgii Rymar 64bae035ef [yaml2obj] - Support reading a content as an array of bytes using the new 'ContentArray' key.
This implements a way to describe a section's content using a multi-line description, e.g.:

```
- Name:         .foo
  Type:         SHT_PROGBITS
  ContentArray: [ 0x11, 0x22, 0x33, 0x44,                                ## .long 11223344
                  0x55, 0x66,                                            ## .short 5566.
                  0x77,                                                  ## .byte 0x77
                  0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x00 ] ## .quad 0x8899aabbccddeeff
```

It was briefly discussed previously in the D75123 thread.

Differential revision: https://reviews.llvm.org/D82366
2020-06-30 12:13:23 +03:00
Petar Avramovic d717382633 AMDGPU/GlobalISel: Select icmp intrinsic
Select into the corresponding V_CMP instruction based on the CmpInst predicate,
which is stored as an immediate in the last operand.

Differential Revision: https://reviews.llvm.org/D82652
2020-06-30 10:57:41 +02:00
Petar Avramovic 4b980cc9ca [GlobalISel][InlineAsm] Add support for matching input constraints
Find the def operand that corresponds to the matching constraint and
tie the input to that operand.

Differential Revision: https://reviews.llvm.org/D82651
2020-06-30 10:49:05 +02:00
Xing GUO fe08ab542b [DWARFYAML][debug_info] Replace 'InitialLength' with 'Format' and 'Length'.
'InitialLength' is replaced with 'Format' (DWARF32 by default) and 'Length' in this patch.
In addition, test cases for DWARFv4 and DWARFv5, and for DWARF32 and DWARF64, are
added.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D82622
2020-06-30 16:28:39 +08:00
Guillaume Chatelet 5f8bdb3e6a [Alignment][NFC] TargetLowering::allowsMemoryAccess
Second patch of a series to adapt TargetLowering::allowsXXX functions

This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82785
2020-06-30 08:17:00 +00:00
Guillaume Chatelet a976ea3209 [Alignment][NFC] Migrate PPC, X86 and XCore backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82779
2020-06-30 08:08:45 +00:00
Anatoly Trosinenko 19e75717ef [MSP430] Declare comparison LibCalls as returning i16 instead of i32
For TI's distribution of msp430-gcc
```
msp430-elf-gcc -S -o- -Os -x c - <<< "int f(float a, float b) { return a != b; }"
```
does not mention `R13` at all. The `__libgcc_cmp_return__` machine mode is 2 bytes on MSP430, as well.

Differential Revision: https://reviews.llvm.org/D82635
2020-06-30 11:04:22 +03:00
Guillaume Chatelet 4f5133a4dc [Alignment][NFC] Migrate AArch64, ARM, Hexagon, MSP and NVPTX backends to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82749
2020-06-30 07:56:17 +00:00
Craig Topper 767c9c5bf5 [X86] Remove an isel pattern that can never match. Remove bitcasts of loads from a few others. 2020-06-30 00:17:56 -07:00
David Sherwood c02332a693 [CodeGen] Fix warning in getNode for EXTRACT_SUBVECTOR
Fix a warning in getNode() when extracting a subvector from a
concat vector. We can simply replace the call to getVectorNumElements
with getVectorMinNumElements as this follows the defined behaviour
for EXTRACT_SUBVECTOR.

Differential Revision: https://reviews.llvm.org/D82746
2020-06-30 08:11:41 +01:00
Jonas Paulsson ef7aad0db4 [SystemZ] Improve handling of ZERO_EXTEND_VECTOR_INREG.
Instead of doing multiple unpacks when zero extending vectors (e.g. v2i16 ->
v2i64), benchmarks have shown that it is better to do a VPERM (vector
permute) since that is only one sequential instruction on the critical path.

This patch achieves this by:

1. Expanding ZERO_EXTEND_VECTOR_INREG into a vector shuffle with a zero vector
   instead of (multiple) unpacks.

2. Improving SystemZ::GeneralShuffle to perform a single unpack as the last
   operation if Bytes matches it.

Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D78486
2020-06-30 09:08:10 +02:00
David Sherwood 46a7f4d6f4 [SVE][CodeGen] Fix bug in DAGCombiner::reduceBuildVecToShuffle
When trying to reduce a BUILD_VECTOR to a SHUFFLE_VECTOR it's
important that we carefully check the vector types that led to
that BUILD_VECTOR. In the test I have attached to this commit
there is a case where the results of two SVE faddv instructions
are being stored to consecutive memory locations. With my fix,
as part of merging those stores we discover that each BUILD_VECTOR
element came from an extract of an SVE vector element, and we
therefore bail out.

Differential Revision: https://reviews.llvm.org/D82564
2020-06-30 07:28:15 +01:00
Max Kazantsev f01d9e6fc3 [SimplifyCFG] Fix inconsistency in block size assessment for threading
Sometimes SimplifyCFG may decide to perform jump threading. In order
to do it, it follows the following algorithm:

1. Checks if the block is small enough for threading;
2. If yes, inserts a PR Phi, relying on the next iteration to remove it
   by performing jump threading;
3. The next iteration checks the block again and performs the threading.

This logic has a corner case: inserting the PR Phi increases the block's size
by 1. If the block size at the first check was the maximum possible, one more Phi will
exceed this size, and we will neither perform threading nor remove the
created Phi node. As a result, we will end up with worse IR than before.

This patch fixes this situation by excluding Phis from the block size computation.
Excluding Phis from the size computation for threading also makes sense by
itself, because in the case of threading all those Phis will be removed.

Differential Revision: https://reviews.llvm.org/D81835
Reviewed By: asbirlea, nikic
2020-06-30 12:40:07 +07:00
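An illustrative sketch of the sizing change, with hypothetical types rather than the actual SimplifyCFG code: only non-Phi instructions count toward the threading limit, so the Phi inserted on a previous iteration can no longer push the block over it.
```cpp
#include <vector>

struct Instr { bool IsPhi; };

bool blockSmallEnoughToThread(const std::vector<Instr> &Block, unsigned Limit) {
  unsigned Size = 0;
  for (const Instr &I : Block)
    if (!I.IsPhi) // Phis are excluded: threading removes them anyway
      ++Size;
  return Size <= Limit;
}
```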
Craig Topper 9b04d69cce [X86] Prefer AND over PSHUFB for v64i8 when possible
If the shuffle is a blend and one input is a 0 vector, we should prefer AND over PSHUFB since it's available on more execution ports.

Differential Revision: https://reviews.llvm.org/D82798
2020-06-29 16:26:53 -07:00
Christopher Tetreault bdcd200629 [SVE] Remove calls to VectorType::getNumElements from Instrumentation
Reviewers: efriedma, pcc, gchatelet, kmclaughlin, sdesmalen

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82241
2020-06-29 15:20:24 -07:00
Lei Huang af9cc2d2af [PowerPC] Fix FeatureISA3_1 def in PPC.td to imply FeatureISA3_0. 2020-06-29 16:13:02 -05:00
Alex Lorenz f7a14514ee [darwin][driver] isMacosxVersionLT should check against the minimum supported OS version
This change ensures that the Darwin driver doesn't add unsupported libraries to the link
invocation when linking the Apple Silicon macOS slice.

rdar://61011136

Differential Revision: https://reviews.llvm.org/D82696
2020-06-29 12:21:54 -07:00
Matt Arsenault 2790516418 X86: Use MOV32r0 pseudo instead of directly emitting xor
This was producing reg = xor undef reg, undef reg. This looks similar
to a use of a value to define itself, and I want to disallow undef
uses for SSA virtual registers. If this were to use implicit_def,
there's no guarantee the two operands end up using the same register
(I think no guarantee exists even if the two operands start out as the
same register, but this was violated when I switched this to use an
explicit implicit_def). The MOV32r0 pseudo evidently exists to handle
this case, so use it instead. This was more work than I expected for
the 64-bit case, but I didn't see any helper for materializing a
64-bit 0.
2020-06-29 14:45:20 -04:00
Reid Kleckner 6d01a94193 Silence unused var warning in NDEBUG build 2020-06-29 11:40:49 -07:00
Christopher Tetreault 0da1e7ebf9 [SVE] Remove calls to VectorType::getNumElements from X86
Reviewers: efriedma, RKSimon, craig.topper, fpetrogalli, c-rhodes

Reviewed By: RKSimon

Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82508
2020-06-29 11:10:35 -07:00
Nemanja Ivanovic d2533d96e1 [PowerPC] Fix crash for shuffle canonicalization with elt 0 from RHS
Commit 1fed131660 assumed that shuffle vector canonicalization would
always ensure that the shuffle mask is ordered so that element
zero comes from the LHS vector. However, there is code out there for
which this is not the case. This patch simply removes that unsafe
assumption and makes the code work regardless of the source of the
first element.
2020-06-29 12:26:08 -05:00
serge-sans-paille b4130e6e99 Correctly report Changed status in FoldBranchToCommonDest
It's possible for the first loop trip(s) to set the `Changed` status and for a
later one to exit early, in which case `Changed` must be returned.

Differential Revision: https://reviews.llvm.org/D82753
2020-06-29 18:13:42 +02:00
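A generic illustration of the bug pattern being fixed (not the actual FoldBranchToCommonDest code): an early exit on a later trip must still report the changes made by earlier trips.
```cpp
bool foldAll(bool (*tryFoldOne)(int), int NumBranches) {
  bool Changed = false;
  for (int I = 0; I < NumBranches; ++I) {
    if (!tryFoldOne(I))
      return Changed; // correct: returning `false` here would lose earlier changes
    Changed = true;   // earlier trips mutated the IR
  }
  return Changed;
}
```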
Francesco Petrogalli 67e4330fac [sve][acle] Implement some of the C intrinsics for brain float.
Summary:
The following intrinsics have been extended to support brain float types:

svbfloat16_t svclasta[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclasta[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlasta[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svclastb[_bf16](svbool_t pg, svbfloat16_t fallback, svbfloat16_t data)
bfloat16_t svclastb[_n_bf16](svbool_t pg, bfloat16_t fallback, svbfloat16_t data)
bfloat16_t svlastb[_bf16](svbool_t pg, svbfloat16_t op)

svbfloat16_t svdup[_n]_bf16(bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_m(svbfloat16_t inactive, svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_x(svbool_t pg, bfloat16_t op)
svbfloat16_t svdup[_n]_bf16_z(svbool_t pg, bfloat16_t op)

svbfloat16_t svdupq[_n]_bf16(bfloat16_t x0, bfloat16_t x1, bfloat16_t x2, bfloat16_t x3, bfloat16_t x4, bfloat16_t x5, bfloat16_t x6, bfloat16_t x7)
svbfloat16_t svdupq_lane[_bf16](svbfloat16_t data, uint64_t index)

svbfloat16_t svinsr[_n_bf16](svbfloat16_t op1, bfloat16_t op2)

Reviewers: sdesmalen, kmclaughlin, c-rhodes, ctetreau, efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82345
2020-06-29 16:09:08 +00:00
Christudasan Devadasan 226cda58d5 [AMDGPU] Moving SI_RETURN_TO_EPILOG handling out of SIInsertSkips.
For now, moving it to SIPreEmitPeephole.
We should find the right place for this code.

Reviewed By: nhaehnle

Differential revision: https://reviews.llvm.org/D77544
2020-06-29 20:41:53 +05:30
David Green deb72ce298 [ARM] Better reductions
MVE has native reductions for integer add and min/max. The others need
to be expanded to a series of extract's and scalar operators to reduce
the vector into a single scalar. The default codegen for that expands
the reduction into a series of in-order operations.

This modifies that to something more suitable for MVE. The basic idea is
to use vector operations until there are 4 remaining items then switch
to pairwise operations. For example a v8f16 fadd reduction would become:
Y = VREV X
Z = ADD(X, Y)
z0 = Z[0] + Z[1]
z1 = Z[2] + Z[3]
return z0 + z1

The awkwardness (there is always some) comes from something like a
v4f16, which is first legalized by adding identity values to the extra
lanes of the reduction, and which then cannot be optimized away through
the vrev/fadd combo; the inserts remain. I've made sure they are custom
lowered so that we can produce the pairwise additions before the extra
values are added.

Differential Revision: https://reviews.llvm.org/D81397
2020-06-29 16:04:13 +01:00
Simon Pilgrim 333aa690f4 [X86][SSE] MatchVectorAllZeroTest - handle OR vector reductions (REAPPLIED)
This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero.

Reapplied with a fix for a chromium regression due to a missing isNullConstant() check in combineSetCC: https://bugs.chromium.org/p/chromium/issues/detail?id=1097758

Fixes PR45378

Differential Revision: https://reviews.llvm.org/D81547
2020-06-29 15:50:44 +01:00
Nemanja Ivanovic 57ad8f4730 [PowerPC] Don't combine SCALAR_TO_VECTOR without VSX
Most of the patterns for PPCISD::SCALAR_TO_VECTOR_PERMUTED require
VSX. So don't emit them if the subtarget doesn't have VSX.
This resolves the issue reported on
https://reviews.llvm.org/rG1fed131660b2c5d3ea7007e273a7a5da80699445
2020-06-29 09:48:57 -05:00
Sanjay Patel b6315aee5b [VectorCombine] try to form vector compare and binop to eliminate scalar ops
binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0

This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op that has
a constant operand with a vector op).

I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.

The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.

The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5

Differential Revision: https://reviews.llvm.org/D82474
2020-06-29 10:38:52 -04:00
Matt Arsenault d0b0b252e1 AMDGPU: Use IsSSA property check instead of asserting on isSSA
Also fix an SSA violation in a test the MIRParser/verifier fails to
catch. It's illegal to define a subregister in SSA. For the purpose of
the test, it just needs to define the super-register to use the
subregister in the use operand.
2020-06-29 10:05:23 -04:00
Sanjay Patel 3b95d8346d [VectorCombine] refactor - make helper function for extract to shuffle logic; NFC
Preliminary for D82474
2020-06-29 09:55:34 -04:00
Luís Marques 2cb0644f90 [RISCV] Split the pseudo instruction splitting pass
Extracts the atomic pseudo-instructions' splitting from `riscv-expand-pseudo`
/ `RISCVExpandPseudo` into its own pass, `riscv-expand-atomic-pseudo` /
`RISCVExpandAtomicPseudo`. This allows for the expansion of atomic operations
to continue to happen late (the new pass is added in `addPreEmitPass2`, so
those expansions continue to happen in the same place), while the remaining
pseudo-instructions can now be expanded earlier and benefit from more
optimization passes. The nonatomics pass is now added in `addPreSched2`.

Differential Revision: https://reviews.llvm.org/D79635
2020-06-29 14:35:57 +01:00
Sebastian Neubauer 874fcd4e8f Add intrinsic helper function
It simplifies getting generic argument types from intrinsics.

Differential Revision: https://reviews.llvm.org/D81084
2020-06-29 14:47:46 +02:00
Sander de Smalen 39f6a36a24 [AArch64][SVE] NFCI: Choose consistent naming for predicated SDAG nodes
This patch proposes a naming convention for operations that take
a general predicate (and are thus predicated) that specifies
what happens to the false lanes.

Currently the _PRED suffix is used, which doesn't really say much other
than that it takes a predicate. In some instances this means it has
merging predication and in other cases it means zeroing-predication.

This patch also changes the order of operands to
AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first
operand, which is in line with all other predicated nodes. It takes the
passthru value as an explicit passthru value, which is always passed as
the last operand.

Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma

Reviewed By: paulwalker-arm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81850
2020-06-29 13:37:30 +01:00
John Brawn ce1fa201af [Driver] When forcing a crash print the bug report message
Commit a945037e8f moved the printing of the
"PLEASE submit a bug report" message to the crash handler, but that means we
don't print it when forcing a crash using FORCE_CLANG_DIAGNOSTICS_CRASH. Fix
this by adding a function to get the bug report message and printing it when
forcing a crash.

Differential Revision: https://reviews.llvm.org/D81672
2020-06-29 13:13:12 +01:00
Guillaume Chatelet 52911428ef [Alignment][NFC] Migrate AMDGPU backend to Align
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82743
2020-06-29 11:56:06 +00:00
Guillaume Chatelet 368a5e3a66 [Alignment][NFC] migrate DataLayout::getPreferredAlignment
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Differential Revision: https://reviews.llvm.org/D82752
2020-06-29 11:24:36 +00:00
Simon Pilgrim 3521ecf1f8 [X86] Add vector support to targetShrinkDemandedConstant for OR/XOR opcodes
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.

This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.

Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.

Differential Revision: https://reviews.llvm.org/D82257
2020-06-29 12:19:05 +01:00
Cullen Rhodes d5fc592b7c [AArch64][SVE] Add bfloat16 support to svext intrinsic
Reviewers: sdesmalen, kmclaughlin, efriedma, david-arm, fpetrogalli

Reviewed By: sdesmalen, fpetrogalli

Differential Revision: https://reviews.llvm.org/D82391
2020-06-29 11:08:38 +00:00
Kerry McLaughlin bb6603f013 [AArch64][SVE] Bail out of performPostLD1Combine for scalable types
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.

Reviewers: sdesmalen, c-rhodes, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82670
2020-06-29 11:59:53 +01:00
Simon Pilgrim 973685fc78 [TargetLowering] Add DemandedElts arg to ShrinkDemandedConstant
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
2020-06-29 11:46:58 +01:00
Guillaume Chatelet 3500d9ec95 Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt
`ShAmt / 8` can be a non-power-of-two, which can lead to an invalid alignment.
context: https://reviews.llvm.org/D41350#inline-749165

Differential Revision: https://reviews.llvm.org/D82565
2020-06-29 09:22:15 +00:00
Xing GUO 8f9ca561a2 [ObjectYAML][DWARF] Collect diagnostic message when YAMLParser fails.
Before this patch, the diagnostic message is printed to `errs()` directly, which makes it difficult to use `FailedWithMessage()` in unit testing.
In this patch, we add a custom error handler for YAMLParser, which helps collect diagnostic messages and make it easy to use `FailedWithMessage()` to check error messages.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D82630
2020-06-29 16:13:53 +08:00
Sergey Dmitriev 1becd298b8 [NFC] CallGraph related cleanup
Summary: Tidy up some CallGraph-related code in preparation for D82572.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82686
2020-06-28 15:27:39 -07:00
Xun Li c8755b6378 [Coroutines] Optimize the lifespan of temporary co_await object
Summary:
If we ever assign co_await to a temporary variable, such as foo(co_await expr),
we generate an AST that looks like this: MaterializedTemporaryExpr(CoawaitExpr(...)).
MaterializedTemporaryExpr would emit an intrinsic that marks the lifetime start of the
temporary storage. However, such temporary storage will not be used until co_await is ready
to write the result. Marking the lifetime start way too early causes extra storage to be
put in the coroutine frame instead of the stack.
As you can see from https://godbolt.org/z/zVx_eB, the frame generated for get_big_object2 is 12K, which contains a big_object object unnecessarily.
After this patch, the frame size for get_big_object2 is only 8K. There is still room for improvement; in particular, GCC has a 4K frame for this function. But that's a separate problem and not addressed in this patch.

The basic idea of this patch is, during CoroSplit, to look for every local variable in the coroutine created through AllocaInst, identify all the lifetime start/end markers and the uses of the variables, and sink the lifetime.start marker as close to the first-ever use as possible.

Reviewers: lewissbaker, modocache, junparser

Reviewed By: junparser

Subscribers: hiraditya, llvm-commits, rsmith, ChuanqiXu, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82314
2020-06-28 10:18:15 -07:00
Nikita Popov 614b995cac [LVI] Refactor value from icmp cond handling (NFC)
Rewrite this in a way that is more amenable to extension.
2020-06-28 15:04:02 +02:00
Simon Pilgrim e07a982693 [X86] combineScalarToVector - handle (v2i64 scalar_to_vector(aextload)) as well as (v2i64 scalar_to_vector(aext))
We already fold (v2i64 scalar_to_vector(aext)) -> (v2i64 bitcast(v4i32 scalar_to_vector(x))), this adds support for similar aextload cases and also handles v2f64 cases that wrap the i64 extension behind bitcasts.

Fixes the remaining issue with PR39016
2020-06-28 13:00:32 +01:00
madhur13490 299dee91b3 Revert accidentally landed patch citing build errors
Summary: This reverts commit c73966c2f7.

Reviewers:

Subscribers:
2020-06-28 11:52:33 +00:00
madhur13490 c73966c2f7 Improve stack object printing. NFC.
Reviewers: madhur13490

Reviewed By: madhur13490

Subscribers: qcolombet, arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82712
2020-06-28 11:43:33 +00:00
dfukalov c7bcd431d9 SpeculativeExecution: fix incorrect debug info move
Summary:
Debug-info-related instructions got zero cost and so were hoisted unconditionally.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=46267

Reviewers: arsenm, nhaehnle, chandlerc, aprantl

Reviewed By: aprantl

Subscribers: ormris, uabelho, wdng, aprantl, hiraditya, llvm-commits

Tags: #llvm, #debug-info

Differential Revision: https://reviews.llvm.org/D81730
2020-06-28 14:35:00 +03:00
Brad Smith 66b7ba52b7 Add OpenBSD support to be able to retrieve the thread id 2020-06-27 21:14:44 -04:00
Benjamin Kramer 85b53598a9 [RISCV] Silence unused variable warning in Release builds. NFC. 2020-06-27 23:24:28 +02:00
Nikita Popov 323cb26cef [ValueTracking] Use a switch statement (NFC) 2020-06-27 22:42:43 +02:00
Simon Pilgrim 393b4bd136 [X86] SimplifyDemandedVectorEltsForTargetNode - merge shuffle/pack lower demanded elements handling.
Generalize the vector operand extraction code for shuffle/pack ops - we can assume that the vector operands are the same width as the result, and any non-vector values can be reused directly in the smaller width op.
2020-06-27 19:10:13 +01:00
Hsiangkai Wang 66da87dcba [RISCV] Assemble/Disassemble v-ext instructions.
Assemble/disassemble RISC-V V extension instructions according to the
latest version of the spec at https://github.com/riscv/riscv-v-spec/.

I have tested this patch using the GNU toolchain. The encoding is aligned
with GNU assembler output. In this patch, there is at least one test case
for each instruction.

The V register definition is just for assembly/disassembly. Its type
is not important at this stage. I think it will be reviewed and modified
when we want to do codegen for scalable vector types.

This patch does not include Zvamo, Zvlsseg, and Zvediv.

Differential revision: https://reviews.llvm.org/D69987
2020-06-28 00:54:07 +08:00
Roman Lebedev f0634100cd [Analysis] isDereferenceableAndAlignedPointer(): don't crash on `bitcast <1 x ???*> to ???*` 2020-06-27 18:30:59 +03:00
Simon Pilgrim e855efe424 [X86][AVX] SimplifyDemandedVectorEltsForTargetNode - reduce width of X86ISD::VPERMIL2
If we don't need the elements of the upper lanes, reduce the width of the X86ISD::VPERMIL2 node.
2020-06-27 15:06:49 +01:00
Simon Pilgrim d56c6475a6 [X86][AVX] SimplifyDemandedVectorEltsForTargetNode - reduce width of X86ISD::VPERMILPV
If we don't need the elements of the upper lanes, reduce the width of the X86ISD::VPERMILPV node.
2020-06-27 14:43:03 +01:00
Simon Pilgrim 892df9e706 FileCollector.h - reduce Twine.h include to forward declaration. NFC. 2020-06-27 11:16:25 +01:00
Simon Pilgrim 6bdb3ce452 [DAG] reduceBuildVecExtToExtBuildVec - don't combine if it would break a splat.
reduceBuildVecExtToExtBuildVec was breaking a splat(zext(x)) pattern into buildvector(x, 0, x, 0, ..) resulting in much more complex insert+shuffle codegen.

We already go to some lengths to avoid this in SimplifyDemandedVectorElts etc. when we encounter splat buildvectors.

It should be OK to fold all splat(aext(x)) patterns - we might need to tighten this if we find a case where we mustn't introduce a buildvector(x, undef, x, undef, ..) but I can't find one.

Fixes PR46461.
2020-06-27 11:03:57 +01:00
David Zarzycki dab859d1bf Reland: [clang driver] Move default module cache from system temporary directory
This fixes a unit test. Otherwise here is the original commit:

1) Shared writable directories like /tmp are a security problem.
2) Systems provide dedicated cache directories these days anyway.
3) This also refines LLVM's cache_directory() on Darwin platforms to use
   the Darwin per-user cache directory.

Reviewers: compnerd, aprantl, jakehehrlich, espindola, respindola, ilya-biryukov, pcc, sammccall

Reviewed By: compnerd, sammccall

Subscribers: hiraditya, llvm-commits, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82362
2020-06-27 05:35:15 -04:00
Simon Pilgrim df813dc09e Error.h - GenericBinaryError - pass Twine arg by reference not value.
This allows us to reduce the Twine.h include to a forward declaration.
2020-06-27 10:12:20 +01:00
Simon Pilgrim 23cdbdb20b MCSectionWasm.h - reduce includes to forward declarations. NFC. 2020-06-27 10:03:34 +01:00
Nikita Popov 9a334a4d20 [IR] Store attributes that are available "somewhere" (NFC)
I noticed that for some benchmarks we spend quite a bit of time
inside AttributeList::hasAttrSomewhere(), mainly when checking
for the "returned" attribute. Most of the time the attribute will
not be present, in which case this function has to walk through
the whole attribute list and check for the attribute at each index.

This patch adds a cache of all "available somewhere" attributes
inside AttributeListImpl. This makes the structure 12 bytes larger,
but I don't think that's problematic, as attribute lists are uniqued.
Compile-time in terms of instructions retired improves by 0.4% on
average, but >1% for sqlite.

Differential Revision: https://reviews.llvm.org/D81867
2020-06-27 10:44:59 +02:00
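A simplified sketch of the caching idea with hypothetical names (the real change lives in AttributeListImpl): the union of all per-index attribute sets is computed once at construction, so the common negative hasAttrSomewhere query no longer walks the whole list.
```cpp
#include <bitset>
#include <vector>

enum AttrKind : unsigned { Returned, NoAlias, NumAttrKinds };

struct AttrListImpl {
  std::vector<std::bitset<NumAttrKinds>> Sets; // one attribute set per index
  std::bitset<NumAttrKinds> AvailableSomewhere;

  explicit AttrListImpl(std::vector<std::bitset<NumAttrKinds>> S)
      : Sets(std::move(S)) {
    for (const auto &Set : Sets)
      AvailableSomewhere |= Set; // union over all indices, computed once
  }

  bool hasAttrSomewhere(AttrKind K) const {
    return AvailableSomewhere.test(K); // O(1) instead of scanning every index
  }
};
```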
Simon Pilgrim ba2ac689e6 AsmWriter - printConstVCalls/printNonConstVCalls - avoid std::vector pass by value. NFCI. 2020-06-27 09:38:37 +01:00
Roman Lebedev 141e845da5 [SCEV] Make SCEVAddExpr actually always return pointer type if there is pointer operand (PR46457)
Summary:
The added assertion fails on the added test without the fix.

Reduced from test-suite/MultiSource/Benchmarks/MiBench/office-ispell/correct.c
In IR, getelementptr, obviously, takes a pointer as its base,
and returns a pointer.

When creating an SCEV expression, SCEV operands are sorted in the hope
that it increases folding potential, and at the same time SCEVAddExpr's
type is the type of the last(!) operand.

Which means that, in some exceedingly rare cases, the pointer operand may happen to
end up not being the last operand, and as a result the SCEV for a GEP
will suddenly have a non-pointer return type.
We should ensure that does not happen.

In the end, actually storing the `Type *`, at the cost of increasing the
memory footprint of `SCEVAddExpr`, appears to be the solution.
We can't just store an 'is a pointer' bit and create the pointer type
on the fly, since we don't have the data layout in getType().

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=46457 | PR46457 ]]

Reviewers: efriedma, mkazantsev, reames, nikic

Reviewed By: efriedma

Subscribers: hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82633
2020-06-27 11:37:17 +03:00
Roman Lebedev f9f52c88ca [NFCI][SCEV] getPointerBase(): de-recursify
Summary:
This is boringly straightforward: on each iteration we see if
V is some expression that we can look into, and if it has
a single pointer operand, we set V to that operand
and repeat.

Reviewers: efriedma, mkazantsev, reames, nikic

Reviewed By: nikic

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82632
2020-06-27 11:37:17 +03:00
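The shape of the de-recursified walk, as a generic sketch rather than the actual SCEV code: the recursive call becomes a loop that keeps stepping into the single pointer operand.
```cpp
struct Node {
  const Node *PointerOperand = nullptr; // null when there is nothing to look into
};

const Node *getPointerBase(const Node *V) {
  // Was (conceptually): return V->PointerOperand ? getPointerBase(V->PointerOperand) : V;
  while (V->PointerOperand)
    V = V->PointerOperand;
  return V;
}
```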
Gui Andrade eae84b41fe [MSAN] Handle x86 {round,min,max}sd intrinsics
These need special handling beyond the simple vector intrinsics, as they
behave more like a shuffle operation: the top half of the vector is taken
from one input, and the bottom half is computed separately. Previously,
these were being handled as though all bits of all operands were combined.

Differential Revision: https://reviews.llvm.org/D82398
2020-06-27 06:46:04 +00:00
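The scalar-in-vector semantics that motivate the special handling, shown with a standard SSE2 intrinsic (illustrative only; the patch itself changes MSan's shadow propagation): only the low lane is computed, while the high lane passes through from the first operand, so the two halves need separate shadow tracking.
```cpp
#include <emmintrin.h>
#include <cstdio>

int main() {
  __m128d a = _mm_set_pd(10.0, 1.0); // high = 10.0, low = 1.0
  __m128d b = _mm_set_pd(20.0, 2.0); // high = 20.0, low = 2.0
  __m128d r = _mm_min_sd(a, b);      // low = min(1.0, 2.0) = 1.0, high = a's 10.0
  double out[2];
  _mm_storeu_pd(out, r);
  std::printf("low=%g high=%g\n", out[0], out[1]); // low=1 high=10
}
```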
Craig Topper 9e8b5a20e9 [X86] Add MOVBE and RDRND features to BDVER4.
Only 6 years behind gcc. https://gcc.gnu.org/legacy-ml/gcc-patches/2014-08/msg00231.html

Found while working on improving how we define CPU features for
clang and auditing for correctness.
2020-06-26 23:32:17 -07:00
Fady Ghanim 82b8236cf2 [OpenMP][OMPBuilder] Adding privatization related `createXXXX` to OMPBuilder 2020-06-27 01:54:41 -04:00