Commit Graph

39993 Commits

Author SHA1 Message Date
sstefan1 72b51d6f93 [OpenMP] Adding InaccessibleMemOnly and InaccessibleMemOrArgMemOnly for runtime calls.
Summary: Attempt to add more attributes for runtime calls.

Reviewers: jdoerfert

Subscribers: guansong, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75010
2020-03-25 14:08:50 +00:00
Kerry McLaughlin 05606329e2 [AArch64][SVE] Add SVE intrinsics for masked loads & stores
Summary:
Implements the following intrinsics for contiguous loads & stores:
  - @llvm.aarch64.sve.ld1
  - @llvm.aarch64.sve.st1
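
For illustration, here is a scalar model of the contiguous masked load/store semantics in plain C++ (the element type and the zeroing of inactive load lanes are assumptions of this sketch, not a definition of the intrinsics):

```
#include <cstddef>
#include <cstdint>

// Scalar model of a masked contiguous load: active lanes read from memory,
// inactive lanes are assumed to produce zero.
void masked_ld1_model(const int32_t *Base, const bool *Pred, int32_t *Dst,
                      size_t Lanes) {
  for (size_t I = 0; I < Lanes; ++I)
    Dst[I] = Pred[I] ? Base[I] : 0;
}

// Scalar model of a masked contiguous store: only active lanes write memory.
void masked_st1_model(int32_t *Base, const bool *Pred, const int32_t *Src,
                      size_t Lanes) {
  for (size_t I = 0; I < Lanes; ++I)
    if (Pred[I])
      Base[I] = Src[I];
}
```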

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76688
2020-03-25 11:48:40 +00:00
Adrian Prantl ed8ad6ec15 Add an -object-path-prefix option to dsymutil
to remap object file paths (but not source paths) before
processing. This is meant to be used for Clang objects where the
module cache location was remapped using ``-fdebug-prefix-map``, to
help dsymutil find the Clang module cache.

<rdar://problem/55685132>

Differential Revision: https://reviews.llvm.org/D76391
2020-03-24 17:13:42 -07:00
Matt Arsenault 39c55cef21 GlobalISel: Introduce bitcast legalize action
For some operations, the type is unimportant and only the number of
bits matters. For example, I don't want to treat <4 x s8> as a legal
type, but I also don't want to decompose loads of this into smaller
pieces to get legal register types.

On AMDGPU in SelectionDAG, we legalize a number of operations (most
notably load and store) by coercing all types to vectors of i32. For
GlobalISel, I'm trying very hard to avoid doing this for every type,
but I don't think this strategy can be completely avoided. I'm trying
to avoid bitcasts for any legitimately legal type we can operate on,
since the intervening bitcasts have proven to be a hassle.

For loads, I think I can get away without ever casting the result
type, and handling any arbitrary bitwidth during selection (I will
eventually want new tablegen support to help with this, rather than
having to add every possible type as legal). The unmerge required to
do anything with the value should expand to the expected shifts. This
is trickier for stores, since it would now require handling a wide
array of truncates during selection which I don't want.

A future potentially interesting case is vector indexing, where
sub-dword types should be indexed in s32 pieces.
2020-03-24 19:33:33 -04:00
Johannes Doerfert 5699d08b79 [Attributor] Use knowledge retained in llvm.assume (operand bundles)
This patch integrates operand bundle llvm.assumes [0] with the
Attributor. Most IRAttributes will now look at uses of the associated
value and if there are llvm.assume operand bundle uses with the right
tag we will check if they are in the must-be-executed-context (around
the context instruction). Droppable users, which currently means only
llvm.assume, are now handled specially in some places as well.

[0] http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

Reviewed By: uenoku

Differential Revision: https://reviews.llvm.org/D74888
2020-03-24 15:33:40 -05:00
Vedant Kumar f7052da6db [DWARF] Emit DW_AT_call_pc for tail calls
Record the address of a tail-calling branch instruction within its call
site entry using DW_AT_call_pc. This allows a debugger to determine the
address to use when creating artificial frames.

This creates an extra attribute + relocation at tail call sites, which
constitute 3-5% of all call sites in xnu and clang, respectively.

rdar://60307600

Differential Revision: https://reviews.llvm.org/D76336
2020-03-24 12:01:55 -07:00
Hiroshi Yamauchi c3417592c8 Revert "Include static prof data when collecting loop BBs"
This reverts commit 129c911efa.

Due to an internal benchmark regression.
2020-03-24 09:41:16 -07:00
Juneyoung Lee 7802be4a3d [SelDag] Add FREEZE
Summary:
- Add FREEZE node to SelDag
- Lower FreezeInst (in IR) to FREEZE node
- Add Legalization for FREEZE node
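
For context, a minimal IRBuilder sketch (assuming the pre-existing IR-level FreezeInst support; `freezeOperand` is a hypothetical helper): freeze pins a possibly-poison/undef value to one arbitrary but fixed value, and this patch lowers such instructions to the new FREEZE node.

```
#include "llvm/IR/IRBuilder.h"

// Hypothetical helper: create a freeze of a possibly-poison value. After
// this patch, SelectionDAG lowers the resulting FreezeInst to a FREEZE node.
llvm::Value *freezeOperand(llvm::IRBuilder<> &B, llvm::Value *MaybePoison) {
  return B.CreateFreeze(MaybePoison, "fr");
}
```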

Reviewers: qcolombet, bogner, efriedma, lebedev.ri, nlopes, craig.topper, arsenm

Reviewed By: lebedev.ri

Subscribers: wdng, xbolva00, Petar.Avramovic, liuz, lkail, dylanmckay, hiraditya, Jim, arsenm, craig.topper, RKSimon, spatel, lebedev.ri, regehr, trentxintong, nlopes, mkuper, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D29014
2020-03-24 23:04:58 +09:00
Florian Hahn 7caba33907 [ConstantRange] Add initial support for binaryXor.
The initial implementation just delegates to APInt's implementation of
XOR for single element ranges and conservatively returns the full set
otherwise.
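
A hedged sketch of that behavior (not the upstream code, just the logic described above):

```
#include "llvm/IR/ConstantRange.h"

using namespace llvm;

// Exact XOR when both ranges are single elements; conservative full set
// otherwise.
ConstantRange binaryXorSketch(const ConstantRange &L, const ConstantRange &R) {
  if (const APInt *LV = L.getSingleElement())
    if (const APInt *RV = R.getSingleElement())
      return ConstantRange(*LV ^ *RV); // delegate to APInt's XOR
  return ConstantRange::getFull(L.getBitWidth());
}
```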

Reviewers: nikic, spatel, lebedev.ri

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D76453
2020-03-24 12:59:50 +00:00
John McCall 49e5a97ec3 Add an algorithm for performing "optimal" layout of a struct.
The algorithm supports both assigning a fixed offset to a field prior to
layout and allowing fields to have sizes that aren't multiples of their
required alignments.  This means that the well-known algorithm of sorting
by decreasing alignment isn't always good enough.  Still, we start with
that, and only if that leaves padding around do we fall back on a greedy
padding-minimizing algorithm.

There is no known efficient algorithm for producing a guaranteed-minimal
layout in all cases.  In fact, allowing arbitrary fixed-offset fields means
there's a straightforward reduction from bin-packing, making this NP-hard.
But as usual with such problems, we can still efficiently produce adequate
solutions to the cases that matter most to us.
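
For reference, a self-contained sketch of the well-known baseline pass mentioned above (plain C++, power-of-two alignments assumed; fixed-offset fields and the greedy padding-minimizing fallback are not modeled):

```
#include <algorithm>
#include <cstdint>
#include <vector>

struct Field {
  uint64_t Size, Align; // Align assumed to be a power of two
  uint64_t Offset = 0;
};

// Sort by decreasing alignment, then pack sequentially, rounding each field's
// offset up to its alignment. Returns the total size before tail padding.
uint64_t layoutByDecreasingAlignment(std::vector<Field> &Fields) {
  std::sort(Fields.begin(), Fields.end(),
            [](const Field &A, const Field &B) { return A.Align > B.Align; });
  uint64_t Offset = 0;
  for (Field &F : Fields) {
    Offset = (Offset + F.Align - 1) & ~(F.Align - 1);
    F.Offset = Offset;
    Offset += F.Size;
  }
  return Offset;
}
```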

I intend to use this in coroutine frame layout, where the retcon lowerings
very badly want to minimize total space usage, and where the switch lowering
can indeed produce a header with interior padding if the promise field is
highly-aligned.  But it may be useful in a much wider variety of situations.
2020-03-23 23:24:48 -04:00
Jessica Paquette 02187ed45a [GlobalISel] Combine G_SELECTs of the form (cond ? x : x) into x
When we find something like this:

```
%a:_(s32) = G_SOMETHING ...
...
%select:_(s32) = G_SELECT %cond(s1), %a, %a
```

We can remove the select and just replace it entirely with `%a` because it's
always going to result in `%a`.

Same if we have

```
%select:_(s32) = G_SELECT %cond(s1), %a, %b
```

where we can deduce that `%a == %b`.

This implements the following cases:

- `%select:_(s32) = G_SELECT %cond(s1), %a, %a` -> `%a`

- `%select:_(s32) = G_SELECT %cond(s1), %a, %some_copy_from_a` -> `%a`

- `%select:_(s32) = G_SELECT %cond(s1), %a, %b` -> `%a` when `%a` and `%b`
   are defined by identical instructions

This gives a few minor code size improvements on CTMark at -O3 for AArch64.

Differential Revision: https://reviews.llvm.org/D76523
2020-03-23 16:46:03 -07:00
Ladd Van Tol a20862307f Improve module.pcm lock file performance on machines with high core counts
Summary:
When building a large Xcode project with multiple module dependencies, and mixed Objective-C & Swift, I observed a large number of clang processes stalling at zero CPU for 30+ seconds throughout the build. This was especially prevalent on my 18-core iMac Pro.

After some sampling, the major cause appears to be the lock file implementation for precompiled modules in the module cache. When the lock is heavily contended by multiple clang processes, the exponential backoff runs in lockstep, with some of the processes sleeping for 30+ seconds in order to acquire the file lock.

In the attached patch, I implemented a more aggressive polling mechanism that limits the sleep interval to a max of 500ms, and randomizes the wait time. I preserved a limited form of exponential backoff. I also updated the code to use cross-platform timing, thread sleep, and random number capabilities available in C++11.

On iMac Pro (2.3 GHz Intel Xeon W, 18 core):

Xcode 11.1 bundled clang:

502.2 seconds (average of 5 runs)

Custom clang build with LockFileManager patch applied:

276.6 seconds (average of 5 runs)

This is a 1.82x speedup for this use case.

On MacBook Pro (4 core 3.1GHz Intel i7):

Xcode 11.1 bundled clang:

539.4 seconds (average of 2 runs)

Custom clang build with LockFileManager patch applied:

509.5 seconds (average of 2 runs)

As expected, machines with fewer cores benefit less from this change.
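
A self-contained sketch of the mechanism (the interval values and the `tryAcquire` probe are illustrative assumptions, not the LockFileManager API):

```
#include <algorithm>
#include <chrono>
#include <random>
#include <thread>

// Poll with randomized, capped exponential backoff: the randomized sleep
// keeps contending processes from waking in lockstep, and the 500ms cap
// bounds the worst-case wait after the lock is released.
void waitWithBackoff(bool (*tryAcquire)()) {
  std::mt19937_64 Rng(std::random_device{}());
  auto Interval = std::chrono::milliseconds(10);
  const auto MaxInterval = std::chrono::milliseconds(500);
  while (!tryAcquire()) {
    std::uniform_int_distribution<int64_t> Dist(1, Interval.count());
    std::this_thread::sleep_for(std::chrono::milliseconds(Dist(Rng)));
    Interval = std::min(Interval * 2, MaxInterval); // limited exponential backoff
  }
}
```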

```
Call graph:
    2992 Thread_393602   DispatchQueue_1: com.apple.main-thread  (serial)
      2992 start  (in libdyld.dylib) + 1  [0x7fff6a1683d5]
        2992 main  (in clang) + 297  [0x1097a1059]
          2992 driver_main(int, char const**)  (in clang) + 2803  [0x1097a5513]
            2992 cc1_main(llvm::ArrayRef<char const*>, char const*, void*)  (in clang) + 1608  [0x1097a7cc8]
              2992 clang::ExecuteCompilerInvocation(clang::CompilerInstance*)  (in clang) + 3299  [0x1097dace3]
                2992 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)  (in clang) + 509  [0x1097dcc1d]
                  2992 clang::FrontendAction::Execute()  (in clang) + 42  [0x109818b3a]
                    2992 clang::ParseAST(clang::Sema&, bool, bool)  (in clang) + 185  [0x10981b369]
                      2992 clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)  (in clang) + 37  [0x10983e9b5]
                        2992 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&)  (in clang) + 141  [0x10983ecfd]
                          2992 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*)  (in clang) + 695  [0x10983f3b7]
                            2992 clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&)  (in clang) + 637  [0x10a9be9bd]
                              2992 clang::Parser::ParseModuleImport(clang::SourceLocation)  (in clang) + 170  [0x10c4841ba]
                                2992 clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >&, bool)  (in clang) + 503  [0x10c485267]
                                  2992 clang::Preprocessor::Lex(clang::Token&)  (in clang) + 316  [0x1098285cc]
                                    2992 clang::Preprocessor::LexAfterModuleImport(clang::Token&)  (in clang) + 690  [0x10cc7af62]
                                      2992 clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::__1::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool)  (in clang) + 7989  [0x10bba6535]
                                        2992 compileAndLoadModule(clang::CompilerInstance&, clang::SourceLocation, clang::SourceLocation, clang::Module*, llvm::StringRef)  (in clang) + 296  [0x10bba8318]
                                          2992 llvm::LockFileManager::waitForUnlock()  (in clang) + 91  [0x10b6953ab]
                                            2992 nanosleep  (in libsystem_c.dylib) + 199  [0x7fff6a22c914]
                                              2992 __semwait_signal  (in libsystem_kernel.dylib) + 10  [0x7fff6a2a0f32]

```

Differential Revision: https://reviews.llvm.org/D69575
2020-03-23 14:59:39 -07:00
Matt Arsenault 3f533006ba AMDGPU: Emit llvm.fshr for __builtin_amdgcn_alignbit
These are equivalent. The generic rotate builtins do not directly map
to the fshr intrinsic.
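
A scalar model of what both compute (shift the 64-bit concatenation right and keep the low half):

```
#include <cstdint>

// fshr-style funnel shift: ((Hi:Lo) >> (Amt mod 32)), truncated to 32 bits.
// v_alignbit_b32 computes the same thing, which is why the builtin can be
// emitted as llvm.fshr.
uint32_t fshr32(uint32_t Hi, uint32_t Lo, uint32_t Amt) {
  uint64_t Concat = ((uint64_t)Hi << 32) | Lo;
  return (uint32_t)(Concat >> (Amt & 31));
}
```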
2020-03-23 16:51:25 -04:00
Stefanos Baziotis a650d555fc [Attributor][NFC] Refactorings and typos in doc
Reviewed By: sstefan1, uenoku

Differential Revision: https://reviews.llvm.org/D76175
2020-03-23 22:44:10 +02:00
Matt Arsenault 43d98a0ecf Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Johannes Doerfert 6b57d7f57d [OpenMP][NFC] Reduce instantiation time with different data structure
See rationale here: https://reviews.llvm.org/D71847#1922648

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D76170
2020-03-23 14:23:46 -05:00
Matt Arsenault aa63eb6a46 GlobalISel: Add computeKnownBitsForTargetInstr
I think we can save the MRI argument from these since it's in
GISelKnownBits already, but currently not accessible.

Implementation deferred to avoid dependency on other patches.
2020-03-23 15:02:30 -04:00
Johannes Doerfert 3f51c5d9ca [OpenMPOpt][FIX] Resolve OpenMP runtime call type mismatches
Exposed by D76058; will be tested once that one lands.
2020-03-23 11:43:36 -05:00
Sanjay Patel ebf83c36e2 [Analysis] simplify code for scaleShuffleMask
This is NFC-ish. The results should be identical, but perf is hopefully
better with the fast-path for no scaling. Added a unit test for that.

The code is adapted from what used to be the DAGCombiner equivalent
function before D76508 (rG0eeee83d7513).
2020-03-23 11:47:14 -04:00
Johannes Doerfert c57689bef2 [Attributor][NFC] Copy llvm::function_ref, don't use references
On IRC this was called a "code smell", so we get rid of it.
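
For context, `llvm::function_ref` is a small non-owning view (roughly a data pointer plus a callback pointer), so copying it costs about as much as copying two pointers, and taking it by reference only adds an indirection. A sketch (the `visitSquares` example is hypothetical):

```
#include "llvm/ADT/STLExtras.h"

// Take function_ref by value, like ArrayRef or StringRef.
void visitSquares(llvm::function_ref<void(int)> Visit) {
  for (int I = 0; I < 4; ++I)
    Visit(I * I);
}
```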
2020-03-23 10:45:24 -05:00
Jay Foad 0444d16a16 [GlobalISel] Add generic opcodes for saturating add/subtract
Summary:
Add new generic MIR opcodes G_SADDSAT etc. Add support in IRTranslator
for translating the saturating add/subtract intrinsics to the new
opcodes.
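
A scalar model of the saturating semantics (what G_SADDSAT computes per element; a sketch, not the MIR definition):

```
#include <cstdint>
#include <limits>

// Signed saturating add: compute the wide sum, then clamp to the
// representable range instead of wrapping.
int32_t sadd_sat(int32_t A, int32_t B) {
  int64_t Sum = (int64_t)A + (int64_t)B;
  if (Sum > std::numeric_limits<int32_t>::max())
    return std::numeric_limits<int32_t>::max();
  if (Sum < std::numeric_limits<int32_t>::min())
    return std::numeric_limits<int32_t>::min();
  return (int32_t)Sum;
}
```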

Reviewers: aemerson, dsanders, paquette, arsenm

Subscribers: jvesely, wdng, nhaehnle, rovka, hiraditya, volkan, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76600
2020-03-23 15:16:45 +00:00
Sanjay Patel 0eeee83d75 [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)

This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats mask values as signed.

The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).
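
A plain-C++ model of what scaling a mask means (a sketch of the semantics, not the VectorUtils signature): each wide-element index M becomes Scale consecutive narrow-element indices, and sentinel (negative) values are preserved.

```
#include <vector>

// E.g. scaling <1, 3> by 2 yields <2, 3, 6, 7>.
std::vector<int> scaleMask(unsigned Scale, const std::vector<int> &Mask) {
  std::vector<int> Scaled;
  Scaled.reserve(Mask.size() * Scale);
  for (int M : Mask)
    for (unsigned I = 0; I != Scale; ++I)
      Scaled.push_back(M < 0 ? M : M * (int)Scale + (int)I);
  return Scaled;
}
```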

Differential Revision: https://reviews.llvm.org/D76508
2020-03-23 09:58:55 -04:00
Guillaume Chatelet 32851f8d63 [Alignment][NFC] Deprecate VectorUtils::getAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, rogfer01, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76542
2020-03-23 13:54:15 +01:00
Guillaume Chatelet ea64ee0edb [Alignment][NFC] Deprecate ensureMaxAlignment
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76368
2020-03-23 11:31:33 +01:00
Yaxun (Sam) Liu 314deab9af Add Triple::isAMDGPU
Differential Revision: https://reviews.llvm.org/D57707
2020-03-22 14:20:28 -04:00
Nico Weber 2655d1b457 Remove a dead function. 2020-03-22 13:27:51 -04:00
Nikita Popov dc81923659 [InstCombine] Remove ExpensiveCombines option
D75801 removed the last and only user of this option, so we can
drop it now. The original idea behind this was to only run expensive
transforms under -O3, but apart from the one known bits transform,
this has never really taken off. I believe nowadays the recommendation
is to put expensive transforms in AggressiveInstCombine instead,
though that isn't terribly popular either :)

Differential Revision: https://reviews.llvm.org/D76540
2020-03-22 16:56:28 +01:00
Matt Arsenault 830cfda19f Utils: Mostly convert memcpy expansion to use Align
The TTI hooks aren't converted. I also think the intrinsics should
have mandatory alignment and never return MaybeAlign.
2020-03-22 11:21:44 -04:00
Fangrui Song 140d6245af Delete TargetLoweringObjectFile::Ctx
We can use the parent MCObjectFileInfo::Ctx which has the same value.
2020-03-21 22:36:29 -07:00
Lang Hames 38a8760b99 [ORC] Move ostream operators for debugging output out of Core.h.
DebugUtils.h seems like a more appropriate home for these.
2020-03-21 18:27:28 -07:00
Ehud Katz 34fd007aaf Revert "[ADT] Implement the Waymarking as an independent utility"
This reverts commit 73cf8abbe6.
2020-03-21 22:47:17 +02:00
Ehud Katz 73cf8abbe6 [ADT] Implement the Waymarking as an independent utility
This is the Waymarking algorithm implemented as an independent utility.
The utility operates on a range of sequential elements.
First we "tag" the elements by calling `fillWaymarks`.
Then we can "follow" the tags from every element inside the tagged
range, and reach the "head" (the first element), by calling
`followWaymarks`.

Differential Revision: https://reviews.llvm.org/D74415
2020-03-21 14:30:32 +02:00
Simon Pilgrim f9a8650578 Revert rGd5d8569df14e95e2c53d167bd1b37995bcbec565 "Fix static analysis warnings about classes with virtual methods not having virtual destructors"
This reverts commit d5d8569df1.
2020-03-21 11:39:34 +00:00
Simon Pilgrim d5d8569df1 Fix static analysis warnings about classes with virtual methods not having virtual destructors 2020-03-21 11:30:44 +00:00
Huihui Zhang 4f5af9d70d [ValueTracking] Fix usage of DataLayout::getTypeStoreSize()
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.

For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().

For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().
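
A hedged usage sketch of that pattern (`fixedStoreSize` is a hypothetical helper):

```
#include "llvm/IR/DataLayout.h"

using namespace llvm;

// Query the store size once; use isNonZero() where scalability doesn't
// matter, and only call getFixedSize() when the type cannot be scalable.
uint64_t fixedStoreSize(const DataLayout &DL, Type *Ty) {
  TypeSize TS = DL.getTypeStoreSize(Ty);
  if (!TS.isNonZero())
    return 0;
  return TS.getFixedSize(); // asserts if TS is scalable
}
```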

Reviewers: sdesmalen, efriedma, apazos, reames

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76454
2020-03-20 16:52:15 -07:00
Vedant Kumar a3fd1a1c74 [ADT] CoalescingBitVector: Add advanceToLowerBound iterator operation
advanceToLowerBound moves an iterator to the first bit set at, or after,
the given index. This can be faster than doing IntervalMap::find.
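
A usage sketch under the semantics described above (the allocator-based construction follows the ADT's conventions; treat the setup details as assumptions):

```
#include "llvm/ADT/CoalescingBitVector.h"

using namespace llvm;

void skipAhead() {
  CoalescingBitVector<uint64_t>::Allocator Alloc;
  CoalescingBitVector<uint64_t> BV(Alloc);
  BV.set(5);
  BV.set(1000);
  auto It = BV.begin();
  It.advanceToLowerBound(10); // first set bit at or after index 10
  // *It is now 1000, without scanning bits 10..999 one at a time
}
```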

rdar://60046261

Differential Revision: https://reviews.llvm.org/D76466
2020-03-20 12:18:26 -07:00
Vedant Kumar 4716ebb823 [ADT] CoalescingBitVector: Avoid initial heap allocation, NFC
Avoid making a heap allocation when constructing a CoalescingBitVector.

This reduces time spent in LiveDebugValues when compiling sqlite3 by
700ms (0.5% of the total User Time).

rdar://60046261

Differential Revision: https://reviews.llvm.org/D76465
2020-03-20 12:18:25 -07:00
Adrian Prantl 18e8f27ad8 Add missing module map entry 2020-03-20 11:11:27 -07:00
Sterling Augustine 5de4ba1770 Cleanup the plumbing for DILineInfoSpecifier. [NFC - Try 2] 2020-03-20 10:29:57 -07:00
Simon Pilgrim 34659de5fd [InstCombine][X86] simplifyX86immShift - convert variable in-range vector shift by scalar amounts to generic shifts (PR40391)
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.

This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
2020-03-20 15:48:06 +00:00
Simon Tatham 1adfa4c991 [ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family.
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.

The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.
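
A scalar model of that folding (why the IR intrinsic needs no accumulator parameter):

```
#include <cstddef>
#include <cstdint>

// VADDV is a horizontal add across lanes.
int32_t vaddv_model(const int32_t *V, size_t Lanes) {
  int32_t Sum = 0;
  for (size_t I = 0; I < Lanes; ++I)
    Sum += V[I];
  return Sum;
}

// VADDVA just folds an ordinary add of the accumulator on top, so the IR can
// represent it as `Acc + vaddv(V)` and let isel combine the two.
int32_t vaddva_model(int32_t Acc, const int32_t *V, size_t Lanes) {
  return Acc + vaddv_model(V, Lanes);
}
```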

Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.

For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76491
2020-03-20 15:42:33 +00:00
Simon Tatham 45a9945b9e [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family.
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.

We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the whole lot is consistent.

In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76490
2020-03-20 15:42:33 +00:00
Mikhail Maltsev 969034b860 [ARM,CDE] Implement CDE unpredicated Q-register intrinsics
Summary:
This patch implements the following intrinsics:

  uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
  T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
  T __arm_vcx2q(int coproc, T n, uint32_t imm);
  uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
  T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
  T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
  uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
  T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);

Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic in 2 or 3 parameter types; such polymorphism is not
supported by the existing MVE/CDE tablegen backends, and we don't
really want a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this, some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.

The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.

Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76299
2020-03-20 14:01:56 +00:00
Mikhail Maltsev d22e661712 [ARM,CDE] Implement CDE S and D-register intrinsics
Summary:
This patch implements the following ACLE intrinsics:

  uint32_t __arm_vcx1_u32(int coproc, uint32_t imm);
  uint32_t __arm_vcx1a_u32(int coproc, uint32_t acc, uint32_t imm);
  uint32_t __arm_vcx2_u32(int coproc, uint32_t n, uint32_t imm);
  uint32_t __arm_vcx2a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t imm);
  uint32_t __arm_vcx3_u32(int coproc, uint32_t n, uint32_t m, uint32_t imm);
  uint32_t __arm_vcx3a_u32(int coproc, uint32_t acc, uint32_t n, uint32_t m, uint32_t imm);

  uint64_t __arm_vcx1d_u64(int coproc, uint32_t imm);
  uint64_t __arm_vcx1da_u64(int coproc, uint64_t acc, uint32_t imm);
  uint64_t __arm_vcx2d_u64(int coproc, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx2da_u64(int coproc, uint64_t acc, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx3d_u64(int coproc, uint64_t n, uint64_t m, uint32_t imm);
  uint64_t __arm_vcx3da_u64(int coproc, uint64_t acc, uint64_t n, uint64_t m, uint32_t imm);

Since the semantics of CDE instructions are opaque to the compiler, the
ACLE intrinsics require dedicated LLVM IR intrinsics. The 64-bit and
32-bit variants share the same IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76298
2020-03-20 14:01:53 +00:00
Mikhail Maltsev 7a85e3585e [ARM,CDE] Implement GPR CDE intrinsics
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.

The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf

Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this, the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.

Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76296
2020-03-20 14:01:51 +00:00
Alexey Bataev fcba7c3534 [OPENMP50] Initial support for scan directive.
Added basic parsing/sema/serialization support for the scan directive.
2020-03-20 07:58:15 -04:00
Adrian Kuegel baa6f6a782 Revert "[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes"
This reverts commit e9f22fd429.

When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
2020-03-20 11:02:50 +01:00
David Blaikie 1c15377496 Recommit: "CFGDiff: Simplify/common the begin/end implementations to use a common range helper"
(would be nice to revisit the CFG traits and change them to use ranges
rather than begin/end - if anyone wants to do that refactor)

Also use more auto because writing the names of range utility iterators
isn't helping readability here - they're sort of implementation details
for the most part, especially once you nest a few different filtering
and adapting iterators.

The fix (shooting from the hip since I couldn't reproduce this locally)
was to capture by value in a lambda used in a filtering iterator -
because the iterator would persist beyond the lifetime of the function
(as the iterators are returned to callers).

Originally committed in 79a7ed92a9.
This was reverted in 4a7f2032a3.
2020-03-19 18:21:14 -07:00
Sterling Augustine 6343526d64 Revert "Cleanup the plumbing for DILineInfoSpecifier. [NFC]"
This broke lldb. Will fix and resubmit.

This reverts commit 98ff6eb679.
2020-03-19 17:25:05 -07:00
Thomas Lively a3f974f3c3 [WebAssembly] SIMD bitmask intrinsics and builtin functions
Summary:
These experimental new instructions are proposed in
https://github.com/WebAssembly/simd/pull/201.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76397
2020-03-19 17:15:37 -07:00