Commit Graph

734 Commits

Author SHA1 Message Date
Sander de Smalen 67d4c9924c Add support for (expressing) vscale.
In LLVM IR, vscale can be represented with an intrinsic. For some targets,
this is equivalent to the constexpr:

  getelementptr <vscale x 1 x i8>, <vscale x 1 x i8>* null, i32 1

This can be used to propagate the value in CodeGenPrepare.

In ISel we add a node that can be legalized to one or more
instructions to materialize the runtime vector length.

This patch also adds SVE CodeGen support for VSCALE, which maps this
node to RDVL instructions (for scaled multiples of 16bytes) or CNT[HSD]
instructions (scaled multiples of 2, 4, or 8 bytes, respectively).

Reviewers: rengolin, cameron.mcinally, hfinkel, sebpop, SjoerdMeijer, efriedma, lattner

Reviewed by: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68203
2020-01-22 10:09:27 +00:00
Guillaume Chatelet 1d549e68d4 [Doc] Update requirements for masked load/store 2020-01-22 10:42:37 +01:00
Kazuaki Ishizaki f65d4aa960 [llvm] NFC: fix trivial typos in documents
Reviewers: hans, Jim

Reviewed By: Jim

Subscribers: jvesely, nhaehnle, mgorny, arphaman, bmahjour, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73017
2020-01-22 11:32:51 +08:00
Matt Arsenault a4451d88ee Consolidate internal denormal flushing controls
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.

- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute

AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).

Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.

Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.

-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.

Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.

This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.

AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
2020-01-17 20:09:53 -05:00
Bevin Hansson 8e2b44f7e0 [Intrinsic] Add fixed point division intrinsics.
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:

  llvm.sdiv.fix.*
  llvm.udiv.fix.*

These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.

Patch by: ebevhan

Reviewers: bjope, leonardchan, efriedma, craig.topper

Reviewed By: craig.topper

Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70007
2020-01-08 15:17:46 +01:00
Bill Wendling e886e762dd Revert "Allow output constraints on "asm goto""
This reverts commit 52366088a8.

I accidentally pushed this before supporting changes.
2020-01-07 13:44:08 -08:00
Bill Wendling 52366088a8 Allow output constraints on "asm goto"
Summary:
Remove the restrictions that preventing "asm goto" from returning non-void
values. The values returned by "asm goto" are only valid on the "fallthrough"
path.

Reviewers: jyknight, nickdesaulniers, hfinkel

Reviewed By: jyknight, nickdesaulniers

Subscribers: rsmith, hiraditya, llvm-commits, cfe-commits, craig.topper, rnk

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69876
2020-01-07 13:40:26 -08:00
Hans Wennborg e334a3a60f [docs] NFC: Fix typos in documents
"the the" -> "the"
"an" -> "a"

Patch by Kazuaki Ishizaki <ishizaki@jp.ibm.com>!

Differential revision: https://reviews.llvm.org/D72091
2020-01-07 16:06:14 +01:00
Luís Marques 27e6b171e0 [RISCV][Docs] Add RISC-V asm template argument modifiers
Adds the RISC-V asm template argument modifiers currently supported by LLVM.
Additional ones supported by GCC will be added to the documentation when we
start supporting them.
2020-01-07 11:06:46 +00:00
Florian Hahn 5762648c46 [Docs] Fix sphinx build errors. 2019-12-23 21:53:30 +01:00
Ulrich Weigand 1946461344 [FPEnv] Strict versions of llvm.minimum/llvm.maximum
Add new intrinsics
   llvm.experimental.constrained.minimum
   llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.

Includes SystemZ back-end support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71624
2019-12-18 21:35:28 +01:00
Ulrich Weigand 1e89188d35 [FPEnv] Remove unnecessary rounding mode argument for constrained intrinsics
The following intrinsics currently carry a rounding mode metadata argument:

    llvm.experimental.constrained.minnum
    llvm.experimental.constrained.maxnum
    llvm.experimental.constrained.ceil
    llvm.experimental.constrained.floor
    llvm.experimental.constrained.round
    llvm.experimental.constrained.trunc

This is not useful since the semantics of those intrinsics do not in any way
depend on the rounding mode. In similar cases, other constrained intrinsics
do not have the rounding mode argument. Remove it here as well.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71218
2019-12-17 21:10:36 +01:00
Kevin P. Neal b1d8576b0a This adds constrained intrinsics for the signed and unsigned conversions
of integers to floating point.

This includes some of Craig Topper's changes for promotion support from
D71130.

Differential Revision: https://reviews.llvm.org/D69275
2019-12-17 10:06:51 -05:00
Dmitri Gribenko 51707196a0 Fix title underline in LangRef
The docs didn't compile:
http://lab.llvm.org:8011/builders/llvm-sphinx-docs/builds/38906
2019-12-16 09:05:13 +01:00
Florian Hahn 526244b187 [Matrix] Add first set of matrix intrinsics and initial lowering pass.
This is the first patch adding an initial set of matrix intrinsics and a
corresponding lowering pass. This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html

The first patch introduces four new intrinsics (transpose, multiply,
columnwise load and store) and a LowerMatrixIntrinsics pass, that
lowers those intrinsics to vector operations.

Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix
embedded in a <16 x float> vector) and the intrinsics take the dimension
information as parameters. Those parameters need to be ConstantInt.
For the memory layout, we initially assume column-major, but in the RFC
we also described how to extend the intrinsics to support row-major as
well.

For the initial lowering, we split the input of the intrinsics into a
set of column vectors, transform those column vectors and concatenate
the result columns to a flat result vector.

This allows us to lower the intrinsics without any shape propagation, as
mentioned in the RFC. In follow-up patches, we plan to submit the
following improvements:
 * Shape propagation to eliminate the embedding/splitting for each
   intrinsic.
 * Fused & tiled lowering of multiply and other operations.
 * Optimization remarks highlighting matrix expressions and costs.
 * Generate loops for operations on large matrixes.
 * More general block processing for operation on large vectors,
   exploiting shape information.

We would like to add dedicated transpose, columnwise load and store
intrinsics, even though they are not strictly necessary. For example, we
could instead emit a large shufflevector instruction instead of the
transpose. But we expect that to
  (1) become unwieldy for larger matrixes (even for 16x16 matrixes,
      the resulting shufflevector masks would be huge),
  (2) risk instcombine making small changes, causing us to fail to
      detect the transpose, preventing better lowerings

For the load/store, we are additionally planning on exploiting the
intrinsics for better alias analysis.

Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D70456
2019-12-12 15:42:18 +00:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Sanjay Patel ead0d77409 [LangRef] make per-element poison behavior explicit
As discussed in D70246 and PR43958:
https://bugs.llvm.org/show_bug.cgi?id=43958

The LangRef seems ambiguous about the behavior of poison with respect
to vectors.

We could go further with text and/or examples - suggestions welcome.

Also, see discussion on llvm-dev;
http://lists.llvm.org/pipermail/llvm-dev/2019-November/137243.html

Differential Revision: https://reviews.llvm.org/D70641
2019-12-04 15:32:19 -05:00
Djordje Todorovic 979592a6f7 [DebugInfo] Remove the DIFlagArgumentNotModified debug info flag
Due to changes in D68206, we remove the DIFlagArgumentNotModified
and its usage.

Differential Revision: https://reviews.llvm.org/D68207
2019-11-20 13:18:40 +01:00
Kevin P. Neal d2b6cc7ff6 Document more specifically the rounding for "llvm.round".
Differential Revision: https://reviews.llvm.org/D68810
2019-11-14 13:15:15 -05:00
Kevin P. Neal 56ae3e2692 Make the language more consistent since I'm about to commit a content
change next.
2019-11-14 13:10:59 -05:00
Nuno Lopes a7244c56bd docs: fix warning in LangRef parsing 2019-11-11 10:45:42 +00:00
Daniel Sanders ad0dfb0a25 [globalisel][docs] Rework GMIR documentation and add an early GenericOpcode reference
Summary:
Rework the GMIR documentation to focus more on the end user than the
implementation and tie it in to the MIR document. There was also some
out-of-date information which has been removed.

The quality of the GenericOpcode reference is highly variable and drops
sharply as I worked through them all but we've got to start somewhere :-).
It would be great if others could expand on this too as there is an awful
lot to get through.

Also fix a typo in the definition of G_FLOG. Previously, the comments said
we had two base-2's (G_FLOG and G_FLOG2).

Reviewers: aemerson, volkan, rovka, arsenm

Reviewed By: rovka

Subscribers: wdng, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69545
2019-11-05 15:16:43 -08:00
Nuno Lopes 2d21068d9f [Docs] Add LangRef documentation for freeze instruction
Summary:
 - Describe the new freeze instruction
 - Make it explicit that branch on undef/poison is UB

Reviewers: chandlerc, majnemer, efriedma, nikic, reames, jdoerfert, lebedev.ri, regehr

Subscribers: fhahn, bollu, lebedev.ri, delcypher, spatel, filcab, llvm-commits, aqjune

Differential Revision: https://reviews.llvm.org/D29121
2019-11-05 11:35:55 +00:00
Amy Huang ab76cfdd20 Recommit "[CodeView] Add option to disable inline line tables."
This reverts commit 004ed2b0d1.
Original commit hash 6d03890384

Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.

https://reviews.llvm.org/D67723
2019-11-04 09:15:26 -08:00
Stefan Stipanovic f35740d6e9 NoFree argument attribute.
Summary: Deducing nofree atrribute for function arguments.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67886
2019-11-02 19:40:48 +01:00
Stefan Stipanovic 5fb1782918 Revert "NoFree argument attribute."
This reverts commit c12efa2ed0.
2019-11-02 17:31:02 +01:00
Stefan Stipanovic c12efa2ed0 NoFree argument attribute.
Summary: Deducing nofree atrribute for function arguments.

Reviewers: jdoerfert

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67886
2019-11-02 16:35:38 +01:00
Amy Huang 004ed2b0d1 Revert "[CodeView] Add option to disable inline line tables."
because it breaks compiler-rt tests.

This reverts commit 6d03890384.
2019-10-30 17:31:12 -07:00
Amy Huang 6d03890384 [CodeView] Add option to disable inline line tables.
Summary:
This adds a clang option to disable inline line tables. When it is used,
the inliner uses the call site as the location of the inlined function instead of
marking it as an inline location with the function location.

See https://bugs.llvm.org/show_bug.cgi?id=42344

Reviewers: rnk

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67723
2019-10-30 16:52:39 -07:00
Jay Foad 2da4b6e514 [IR] Allow fast math flags on calls with floating point array type.
Summary:
This extends the rules for when a call instruction is deemed to be an
FPMathOperator, which is based on the type of the call (i.e. the return
type of the function being called). Previously we only allowed
floating-point and vector-of-floating-point types. Now we also allow
arrays (nested to any depth) of floating-point and
vector-of-floating-point types.

This was motivated by llpc, the pipeline compiler for AMD GPUs
(https://github.com/GPUOpen-Drivers/llpc). llpc has many math library
functions that operate on vectors, typically represented as <4 x float>,
and some that operate on matrices, typically represented as
[4 x <4 x float>], and it's useful to be able to decorate calls to all
of them with fast math flags.

Reviewers: spatel, wristow, arsenm, hfinkel, aemerson, efriedma, cameron.mcinally, mcberg2017, jmolloy

Subscribers: wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69161
2019-10-30 14:00:33 +00:00
Philip Reames e14f935ce2 [Docs] Reflect the slow migration from guard to widenable condition which is currently in progress. 2019-10-29 12:46:24 -07:00
Andrew Paverd d157a9bc8b Add Windows Control Flow Guard checks (/guard:cf).
Summary:
A new function pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check mechanism (X86, ARM, AArch64) or
or the dispatch mechanism (X86-64). The check mechanism requires a new calling
convention for the supported targets. The dispatch mechanism adds the target as
an operand bundle, which is processed by SelectionDAG. Another pass
(CodeGen/CFGuardLongjmp.cpp) identifies and emits valid longjmp targets, as
required by /guard:cf. This feature is enabled using the `cfguard` CC1 option.

Reviewers: thakis, rnk, theraven, pcc

Subscribers: ychen, hans, metalcanine, dmajor, tomrittervg, alex, mehdi_amini, mgorny, javed.absar, kristof.beyls, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65761
2019-10-28 15:19:39 +00:00
Jay Foad aa3806b47c Update docs for fast-math flags.
This adds fneg, phi and select to the list of operations that may use
fast-math flags.

llvm-svn: 375250
2019-10-18 16:07:09 +00:00
Oliver Stannard 3b598b9c86 Reland: Dead Virtual Function Elimination
Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.

Original commit message:

Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 375094
2019-10-17 09:58:57 +00:00
David Stenberg 1ae2d9a2bd [DebugInfo] Add a DW_OP_LLVM_entry_value operation
Summary:
Internally in LLVM's metadata we use DW_OP_entry_value operations with
the same semantics as DWARF; that is, its operand specifies the number
of bytes that the entry value covers.

At the time of emitting entry values we don't know the emitted size of
the DWARF expression that the entry value will cover. Currently the size
is hardcoded to 1 in DIExpression, and other values causes the verifier
to fail. As the size is 1, that effectively means that we can only have
valid entry values for registers that can be encoded in one byte, which
are the registers with DWARF numbers 0 to 31 (as they can be encoded as
single-byte DW_OP_reg0..DW_OP_reg31 rather than a multi-byte
DW_OP_regx). It is a bit confusing, but it seems like llvm-dwarfdump
will print an operation "correctly", even if the byte size is less than
that, which may make it seem that we emit correct DWARF for registers
with DWARF numbers > 31. If you instead use readelf for such cases, it
will interpret the number of specified bytes as a DWARF expression. This
seems like a limitation in llvm-dwarfdump.

As suggested in D66746, a way forward would be to add an internal
variant of DW_OP_entry_value, DW_OP_LLVM_entry_value, whose operand
instead specifies the number of operations that the entry value covers,
and we then translate that into the byte size at the time of emission.

In this patch that internal operation is added. This patch keeps the
limitation that a entry value can only be applied to simple register
locations, but it will fix the issue with the size operand being
incorrect for DWARF numbers > 31.

Reviewers: aprantl, vsk, djtodoro, NikolaPrica

Reviewed By: aprantl

Subscribers: jyknight, fedor.sergeev, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D67492

llvm-svn: 374881
2019-10-15 11:31:21 +00:00
Jorge Gorbe Moya b052331bd6 Revert "Dead Virtual Function Elimination"
This reverts commit 9f6a873268.

llvm-svn: 374844
2019-10-14 23:25:25 +00:00
Oliver Stannard 9f6a873268 Dead Virtual Function Elimination
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.

This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.

To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.

The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.

This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.

To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.

I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.

On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.

I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.

I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).

Differential revision: https://reviews.llvm.org/D63932

llvm-svn: 374539
2019-10-11 11:59:55 +00:00
Reid Kleckner f9b67b810e [X86] Add new calling convention that guarantees tail call optimization
When the target option GuaranteedTailCallOpt is specified, calls with
the fastcc calling convention will be transformed into tail calls if
they are in tail position. This diff adds a new calling convention,
tailcc, currently supported only on X86, which behaves the same way as
fastcc, except that the GuaranteedTailCallOpt flag does not need to
enabled in order to enable tail call optimization.

Patch by Dwight Guth <dwight.guth@runtimeverification.com>!

Reviewed By: lebedev.ri, paquette, rnk

Differential Revision: https://reviews.llvm.org/D67855

llvm-svn: 373976
2019-10-07 22:28:58 +00:00
Kevin P. Neal 9f4de84eb0 Fix another sphinx warning.
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373909
2019-10-07 14:14:46 +00:00
Kevin P. Neal a6fc72fba9 Fix sphinx warnings.
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373902
2019-10-07 13:39:56 +00:00
Kevin P. Neal 1c3d19c82d [FPEnv] Add constrained intrinsics for lrint and lround
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.

Reviewed by:	andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by:	craig.topper
Differential Revision:	https://reviews.llvm.org/D64746

llvm-svn: 373900
2019-10-07 13:20:00 +00:00
Pablo Barrio ffac4e8603 Fix doc for t inline asm constraints for ARM/Thumb
Summary: The constraint goes up to regs d15 and q7, not d16 and q8.

Subscribers: kristof.beyls, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68090

llvm-svn: 373228
2019-09-30 16:55:10 +00:00
Kevin P. Neal 71c5b38acd Fix breakage of sphinx builders. Sorry for leaving this broken over the
weekend!

llvm-svn: 373215
2019-09-30 14:51:59 +00:00
Kevin P. Neal 875d20bcde Document requirement of function attributes with constrained floating
point.

Reviewed by:    andrew.w.kaylor, uweigand, efriedma
Approved by:    andrew.w.kaylor
Differential Revision:  https://reviews.llvm.org/D67839

llvm-svn: 373002
2019-09-26 17:50:25 +00:00
Nick Desaulniers 93d87260f1 [Verifier] add invariant check for callbr
Summary:
The list of indirect labels should ALWAYS have their blockaddresses as
argument operands to the callbr (but not necessarily the other way
around).  Add an invariant that checks this.

The verifier catches a bad test case that was added recently in r368478.
I think that was a simple mistake, and the test was made less strict in
regards to the precise addresses (as those weren't specifically the
point of the test).

This invariant will be used to find a reported bug.

Link: https://www.spinics.net/lists/arm-kernel/msg753473.html
Link: https://github.com/ClangBuiltLinux/linux/issues/649

Reviewers: craig.topper, void, chandlerc

Reviewed By: void

Subscribers: ychen, lebedev.ri, javed.absar, kristof.beyls, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67196

llvm-svn: 372923
2019-09-25 22:28:27 +00:00
Florian Hahn 6b3749f696 [LangRef] Clarify absence of rounding guarantees for fmuladd.
During the review of D67434, it was recommended to make fmuladd's
behavior more explicit. D67434 depends on this interpretation.

Reviewers: efriedma, jfb, reames, scanon, lebedev.ri, spatel

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D67552

llvm-svn: 372892
2019-09-25 16:09:24 +00:00
Sanjay Patel 6d4ea22e70 [IR] allow fast-math-flags on phi of FP values (2nd try)
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917 <https://reviews.llvm.org/D61917>

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372878
2019-09-25 14:35:02 +00:00
Sanjay Patel 2cec4b58f5 Revert [IR] allow fast-math-flags on phi of FP values
This reverts r372866 (git commit dec03223a9)

llvm-svn: 372868
2019-09-25 13:29:09 +00:00
Sanjay Patel dec03223a9 [IR] allow fast-math-flags on phi of FP values
The changes here are based on the corresponding diffs for allowing FMF on 'select':
D61917

As discussed there, we want to have fast-math-flags be a property of an FP value
because the alternative (having them on things like fcmp) leads to logical
inconsistency such as:
https://bugs.llvm.org/show_bug.cgi?id=38086

The earlier patch for select made almost no practical difference because most
unoptimized conditional code begins life as a phi (based on what I see in clang).
Similarly, I don't expect this patch to do much on its own either because
SimplifyCFG promptly drops the flags when converting to select on a minimal
example like:
https://bugs.llvm.org/show_bug.cgi?id=39535

But once we have this plumbing in place, we should be able to wire up the FMF
propagation and start solving cases like that.

The change to RecurrenceDescriptor::AddReductionVar() is required to prevent a
regression in a LoopVectorize test. We are intersecting the FMF of any
FPMathOperator there, so if a phi is not properly annotated, new math
instructions may not be either. Once we fix the propagation in SimplifyCFG, it
may be safe to remove that hack.

Differential Revision: https://reviews.llvm.org/D67564

llvm-svn: 372866
2019-09-25 13:14:12 +00:00
Kerry McLaughlin e55b3bf40e [SVE][Inline-Asm] Add constraints for SVE predicate registers
Summary:
Adds the following inline asm constraints for SVE:
  - Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
  - Upa: SVE predicate register with full range, P0 to P15

Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin

Reviewed By: rovka

Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D66524

llvm-svn: 371967
2019-09-16 09:45:27 +00:00