Commit Graph

1470 Commits

Sander de Smalen e51c1d06a9 [SveEmitter] Add builtins for svtbl2
Reviewers: david-arm, efriedma, c-rhodes

Reviewed By: c-rhodes

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81462
2020-06-17 09:41:38 +01:00
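As a usage sketch (not part of the commit; assuming an SVE2-enabled target and the svtbl2_u8 naming from the ACLE arm_sve.h):

    #include <arm_sve.h>
    // Table lookup across a pair of vectors: each index byte selects from
    // the 32-byte-per-granule table formed by the two vectors in `table`.
    // Compile with e.g. -march=armv8-a+sve2.
    svuint8_t lookup(svuint8x2_t table, svuint8_t indices) {
      return svtbl2_u8(table, indices);
    }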
Christopher Tetreault eb81c85afd [SVE] Deprecate default false variant of VectorType::get
Reviewers: efriedma, fpetrogalli, kmclaughlin, huntergr

Reviewed By: fpetrogalli

Subscribers: cfe-commits, tschuett, rkruppe, psnobl, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D80342
2020-06-16 15:16:11 -07:00
Luke Geeson 10b6567f49 [AArch64]: BFloat MatMul Intrinsics&CodeGen
This patch upstreams support for BFloat Matrix Multiplication intrinsics
and code generation for __bf16 on AArch64. This includes IR intrinsics.
Unit tests are provided as needed. AArch32 intrinsics and CodeGen will
come after this patch.

This patch is part of a series implementing the Bfloat16 extension of the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

 - Luke Geeson
 - Momchil Velikov
 - Mikhail Maltsev
 - Luke Cheeseman

Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij

Reviewed By: miyuki, stuij

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80752

Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
2020-06-16 15:23:30 +01:00
Jeff Mott 8799ebbc1f [clang] Fix or emit diagnostic for checked arithmetic builtins with _ExtInt types

- Fix computed size for _ExtInt types passed to checked arithmetic
  builtins.
- Emit diagnostic when signed _ExtInt larger than 128-bits is passed
    to __builtin_mul_overflow.
- Change Sema checks for builtins to accept placeholder types.

Differential Revision: https://reviews.llvm.org/D81420
2020-06-15 06:51:54 -07:00
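For context, a minimal sketch of the affected builtins (assuming Clang's _ExtInt extension as it existed at the time; it was later renamed _BitInt):

    // Checked addition on a 37-bit integer: returns true if the exact
    // result does not fit in *res. Per this commit, passing a signed
    // _ExtInt wider than 128 bits to __builtin_mul_overflow is diagnosed.
    _Bool add37(_ExtInt(37) a, _ExtInt(37) b, _ExtInt(37) *res) {
      return __builtin_add_overflow(a, b, res);
    }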
Sander de Smalen 91a4a592ed [SveEmitter] Add SVE tuple types and builtins for svundef.
This patch adds new SVE types to Clang that describe tuples of SVE
vectors. For example `svint32x2_t` which maps to the twice-as-wide
vector `<vscale x 8 x i32>`. Similarly, `svint32x3_t` will map to
`<vscale x 12 x i32>`.

It also adds builtins to return an `undef` vector for a given
SVE type.

Reviewers: c-rhodes, david-arm, ctetreau, efriedma, rengolin

Reviewed By: c-rhodes

Tags: #clang

Differential Revision: https://reviews.llvm.org/D81459
2020-06-15 07:36:01 +01:00
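A sketch of how the tuple types and svundef builtins fit together (assuming the svset2 tuple accessors from the ACLE are also available):

    #include <arm_sve.h>
    // Start from an undefined two-vector tuple and fill both halves.
    svint32x2_t make_pair(svint32_t lo, svint32_t hi) {
      svint32x2_t t = svundef2_s32();
      t = svset2_s32(t, 0, lo);
      t = svset2_s32(t, 1, hi);
      return t;
    }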
Craig Topper ed34140e11 [X86] Move X86 stuff out of TargetParser.h and into the recently created X86TargetParser.h. NFC 2020-06-10 22:06:34 -07:00
Thomas Lively b7d369280b [WebAssembly] Implement prototype SIMD rounding instructions
Summary:
As specified in https://github.com/WebAssembly/simd/pull/232. These
instructions are implemented as LLVM intrinsics for now rather than
normal ISel patterns to make these instructions opt-in. Once the
instructions are merged to the spec proposal, the intrinsics will be
replaced with proper ISel patterns.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81222
2020-06-09 10:14:14 -07:00
Saiyedul Islam 675cefbf60 [AMDGPU] Introduce Clang builtins to be mapped to AMDGCN atomic inc/dec intrinsics
Summary:
__builtin_amdgcn_atomic_inc32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_inc64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec32(int *Ptr, int Val, unsigned MemoryOrdering, const char *SyncScope)
__builtin_amdgcn_atomic_dec64(int64_t *Ptr, int64_t Val, unsigned MemoryOrdering, const char *SyncScope)

The first and second arguments are passed transparently to the amdgcn
atomic inc/dec intrinsic. The fifth argument of the intrinsic is set to
true if the first argument of the builtin is a volatile pointer. The third
argument of the builtin is one of the memory-ordering specifiers
ATOMIC_ACQUIRE, ATOMIC_RELEASE, ATOMIC_ACQ_REL, or ATOMIC_SEQ_CST,
following C++11 memory model semantics. It is mapped to the corresponding
LLVM atomic memory ordering for the atomic inc/dec instruction using the
Clang atomic C ABI. The fourth argument is an AMDGPU-specific
synchronization scope, defined as a string.

Reviewers: arsenm, sameerds, JonChesterfield, jdoerfert

Reviewed By: arsenm, sameerds

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, kerbowa, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80804
2020-06-09 17:02:58 +00:00
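A usage sketch following the signatures above (assuming an amdgcn target; "workgroup" is one of the AMDGPU synchronization scopes):

    // Atomic wrapping increment at workgroup scope with seq_cst ordering;
    // returns the value the memory location held before the update.
    int bump(int *counter, int wrap_at) {
      return __builtin_amdgcn_atomic_inc32(counter, wrap_at,
                                           __ATOMIC_SEQ_CST, "workgroup");
    }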
Florian Hahn 3323a628ec [Matrix] Add __builtin_matrix_transpose to Clang.
This patch adds __builtin_matrix_transpose to Clang, as described in
clang/docs/MatrixTypes.rst.

Reviewers: rjmccall, jfb, rsmith, Bigcheese

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D72778
2020-06-09 10:14:37 +01:00
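A short example of the builtin (assuming -fenable-matrix and the matrix_type attribute from clang/docs/MatrixTypes.rst):

    typedef float m4x2_t __attribute__((matrix_type(4, 2)));
    typedef float m2x4_t __attribute__((matrix_type(2, 4)));

    // The result type swaps the rows and columns of the operand type.
    m2x4_t transpose(m4x2_t m) {
      return __builtin_matrix_transpose(m);
    }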
Jian Cai 4db2b70248 Add a flag to debug automatic variable initialization
Summary:
Add -ftrivial-auto-var-init-stop-after= to limit the number of times
stack variables are initialized when -ftrivial-auto-var-init= is used to
initialize stack variables to zero or a pattern. This flag can be used
to bisect uninitialized uses of a stack variable exposed by automatic
variable initialization, such as http://crrev.com/c/2020401.

Reviewers: jfb, vitalybuka, kcc, glider, rsmith, rjmccall, pcc, eugenis, vlad.tsyrklevich

Reviewed By: jfb

Subscribers: phosek, hubert.reinterpretcast, srhines, MaskRay, george.burgess.iv, dexonsmith, inglorion, gbiv, llozano, manojgupta, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77168
2020-06-08 12:30:56 -07:00
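A sketch of how such a bisection might look (hypothetical file and invocation; the flag bounds how many stack variables get auto-initialized):

    /* clang -ftrivial-auto-var-init=pattern \
             -ftrivial-auto-var-init-stop-after=1 -c demo.c */
    int demo(void) {
      int a; /* 1st initialization: still pattern-initialized */
      int b; /* beyond the stop-after bound: left uninitialized */
      return a ^ b;
    }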
Ties Stuij 8b137a4306 [clang][BFloat] Add create/set/get/dup intrinsics
Summary:
This patch is part of a series that adds support for the Bfloat16 extension of
the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Luke Geeson
- Ties Stuij
- Mikhail Maltsev

Reviewers: t.p.northover, sdesmalen, fpetrogalli, LukeGeeson, stuij, labrinea

Reviewed By: labrinea

Subscribers: miyuki, dmgreen, labrinea, kristof.beyls, ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79710
2020-06-05 14:35:10 +01:00
Ties Stuij ecd682bbf5 [ARM] Add __bf16 as new Bfloat16 C Type
Summary:
This patch upstreams support for a new storage-only bfloat16 C type.
This type is used to implement primitive support for bfloat16 data, in
line with the Bfloat16 extension of the Armv8.6-a architecture, as
detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

In detail this patch:
- introduces an opaque, storage-only C type __bf16, backed by a new bfloat IR type.

This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.

The following people contributed to this patch:
- Luke Cheeseman
- Momchil Velikov
- Alexandros Lamprineas
- Luke Geeson
- Simon Tatham
- Ties Stuij

Reviewers: SjoerdMeijer, rjmccall, rsmith, liutianle, RKSimon, craig.topper, jfb, LukeGeeson, fpetrogalli

Reviewed By: SjoerdMeijer

Subscribers: labrinea, majnemer, asmith, dexonsmith, kristof.beyls, arphaman, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76077
2020-06-05 10:32:43 +01:00
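To illustrate the storage-only property, a small sketch:

    // __bf16 values can be loaded, stored and copied...
    void copy_bf16(__bf16 *dst, const __bf16 *src, int n) {
      for (int i = 0; i < n; ++i)
        dst[i] = src[i];
      // ...but arithmetic such as src[0] + src[1] is rejected,
      // since __bf16 is an opaque storage-only type.
    }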
Craig Topper dd863ccae1 [X86] Separate X86_CPU_TYPE_COMPAT_WITH_ALIAS from X86_CPU_TYPE_COMPAT. NFC
Add a separate X86_CPU_TYPE_COMPAT_ALIAS that carries the alias string
and the enum from X86_CPU_TYPE_COMPAT.
2020-06-03 14:13:12 -07:00
Andrew Wock 15a1780a10 [PowerPC] Replace subtract-from-zero float in version with fneg in PowerPC special fma compiler builtins
This is a re-revert with a corrected test.

This patch adds a test for the PowerPC fma compiler builtins, some variations
of which negate inputs and outputs. The code to generate IR for these
builtins was untested before this patch.

Originally, the code used the outdated method of subtracting floating point
values from -0.0 as floating point negation. This patch remedies that.

Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D76949
2020-06-03 09:45:27 -04:00
Lucas Prates 8beaba13b8 [Clang][AArch64] Capturing proper pointer alignment for Neon vld1 intrinsics
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming that all the pointers from which loads were being generated for
vld1 intrinsics were aligned according to the intrinsic's result type,
causing alignment faults in the code generated by the backend.

This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.

Reviewers: t.p.northover, ostannard, pcc, efriedma

Reviewed By: ostannard, efriedma

Subscribers: echristo, plotfi, nickdesaulniers, efriedma, kristof.beyls, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79721
2020-06-03 11:39:27 +01:00
Christopher Tetreault 796898172c [SVE] Eliminate calls to default-false VectorType::get() from Clang
Reviewers: efriedma, david-arm, fpetrogalli, ddunbar, rjmccall

Reviewed By: fpetrogalli, rjmccall

Subscribers: tschuett, rkruppe, psnobl, dmgreen, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D80323
2020-06-01 10:02:14 -07:00
Eric Christopher 97a133f157 Temporarily Revert "[Clang][AArch64] Capturing proper pointer alignment for Neon vld1 intrinsics"
as it was causing crashes during code generation; see https://bugs.llvm.org/show_bug.cgi?id=46084

This reverts commit 98cad555e2.
2020-05-26 18:51:00 -07:00
Adrian Prantl b59b3640bc Debug Info: Mark os_log helper functions as artificial
The os_log helper functions are linkonce_odr and supposed to be
uniqued across TUs, so attaching a DW_AT_decl_line to them is highly
misleading. By setting the function decl to implicit, CGDebugInfo
properly marks the functions as artificial and uses a default file /
line 0 location for the function.

rdar://problem/63450824

Differential Revision: https://reviews.llvm.org/D80463
2020-05-26 09:08:27 -07:00
Lucas Prates 98cad555e2 [Clang][AArch64] Capturing proper pointer alignment for Neon vld1 intrinsics
Summary:
During CodeGen for AArch64 Neon intrinsics, Clang was incorrectly
assuming that all the pointers from which loads were being generated for
vld1 intrinsics were aligned according to the intrinsic's result type,
causing alignment faults in the code generated by the backend.

This patch updates vld1 intrinsics' CodeGen to properly capture the
correct load alignment based on the type of the pointer provided as
input for the intrinsic.

Reviewers: t.p.northover, ostannard, pcc

Reviewed By: ostannard

Subscribers: kristof.beyls, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79721
2020-05-26 10:09:35 +01:00
Eli Friedman 62f3ef2b53 [CGCall] Annotate references with "align" attribute.
If we're going to assume references are dereferenceable, we should also
assume they're aligned: otherwise, we can't actually dereference them.

See also D80072.

Differential Revision: https://reviews.llvm.org/D80166
2020-05-19 20:21:30 -07:00
Francesco Petrogalli b593bfd4d8 [clang][SveEmitter] SVE builtins for `svusdot` and `svsudot` ACLE.
Summary:
Intrinsics, guarded by `__ARM_FEATURE_SVE_MATMUL_INT8`:

* svusdot[_s32]
* svusdot[_n_s32]
* svusdot_lane[_s32]
* svsudot[_s32]
* svsudot[_n_s32]
* svsudot_lane[_s32]

Reviewers: sdesmalen, efriedma, david-arm, rengolin

Subscribers: tschuett, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D79877
2020-05-18 23:07:23 +00:00
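A usage sketch for the mixed-sign dot products (assuming a compiler that defines __ARM_FEATURE_SVE_MATMUL_INT8 and provides arm_sve.h):

    #include <arm_sve.h>
    // usdot: unsigned * signed; sudot: signed * unsigned. Both accumulate
    // 8-bit dot products into 32-bit lanes.
    svint32_t mixed_dot(svint32_t acc, svuint8_t u, svint8_t s) {
      acc = svusdot_s32(acc, u, s);
      return svsudot_s32(acc, s, u);
    }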
Yonghong Song 072cde03aa [Clang][BPF] implement __builtin_btf_type_id() builtin function
Such a builtin function is mostly useful to preserve btf type id
for non-global data. For example,
   extern void foo(..., void *data, int size);
   int test(...) {
     struct t { int a; int b; int c; } d;
     d.a = ...; d.b = ...; d.c = ...;
     foo(..., &d, sizeof(d));
   }

The function "foo" in the above only see raw data and does not
know what type of the data is. In certain cases, e.g., logging,
the additional type information will help pretty print.

This patch implements a BPF-specific builtin
  u32 btf_type_id = __builtin_btf_type_id(param, flag)
which returns a btf type id for "param".
flag == 0 indicates a BTF local relocation, meaning the btf type_id is
only adjusted when the BPF program's BTF changes.
flag == 1 indicates a BTF remote relocation, meaning the btf type_id is
adjusted against the Linux kernel or, in the future, other entities.

Differential Revision: https://reviews.llvm.org/D74668
2020-05-15 09:44:54 -07:00
Thomas Lively 3d49d1cfa7 [WebAssembly] Implement pseudo-min/max SIMD instructions
Summary:
As proposed in https://github.com/WebAssembly/simd/pull/122. Since
these instructions are not yet merged to the SIMD spec proposal, this
patch makes them entirely opt-in by surfacing them only through LLVM
intrinsics and clang builtins. If these instructions are made
official, these intrinsics and builtins should be replaced with simple
instruction patterns.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79742
2020-05-12 09:39:01 -07:00
Sander de Smalen d6936be2ef [SveEmitter] Add builtins for svdup and svindex
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79357
2020-05-12 11:02:32 +01:00
Thomas Lively 8e3e56f2a3 [WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic
Summary:

Although using `__builtin_shufflevector` and the `shufflevector`
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D66983
2020-05-11 10:01:55 -07:00
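A sketch of the builtin as it existed at the time (assuming the __builtin_wasm_shuffle_v8x16 name and 16 constant lane indices; the builtin was renamed later):

    typedef signed char v128_i8 __attribute__((vector_size(16)));
    // Interleave the low 8 bytes of a and b; the mask is opaque to the
    // optimizer, so the chosen lane order survives into the final wasm.
    v128_i8 interleave_low(v128_i8 a, v128_i8 b) {
      return __builtin_wasm_shuffle_v8x16(a, b, 0, 16, 1, 17, 2, 18, 3, 19,
                                          4, 20, 5, 21, 6, 22, 7, 23);
    }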
Sander de Smalen 4cad97595f [SveEmitter] Add builtins for svmovlb and svmovlt
These builtins are expanded in CGBuiltin to use intrinsics
for (signed/unsigned) shift left long top/bottom.

Reviewers: efriedma, SjoerdMeijer

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79579
2020-05-11 09:41:58 +01:00
Sander de Smalen 3cb8b4c193 [SveEmitter] Add builtins for SVE2 Polynomial arithmetic
This patch adds builtins for:
- sveorbt
- sveortb
- svpmul
- svpmullb, svpmullb_pair
- svpmullt, svpmullt_pair

The svpmullb and svpmullt builtins are expressed using the svpmullb_pair
and svpmullt_pair LLVM IR intrinsics, respectively.

Reviewers: SjoerdMeijer, efriedma, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79480
2020-05-07 11:53:04 +01:00
Akira Hatanaka dc4e25d4f2 [CodeGen][ObjC] Don't try to retain a __unsafe_unretained ARC pointer passed to __builtin_os_log_format to extend its lifetime to the end of its enclosing block

Extend only lifetimes of pointers returned by function calls or message
sends instead. In the long term, we should lifetime-extend pointers in
more complex expressions and non-ARC objects (e.g., C++ temporaries)
too.

rdar://problem/61846261
2020-05-06 12:47:17 -07:00
Sander de Smalen 5ba329059f [SveEmitter] Add builtins for svreinterpret
The reinterpret builtins are generated separately because they
need the cross product of all types, 121 functions in total,
which is inconvenient to specify in the arm_sve.td file.

Reviewers: SjoerdMeijer, efriedma, ctetreau, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78756
2020-05-05 13:04:44 +01:00
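One of the 121 generated functions, as a sketch (names follow the ACLE svreinterpret_<dst>_<src> pattern):

    #include <arm_sve.h>
    // Reinterpret the bits of an unsigned 32-bit vector as signed bytes;
    // only the type changes, no instructions are emitted.
    svint8_t as_bytes(svuint32_t v) {
      return svreinterpret_s8_u32(v);
    }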
Sander de Smalen aed6bd6f42 Reland D78750: [SveEmitter] Add builtins for svdupq and svdupq_lane
Edit: Changed a few CHECK lines into CHECK-DAG lines.

This reverts commit 90f3f62cb0.
2020-05-05 10:42:11 +01:00
Sander de Smalen 90f3f62cb0 Revert "[SveEmitter] Add builtins for svdupq and svdupq_lane"
It seems this patch broke some buildbots, so reverting until I
have had a chance to investigate.

This reverts commit 6b90a6887d.
2020-05-04 21:31:55 +01:00
Sander de Smalen 6b90a6887d [SveEmitter] Add builtins for svdupq and svdupq_lane
* svdupq builtins that duplicate scalars to every quadword of a vector
  are defined using builtins for svld1rq (load and replicate quadword).
* svdupq builtins that duplicate boolean values to fill a predicate vector
  are defined using `svcmpne`.

Reviewers: SjoerdMeijer, efriedma, ctetreau

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78750
2020-05-04 20:38:47 +01:00
Thomas Lively e0f52842c8 [WebAssembly] Renumber SIMD opcodes
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79224
2020-05-01 17:20:49 -07:00
Sander de Smalen a4dac6d4e0 [SveEmitter] Add builtins for svmov_b and svnot_b.
These are custom expanded in CGBuiltin:

  svmov_b_z(pg, op) <=> svand_b_z(pg, op, op)
  svnot_b_z(pg, op) <=> sveor_b_z(pg, op, pg)

Reviewers: SjoerdMeijer, efriedma, ctetreau, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D79039
2020-04-29 13:33:18 +01:00
Sander de Smalen 42a56bf63f [SveEmitter] Add builtins for gather prefetches
Patch by Andrzej Warzynski

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78677
2020-04-29 11:52:49 +01:00
Christopher Tetreault ef3678cfee [SVE] Update EmitSVEPredicateCast to take a ScalableVectorType
Summary:
Removes usage of VectorType::getNumElements identified by test located
at CodeGen/aarch64-sve-intrinsics/acle_sve_abs.c. Since the type is an
SVE predicate vector, it makes sense to specialize the code for scalable
vectors only.

Reviewers: rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, rkruppe, psnobl, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78958
2020-04-28 11:22:20 -07:00
Christopher Tetreault da8918f27e [SVE][NFC] Use ScalableVectorType in CGBuiltin
Summary: * Upgrade some usages of VectorType to use ScalableVectorType

Reviewers: efriedma, david-arm, fpetrogalli, kmclaughlin

Reviewed By: efriedma

Subscribers: tschuett, rkruppe, psnobl, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78842
2020-04-27 16:29:45 -07:00
Sander de Smalen e4872d7f08 [SveEmitter] Add builtins for svlen
The svlen builtins return the number of elements in a vector
and are implemented using `llvm.vscale`.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78755
2020-04-27 21:27:32 +01:00
Sander de Smalen 03f419f3eb [SveEmitter] IsInsertOp1SVALL and builtins for svqdec[bhwd] and svqinc[bhwd]
Some ACLE builtins leave out the argument to specify the predicate
pattern, which is expected to be expanded to an SV_ALL pattern.

This patch adds the flag IsInsertOp1SVALL to insert SV_ALL as the
second operand.

Reviewers: efriedma, SjoerdMeijer

Reviewed By: SjoerdMeijer

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78401
2020-04-27 11:45:10 +01:00
Saiyedul Islam 06bdffb2bb [AMDGPU] Expose llvm fence instruction as clang intrinsic
Expose llvm fence instruction as clang builtin for AMDGPU target

__builtin_amdgcn_fence(unsigned int memoryOrdering, const char *syncScope)

The first argument of this builtin is one of the memory-ordering specifiers
__ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, or __ATOMIC_SEQ_CST
following C++11 memory model semantics. This is mapped to corresponding
LLVM atomic memory ordering for the fence instruction using LLVM atomic C
ABI. The second argument is an AMDGPU-specific synchronization scope
defined as string.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D75917
2020-04-27 09:39:03 +05:30
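A usage sketch matching the signature above (assuming an amdgcn target):

    // Release fence at workgroup scope, e.g. after writing shared data
    // that other wavefronts in the workgroup will read.
    void publish(void) {
      __builtin_amdgcn_fence(__ATOMIC_RELEASE, "workgroup");
    }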
Sander de Smalen 3817ca7dbf [SveEmitter] Add IsAppendSVALL and builtins for svptrue and svcnt[bhwd]
Some ACLE builtins leave out the argument to specify the predicate
pattern, which is expected to be expanded to an SV_ALL pattern.

This patch adds the flag IsAppendSVALL to append SV_ALL as the final
operand.

Reviewers: SjoerdMeijer, efriedma, rovka, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77597
2020-04-26 12:44:26 +01:00
Craig Topper 0ed5b0d517 [X86] Don't use types when getting the intrinsic declaration for x86_avx512_mask_vcvtph2ps_512.
This intrinsic isn't overloaded, so we shouldn't query with types.
Doing so causes the backend to miss the intrinsic and not codegen it.
This eventually leads to a linker error.
2020-04-24 11:01:22 -07:00
Luke Geeson 7da1905125 [AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
  Multiplication

Note: these extensions are optional in the 8.6a architecture and so have
to be enabled by default

No additional IR types or C Types are needed for this extension.

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman

Reviewers: t.p.northover, miyuki

Reviewed By: miyuki

Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77872
2020-04-24 15:54:06 +01:00
Luke Geeson 832cd74913 [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)

No IR types or C Types are needed for this extension.

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman

Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin

Reviewed By: kmclaughlin

Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77871
2020-04-24 15:54:06 +01:00
Sander de Smalen 0ddb2034c1 [SveEmitter] Add builtins for compares and ReverseCompare flag.
The IsReverseCompare flag tells CGBuiltin to swap the operands,
so that LT/LE intrinsics can be expressed in terms of GE/GT
intrinsics.

This patch also adds builtins for the wide-variants of the compares.

Reviewers: SjoerdMeijer, efriedma, ctetreau

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78747
2020-04-24 14:33:47 +01:00
Sander de Smalen 823e2a670a [SveEmitter] Add builtins for contiguous prefetches
This patch also adds the enum `sv_prfop` for the prefetch operation specifier
and checks to ensure the passed enum values are valid.

Reviewers: SjoerdMeijer, efriedma, ctetreau

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78674
2020-04-24 11:35:59 +01:00
Sander de Smalen 7003a1da37 [SveEmitter] Use llvm.aarch64.sve.ld1/st1 for contiguous load/store builtins
This patch changes the codegen of the builtins for contiguous loads and
stores to map onto the SVE-specific IR intrinsics llvm.aarch64.sve.ld1/st1.

Reviewers: SjoerdMeijer, efriedma, kmclaughlin, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78673
2020-04-23 15:15:41 +01:00
Sander de Smalen a5e0389b2a [AArch64] Define ACLE FP conversion intrinsics with more specific predicate.
This patch changes the FP conversion intrinsics to take a predicate
that matches the number of lanes for the vector with the widest element
type as opposed to using <vscale x 16 x i1>.

For example:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)```
now uses <vscale x 4 x i1> instead of <vscale x 16 x i1>

And similar for:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)```
where the predicate now matches the wider type, so <vscale x 2 x i1>.

Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78402
2020-04-23 10:53:23 +01:00
Sander de Smalen 002164461b [SveEmitter] Add builtins for FP conversions
This adds the flag IsOverloadCvt which tells CGBuiltin to use
the result type and the type of the last operand as the
overloaded types for the LLVM IR intrinsic.

This also adds the flag IsFPConvert, which is needed to avoid
converting the predicate of the operation from svbool_t to
a predicate with fewer lanes, as the LLVM IR intrinsics use
<vscale x 16 x i1> as the predicate.

Reviewers: SjoerdMeijer, efriedma

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78239
2020-04-23 10:49:06 +01:00
Sander de Smalen 2d1baf606a [SveEmitter] Add builtins for svwhilerw/svwhilewr
This also adds the IsOverloadWhileRW flag which tells CGBuiltin to use
the result predicate type and the first pointer type as the
overloaded types for the LLVM IR intrinsic.

Reviewers: SjoerdMeijer, efriedma

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78238
2020-04-22 21:49:18 +01:00
Sander de Smalen 1559485e60 [SveEmitter] Add builtins for svwhile
This also adds the IsOverloadWhile flag which tells CGBuiltin to use
both the default type (predicate) and the type of the second operand
(scalar) as the overloaded types for the LLVM IR intrinsic.

Reviewers: SjoerdMeijer, efriedma, rovka

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77595
2020-04-22 21:47:47 +01:00
Sander de Smalen 662cbaf647 [SveEmitter] Add IsOverloadNone flag and builtins for svpfalse and svcnt[bhwd]_pat
Add the IsOverloadNone flag to tell CGBuiltin that it does not have
an overloaded type. This is used for e.g. svpfalse which does
not take any arguments and always returns a svbool_t.

This patch also adds builtins for svcntb_pat, svcnth_pat, svcntw_pat
and svcntd_pat, as those don't require custom codegen.

Reviewers: SjoerdMeijer, efriedma, rovka

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77596
2020-04-22 16:42:08 +01:00
Sander de Smalen 41d52662d5 [SveEmitter] Add support for _n form builtins
The ACLE has builtins that take a scalar value that is to be expanded
into a vector by the operation. While the ISA may have an instruction
that takes an immediate or a scalar to represent this, the LLVM IR
intrinsic may not, so Clang will have to splat the scalar value.

This patch also adds the _n forms for svabd, svadd, svdiv, svdivr,
svmax, svmin, svmul, svmulh, svsub and svsubr.

Reviewers: SjoerdMeijer, efriedma, rovka

Reviewed By: SjoerdMeijer

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77594
2020-04-22 14:23:54 +01:00
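For example (a sketch; the _n suffix marks the scalar-operand form):

    #include <arm_sve.h>
    // Adds the scalar 42 to every active lane, zeroing inactive lanes;
    // Clang splats 42 into a vector before calling the LLVM intrinsic.
    svint32_t add42(svbool_t pg, svint32_t v) {
      return svadd_n_s32_z(pg, v, 42);
    }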
Andrzej Warzynski 72f565899d [SveEmitter] Implement builtins for gathers/scatters
This patch adds builtins for:
  * regular, first-faulting and non-temporal gather loads
  * regular and non-temporal scatter stores

Differential Revision: https://reviews.llvm.org/D77735
2020-04-22 13:21:39 +01:00
Sander de Smalen 06c980df46 [SveEmitter] Implement zeroing of false lanes
This implements zeroing of false lanes for binary operations,
where instead of merging into the first operand vector (_m)
a `select` is placed on the first input vector. This approach
easily translates to the use of the `zeroing movprfx` instruction.

This patch also adds builtins for svabd, svadd, svdiv, svdivr,
svmax, svmin, svmul, svmulh, svsub and svsubr.

Reviewers: SjoerdMeijer, efriedma, rovka

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77593
2020-04-20 17:02:48 +01:00
Sander de Smalen 9986b3de26 [SveEmitter] Explicitly merge with zero/undef
Builtins that have the merge type MergeAnyExp or MergeZeroExp
merge into an 'undef' or 'zero' vector respectively, which enables the
_x and _z behaviour for unary operations.

This patch also adds builtins for svabs and svneg.

Reviewers: SjoerdMeijer, efriedma, rovka

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77591
2020-04-20 16:26:20 +01:00
Sander de Smalen 515020c091 [SveEmitter] Add more immediate operand checks.
This patch adds a number of intrinsics that take immediates with
varying ranges based on the element size of one of the operands.

    svext:   immediate ranging 0 to (2048/sizeinbits(elt) - 1)
    svasrd:  immediate ranging 1..sizeinbits(elt)
    svqshlu: immediate ranging 1..sizeinbits(elt)/2
    ftmad:   immediate ranging 0..(sizeinbits(elt) - 1)

Reviewers: efriedma, SjoerdMeijer, rovka, rengolin

Reviewed By: SjoerdMeijer

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76679
2020-04-20 14:41:58 +01:00
Christopher Tetreault c858debebc Remove asserting getters from base Type
Summary:
Remove asserting vector getters from Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: dexonsmith, sdesmalen, efriedma

Reviewed By: efriedma

Subscribers: cfe-commits, hiraditya, llvm-commits

Tags: #llvm, #clang

Differential Revision: https://reviews.llvm.org/D77278
2020-04-17 14:03:31 -07:00
Benjamin Kramer 316b49d373 Pass shufflevector indices as int instead of unsigned.
No functionality change intended.
2020-04-15 15:52:49 +02:00
Benjamin Kramer 6f64daca8f Upgrade calls to CreateShuffleVector to use the preferred form of passing an array of ints
No functionality change intended.
2020-04-15 12:51:38 +02:00
Sander de Smalen c8a5b30bac [SveEmitter] Add range checks for immediates and predicate patterns.
Summary:
This patch adds a mechanism to easily add range checks for a builtin's
immediate operands. This patch is tested with the qdech intrinsic, which takes
both an enum for the predicate pattern, as well as an immediate for the
multiplier.

Reviewers: efriedma, SjoerdMeijer, rovka

Reviewed By: efriedma, SjoerdMeijer

Subscribers: mgorny, tschuett, mgrang, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76678
2020-04-14 16:49:32 +01:00
Sander de Smalen 17a68c61a9 [SveEmitter] Implement builtins for contiguous loads/stores
This adds builtins for all contiguous loads/stores, including
non-temporal, first-faulting and non-faulting.

Reviewers: efriedma, SjoerdMeijer

Reviewed By: SjoerdMeijer

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76238
2020-04-14 15:24:57 +01:00
Georgii Rymar 1647ff6e27 [ADT/STLExtras.h] - Add llvm::is_sorted wrapper and update callers.
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and is consistent with other
wrappers we already have.

Differential revision: https://reviews.llvm.org/D78016
2020-04-14 14:11:02 +03:00
Christopher Tetreault f22fbe3a15 Clean up usages of asserting vector getters in Type
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.

Reviewers: sdesmalen, efriedma, krememek

Reviewed By: sdesmalen, efriedma

Subscribers: dexonsmith, Charusso, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77257
2020-04-13 13:01:40 -07:00
Kevin P. Neal 7f38812d5b [FPEnv][AArch64] Platform-specific builtin constrained FP enablement
When constrained floating point is enabled the AArch64-specific builtins don't use constrained intrinsics in some cases. Fix that.

Neon is part of this patch, so ARM is affected as well.

Differential Revision: https://reviews.llvm.org/D77074
2020-04-10 13:02:00 -04:00
Kevin P. Neal 9f1c35d8b1 Revert "[PowerPC] Replace subtract-from-zero float in version with fneg in PowerPC special fma compiler builtins"
The new test case causes bot failures.

This reverts commit ba87430cad.
2020-04-03 15:47:19 -04:00
Andrew Wock ba87430cad [PowerPC] Replace subtract-from-zero float in version with fneg in PowerPC special fma compiler builtins
This patch adds a test for the PowerPC fma compiler builtins, some variations
of which negate inputs and outputs. The code to generate IR for these
builtins was untested before this patch.

Originally, the code used the outdated method of subtracting floating point
values from -0.0 as floating point negation. This patch remedies that.

Patch by: Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D76949
2020-04-03 14:59:33 -04:00
Yaxun (Sam) Liu 369e26ca9e [AMDGPU] Add __builtin_amdgcn_workgroup_size_x/y/z
The main purpose of introducing these builtins is to add range
metadata [1, 1025) on the work-group size loaded from the dispatch
pointer, which cannot be done from source code.

Differential Revision: https://reviews.llvm.org/D76772
2020-03-28 01:03:20 -04:00
Matt Arsenault 3f533006ba AMDGPU: Emit llvm.fshr for __builtin_amdgcn_alignbit
These are equivalent. The generic rotate builtins do not directly map
to the fshr intrinsic.
2020-03-23 16:51:25 -04:00
Thomas Lively de6cd3e836 [WebAssembly] Add SIMD integer abs builtins
Summary:
Since the conditional operator cannot be used with vector conditions
in C, we need a builtin to be able to express this operation in C
source.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76538
2020-03-21 00:21:24 -07:00
Thomas Lively a3f974f3c3 [WebAssembly] SIMD bitmask intrinsics and builtin functions
Summary:
These experimental new instructions are proposed in
https://github.com/WebAssembly/simd/pull/201.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76397
2020-03-19 17:15:37 -07:00
Lucas Prates d4ad386ee1 [ARM] Fixing range checks for Neon's vqdmulhq_lane and vqrdmulhq_lane intrinsics
Summary:
The range checks performed for the vqdmulhq_lane and vqrdmulhq_lane Neon
intrinsics were incorrectly using their return type as the base type for
the range check performed on their 'lane' argument.

This patch updates those intrinsics to use the type of the proper reference
argument to perform the range checks.

Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio

Reviewed By: dnsampaio

Subscribers: dnsampaio, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74766
2020-03-19 12:08:12 +00:00
Lucas Prates f56550cf7f [ARM] Enabling range checks on Neon intrinsics' lane arguments
Summary:
Range checks were not properly performed on the lane arguments of Neon
intrinsics implemented based on splat operations. Calls to those
intrinsics were translated to `__builtin_shufflevector` calls directly
by the preprocessor through the arm_neon.h macros, missing the chance
for the proper range checks.

This patch enables the range check by introducing an auxiliary splat
instruction in arm_neon.td, delaying the translation to shufflevector
calls to CGBuiltin.cpp in clang after the checks were performed.

Reviewers: jmolloy, t.p.northover, rsmith, olista01, ostannard

Reviewed By: ostannard

Subscribers: ostannard, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74619
2020-03-19 12:07:23 +00:00
Lucas Prates 7bf23563f4 Revert "[ARM] Setting missing isLaneQ attribute on Neon Intrinsics definitions"
This reverts commit 62ab15ffa3.

Multiple commits were unintentionally squashed into this one. Reverting
so each of them can be pushed properly.
2020-03-19 12:01:13 +00:00
Lucas Prates 62ab15ffa3 [ARM] Setting missing isLaneQ attribute on Neon Intrinsics definitions
Summary:
Some of the `*_laneq` intrinsics defined in arm_neon.td were missing the
setting of the `isLaneQ` attribute. This patch sets the attribute on the
related definitions, as they will be required to properly perform range
checks on their lane arguments.

Reviewers: jmolloy, t.p.northover, rsmith, olista01, dnsampaio

Reviewed By: dnsampaio

Subscribers: dnsampaio, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74616
2020-03-19 11:52:41 +00:00
Sander de Smalen c5b81466c2 Reland D75470 [SVE] Auto-generate builtins and header for svld1.
Reworked the patch to avoid sharing a header (SVETypeFlags.h) between
include/clang/Basic and utils/TableGen/SveEmitter.cpp. Now the patch
generates the enum/flags which is included in TargetBuiltins.h.

Also renamed one of the SveEmitter options to be in line with MVE.

Summary:

This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.

I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.
2020-03-18 11:16:28 +00:00
Sander de Smalen 6ce537ccfc Revert "[SVE] Auto-generate builtins and header for svld1."
This reverts commit 8b409eabaf.

Reverting this patch for now because it breaks some buildbots.
2020-03-16 15:22:15 +00:00
Sander de Smalen 8b409eabaf [SVE] Auto-generate builtins and header for svld1.
This is a first patch in a series for the SveEmitter to generate the arm_sve.h
header file and builtins.

I've tried my best to strip down this patch as best as I could, but there
are still a few changes that are not necessarily exercised by the load intrinsics
in this patch, mostly around the SVEType class which has some common logic to
represent types from a type and prototype string. I thought it didn't make
much sense to remove that from this patch and split it up.

Reviewers: efriedma, rovka, SjoerdMeijer, rsandifo-arm, rengolin

Reviewed By: SjoerdMeijer

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75470
2020-03-16 10:52:37 +00:00
Sander de Smalen 5087ace651 [Clang][SVE] Parse builtin type string for scalable vectors
This patch adds 'q' to mean 'scalable vector' in the builtin
type string, and for SVE will return the matching builtin
type as defined in the C/C++ language extensions for SVE.

This patch also adds some scaffolding to generate the arm_sve.h
header file, and some builtin definitions (+CodeGen) to be able
to implement some simple masked load intrinsics that use the
ACLE types, such as:

 svint8_t test_svld1_s8(svbool_t pg, const int8_t *base) {
   return svld1_s8(pg, base);
 }

Reviewers: efriedma, rjmccall, rovka, rsandifo-arm, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75298
2020-03-15 14:34:52 +00:00
Huihui Zhang 118abf2017 [SVE] Update API ConstantVector::getSplat() to use ElementCount.
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector types; this requires ConstantVector::getSplat() to take an
'ElementCount' instead of an 'unsigned' number of elements.

This change is needed for D73753.

Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett

Reviewed By: efriedma

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74386
2020-03-12 13:22:41 -07:00
Akira Hatanaka 37fa9d65ea [CodeGen][ObjC] Don't extend lifetime of ObjC pointers passed to calls to __builtin_os_log_format if ARC isn't enabled

Fixes a bug introduced in this commit:
f4d791f833

rdar://problem/60301219
2020-03-10 22:10:32 -07:00
Mikhail Maltsev 47edf5bafb [ARM,CDE] Generalize MVE intrinsics infrastructure to support CDE
Summary:
This patch generalizes the existing code to support CDE intrinsics
which will share some properties with existing MVE intrinsics
(some of the intrinsics will be polymorphic and accept/return values
of MVE vector types).
Specifically the patch:
* Adds new tablegen backends -gen-arm-cde-builtin-def,
  -gen-arm-cde-builtin-codegen, -gen-arm-cde-builtin-sema,
  -gen-arm-cde-builtin-aliases, -gen-arm-cde-builtin-header based on
  existing MVE backends.
* Renames the '__clang_arm_mve_alias' attribute into
  '__clang_arm_builtin_alias' (it will be used with CDE intrinsics as
  well as MVE intrinsics)
* Implements semantic checks for the coprocessor argument of the CDE
  intrinsics as well as the existing coprocessor intrinsics.
* Adds one CDE intrinsic __arm_cx1 to test the above changes

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: sdesmalen, mgorny, kristof.beyls, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75850
2020-03-10 14:03:16 +00:00
Krzysztof Parzyszek d0ca1041ba [Hexagon] Refactor handling of circular load/store builtins, NFC 2020-03-09 14:40:08 -05:00
Akira Hatanaka f4d791f833 [CodeGen][ObjC] Extend lifetime of ObjC pointers passed to calls to __builtin_os_log_format

This is needed to keep all the objects, including temporaries returned
by function calls, written to the buffer alive until os_log_pack_send is
called.

rdar://problem/60105410
2020-03-06 16:46:50 -08:00
Thomas Lively d43fcd0c04 [WebAssembly] Add SIMD integer min/max builtins
Summary:
Although SIMD integer min/max operations can be expressed using the ?:
operator in C++, that operator is disallowed for vectors in C. As a
workaround, this change introduces new WebAssembly-specific builtin
functions that lower to the desired vector icmp/select sequences.

Reviewers: aheejin, dschuff, kripken

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D75770
2020-03-06 14:28:52 -08:00
Simon Pilgrim 842c5c7994 Fix shadow variable warning. NFC. 2020-03-02 11:41:20 +00:00
Simon Pilgrim 7e9747b50b [X86][F16C] Remove cvtph2ps intrinsics and use generic half2float conversion (PR37554)
This removes everything but int_x86_avx512_mask_vcvtph2ps_512 which provides the SAE variant, but even this can use the fpext generic if the rounding control is the default.

Differential Revision: https://reviews.llvm.org/D75162
2020-02-29 18:57:35 +00:00
Yaxun (Sam) Liu a57d9652a0 Make __builtin_amdgcn_dispatch_ptr dereferenceable and align at 4
Differential Revision: https://reviews.llvm.org/D75028
2020-02-25 13:58:20 -05:00
Craig Topper 727328433a [X86] Add back fmaddsub intrinsics to work towards fixing the strict fp implementation
Previously we emitted an fmadd and a fmadd+fneg and combined them with a shufflevector. But this doesn't follow the correct exception behavior for unselected elements so the backend can't merge them into the fmaddsub/fmsubadd instructions.

This patch restores the fmaddsub intrinsics so we don't have two arithmetic operations. We lose out on optimization opportunity in the non-strict FP case, but I don't think this is a big loss. If someone gives us a test case we can look into adding instcombine/dagcombine improvements. I'd rather not have the frontend do completely different things for strict and non-strict.

This still has problems because target specific intrinsics don't support strict semantics yet. We also still have all of the problems with masking. But we at least generate the right instruction in constrained mode now.

Differential Revision: https://reviews.llvm.org/D74268
2020-02-24 12:07:21 -08:00
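These intrinsics sit behind the usual x86 header functions; a minimal sketch (compile with -mfma):

    #include <immintrin.h>
    // fmaddsub: a*b - c in even lanes, a*b + c in odd lanes, as a single
    // operation with well-defined exception behavior in every lane.
    __m128 fma_alt(__m128 a, __m128 b, __m128 c) {
      return _mm_fmaddsub_ps(a, b, c);
    }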
Krzysztof Parzyszek b1d47467e2 [Hexagon] Change HVX vector predicate types from v512/1024i1 to v64/128i1
This commit removes the artificial types <512 x i1> and <1024 x i1>
from HVX intrinsics, and makes v512i1 and v1024i1 no longer legal on
Hexagon.

It may cause existing bitcode files to become invalid.

* Converting between vector predicates and vector registers must be
  done explicitly via vandvrt/vandqrt instructions (their intrinsics),
  i.e. (for 64-byte mode):
    %Q = call <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32> %V, i32 -1)
    %V = call <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1> %Q, i32 -1)

  The conversion intrinsics are:
    declare  <64 x i1> @llvm.hexagon.V6.vandvrt(<16 x i32>, i32)
    declare <128 x i1> @llvm.hexagon.V6.vandvrt.128B(<32 x i32>, i32)
    declare <16 x i32> @llvm.hexagon.V6.vandqrt(<64 x i1>, i32)
    declare <32 x i32> @llvm.hexagon.V6.vandqrt.128B(<128 x i1>, i32)
  They are all pure.

* Vector predicate values cannot be loaded/stored directly. This directly
  reflects the architecture restriction. Loading and storing of vector
  predicates must be done indirectly via vector registers and explicit
  conversions via vandvrt/vandqrt instructions.
2020-02-19 14:14:56 -06:00
Simon Tatham c32af4447f [ARM,MVE] Add the vmovnbq,vmovntq intrinsic family.
Summary:
These are in some sense the inverse of vmovl[bt]q: they take a vector
of n wide elements and truncate each to half its width. So they only
write half a vector's worth of output data, and therefore they also
take an 'inactive' parameter to provide the other half of the data in
the output vector. So vmovnb overwrites the even lanes of 'inactive'
with the narrowed values from the main input, and vmovnt overwrites
the odd lanes.

LLVM had existing codegen which generates these MVE instructions in
response to IR that takes two vectors of wide elements, or two vectors
of narrow ones. But in this case, we have one vector of each. So my
clang codegen strategy is to narrow the input vector of wide elements
by simply reinterpreting it as the output type, and then we have two
narrow vectors and can represent the operation as a vector shuffle
that interleaves lanes from both of them.

Even so, not all the cases I needed ended up being selected as a
single MVE instruction, so I've added a couple more patterns that spot
combinations of the 'MVEvmovn' and 'ARMvrev32' SDNodes which can be
generated as a VMOVN instruction with operands swapped.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74337
2020-02-18 09:34:50 +00:00
Simon Tatham 5e97940cd2 [ARM,MVE] Add the vmovlbq,vmovltq intrinsic family.
Summary:
These intrinsics take a vector of 2n elements, and return a vector of
n wider elements obtained by sign- or zero-extending every other
element of the input vector. They're represented in IR as a
shufflevector that extracts the odd or even elements of the input,
followed by a sext or zext.

Existing LLVM codegen already matches this pattern and generates the
VMOVLB instruction (which widens the even-index input lanes). But no
existing isel rule was generating VMOVLT, so I've added some. However,
the new rules currently only work in little-endian MVE, because the
pattern they expect from isel lowering includes a bitconvert which
doesn't have the right semantics in big-endian.

The output of one existing codegen test is improved by those new
rules.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74336
2020-02-18 09:34:50 +00:00
Simon Tatham b6236e9479 [ARM,MVE] Add the vrev16q, vrev32q, vrev64q family.
Summary:
These intrinsics just reorder the lanes of a vector, so the natural IR
representation is as a shufflevector operation. Existing LLVM codegen
already recognizes those particular shufflevectors and generates the
MVE VREV instruction.

This commit adds the unpredicated forms only.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74334
2020-02-18 09:34:50 +00:00
Simon Tatham 90dc78bc62 [ARM,MVE] Add intrinsics for abs, neg and not operations.
Summary:
This commit adds the unpredicated intrinsics for the unary operations
vabsq (absolute value), vnegq (arithmetic negation), vmvnq (bitwise
complement), vqabsq and vqnegq (saturating versions of abs and neg for
signed integers, in the sense that they give INT_MAX if an input lane
is INT_MIN).

This is done entirely in clang: all of these operations have existing
isel patterns and existing tests for them on the LLVM side, so I've
just made clang emit the same IR that those patterns already match.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74331
2020-02-18 09:34:50 +00:00
Jim Lin 466f8843f5 [NFC] Remove trailing space
sed -Ei 's/[[:space:]]+$//' include/**/*.{def,h,td} lib/**/*.{cpp,h,td}
2020-02-18 10:49:13 +08:00
Nicolai Hähnle bf197304a6 CGBuiltin: Remove uses of deprecated CreateCall overloads
Reviewers: t.p.northover

Subscribers: cfe-commits, llvm-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74673
2020-02-18 00:24:09 +01:00
Benjamin Kramer 5fc5c7db38 Strength reduce vectors into arrays. NFCI. 2020-02-17 15:37:35 +01:00
Fangrui Song 1d49eb00d9 [AsmPrinter] De-capitalize all AsmPrinter::Emit* but EmitInstruction
Similar to rL328848.
2020-02-13 17:06:24 -08:00
Guillaume Chatelet d65bbf81f8 [clang] Add support for __builtin_memcpy_inline
Summary: This is a follow-up on D61634 and the last step to implement http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html

Reviewers: efriedma, courbet, tejohnson

Subscribers: hiraditya, cfe-commits, llvm-commits, jdoerfert, t.p.northover

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73543
2020-02-07 23:55:26 +01:00
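A minimal sketch of the new builtin (the size must be a compile-time constant; no call to the libc memcpy is ever emitted):

    // Copies exactly 8 bytes, guaranteed to be expanded inline.
    void copy8(char *dst, const char *src) {
      __builtin_memcpy_inline(dst, src, 8);
    }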
Craig Topper 96400ae2a4 Recommit "[FPEnv][X86] Platform-specific builtin constrained FP enablement"
With REQUIRES: x86-register-target added to the tests.

Also remove some unneeded FIXMEs

But add a FIXME for bad IR generation for FMADDSUB/FMSUBADD with
constrained FP.

Original patch by Kevin P. Neal
2020-02-06 16:54:35 -08:00
Kevin P. Neal ad0e03fd4c Revert "[FPEnv][X86] Platform-specific builtin constrained FP enablement"
This reverts commit 208470dd5d.

Tests fail:
error: unable to create target: 'No available targets are compatible with triple "x86_64-apple-darwin"'

This happens on clang-hexagon-elf, clang-cmake-armv7-quick, and
clang-cmake-armv7-quick bots.

If anyone has any suggestions on why then I'm all ears.

Differential Revision: https://reviews.llvm.org/D73570

Revert "[FPEnv][X86] Speculative fix for failures introduced by eda495426."

This reverts commit 80e17e5fcc.

The speculative fix didn't solve the test failures on Hexagon, ARMv6, and
MSVC AArch64.
2020-02-06 19:17:14 -05:00
Kevin P. Neal 208470dd5d [FPEnv][X86] Platform-specific builtin constrained FP enablement
When constrained floating point is enabled the X86-specific builtins don't
use constrained intrinsics in some cases. Fix that.

Differential Revision: https://reviews.llvm.org/D73570
2020-02-06 14:20:44 -05:00
Simon Tatham 961530fdc9 [ARM,MVE] Fix vreinterpretq in big-endian mode.
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.

So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the register format of one type as that of the other,
or the memory format?

The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.

To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.

In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.

For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73786
2020-02-03 11:20:06 +00:00
Craig Topper a10cec02f7 [X86] Improve X86 cmpps/cmppd/cmpss/cmpsd intrinsics with strictfp
The constrained fcmp intrinsics don't allow the TRUE/FALSE predicates.
Using them will assert. To workaround this I'm emitting the old X86 specific intrinsics that were never removed from the backend when we switched to using fcmp in IR. We have no way to mark them as being strict, but that's true of all target specific intrinsics so doesn't seem like we need to solve that here.

I've also added support for selecting between signaling and quiet.

Still need to support SAE which will require using a target specific
intrinsic. Also need to fix masking to not use an AND instruction
after the compare.

Differential Revision: https://reviews.llvm.org/D72906
2020-01-29 15:52:11 -08:00
Sanne Wouda 2939fc13c8 [AArch64] Add IR intrinsics for sq(r)dmulh_lane(q)
Summary:
Currently, sqdmulh_lane and friends from the ACLE (implemented in arm_neon.h),
are represented in LLVM IR as a (by vector) sqdmulh and a vector of (repeated)
indices, like so:

   %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %vqdmulh2.i = tail call <4 x i16> @llvm.aarch64.neon.sqdmulh.v4i16(<4 x i16> %a, <4 x i16> %shuffle)

When %v's values are known, the shufflevector is optimized away and we are no
longer able to select the lane variant of sqdmulh in the backend.

This defeats a (hand-coded) optimization that packs several constants into a
single vector and uses the lane intrinsics to reduce register pressure and
trade-off materialising several constants for a single vector load from the
constant pool, like so:

   int16x8_t v = {2,3,4,5,6,7,8,9};
   a = vqdmulh_laneq_s16(a, v, 0);
   b = vqdmulh_laneq_s16(b, v, 1);
   c = vqdmulh_laneq_s16(c, v, 2);
   d = vqdmulh_laneq_s16(d, v, 3);
   [...]

In one microbenchmark from libjpeg-turbo this accounts for a 2.5% to 4%
performance difference.

We could teach the compiler to recover the lane variants, but this would likely
require its own pass.  (Alternatively, "volatile" could be used on the constants
vector, but this is a bit ugly.)

This patch instead implements the following LLVM IR intrinsics for AArch64 to
maintain the original structure through IR optimization and into instruction
selection:
- sqdmulh_lane
- sqdmulh_laneq
- sqrdmulh_lane
- sqrdmulh_laneq.

These 'lane' variants need an additional register class.  The second argument
must be in the lower half of the 64-bit NEON register file, but only when
operating on i16 elements.

Note that the existing patterns for shufflevector and sqdmulh into sqdmulh_lane
(etc.) remain, so code that does not rely on NEON intrinsics to generate these
instructions is not affected.

This patch also changes clang to emit these IR intrinsics for the corresponding
NEON intrinsics (AArch64 only).

Reviewers: SjoerdMeijer, dmgreen, t.p.northover, rovka, rengolin, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, jdoerfert, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71469
2020-01-29 13:25:23 +00:00
Guillaume Chatelet 805c157e8a [Alignment][NFC] Deprecate Align::None()
Summary:
This is a follow-up on https://reviews.llvm.org/D71473#inline-647262.
There's a caveat here: `Align(1)` relies on the compiler's understanding of the `Log2_64` implementation to produce good code. One could use `Align()` as a replacement, but I believe it is less clear that the alignment is one in that case.

Reviewers: xbolva00, courbet, bollu

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, Jim, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73099
2020-01-24 12:53:58 +01:00
Guillaume Chatelet 59f95222d4 [Alignment][NFC] Use Align with CreateAlignedStore
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73274
2020-01-23 17:34:32 +01:00
Guillaume Chatelet 0957233320 [Alignment][NFC] Use Align with CreateMaskedStore
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73106
2020-01-22 11:04:39 +01:00
Kevin P. Neal 2e667d07c7 [FPEnv][SystemZ] Platform-specific builtin constrained FP enablement
When constrained floating point is enabled, the SystemZ-specific builtins
don't use constrained intrinsics in some cases. Fix that.

Differential Revision: https://reviews.llvm.org/D72722
2020-01-21 12:44:39 -05:00
Guillaume Chatelet bc8a1ab26f [Alignment][NFC] Use Align with CreateMaskedLoad
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73087
2020-01-21 14:13:22 +01:00
serge-sans-paille d293417931 Add __warn_memset_zero_len builtin as a workaround for glibc issue
Glibc issue: https://sourceware.org/bugzilla/show_bug.cgi?id=25399
The fix consists of treating the missing function as a builtin lowered to a no-op.
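
For context, a sketch of the kind of fortified call that can reference
the symbol (my example; assumes glibc with _FORTIFY_SOURCE):

    #include <string.h>
    void clear_nothing(char *p) {
      // glibc's fortify wrappers may route a zero-length memset through
      // __warn_memset_zero_len; clang now lowers that call to a no-op.
      memset(p, 0, 0);
    }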

Differential Revision: https://reviews.llvm.org/D72869
2020-01-17 09:58:32 +01:00
Sameer Sahasrabuddhe ed181efa17 [HIP][AMDGPU] expand printf when compiling HIP to AMDGPU
Summary:
This change implements the expansion in two parts:
- Add a utility function emitAMDGPUPrintfCall() in LLVM.
- Invoke the above function from Clang CodeGen, when processing a HIP
  program for the AMDGPU target.

The printf expansion has undefined behaviour if the format string is
not a compile-time constant. As a sufficient condition, the HIP
ToolChain now emits -Werror=format-nonliteral.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D71365
2020-01-16 15:15:38 +05:30
Benjamin Kramer df186507e1 Make helper functions static or move them into anonymous namespaces. NFC. 2020-01-14 14:06:37 +01:00
Alex Richardson 8c387cbea7 Add builtins for aligning and checking alignment of pointers and integers
This change introduces three new builtins (which work on both pointers
and integers) that can be used instead of common bitwise arithmetic:
__builtin_align_up(x, alignment), __builtin_align_down(x, alignment) and
__builtin_is_aligned(x, alignment).

I originally added these builtins to the CHERI fork of LLVM a few years ago
to handle the slightly different C semantics that we use for CHERI [1].
Until recently these builtins (or sequences of other builtins) were
required to generate correct code. I have since made changes to the default
C semantics so that they are no longer strictly necessary (but using them
does generate slightly more efficient code). However, based on our experience
using them in various projects over the past few years, I believe that adding
these builtins to clang would be useful.
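
A short sketch of the builtins in use:

    const char *bump_to_16(const char *p) {
      // The pointer type and const qualifier are preserved, unlike a
      // round-trip through uintptr_t.
      if (__builtin_is_aligned(p, 16))
        return p;
      return __builtin_align_up(p, 16);
    }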

These builtins have the following benefits over bit-manipulation and casts
via uintptr_t:

- The named builtins clearly convey the semantics of the operation. While
  checking alignment using __builtin_is_aligned(x, 16) versus
  ((x & 15) == 0) is probably not a huge win in readability, I personally find
  __builtin_align_up(x, N) a lot easier to read than (x+(N-1))&~(N-1).
- They preserve the type of the argument (including const qualifiers). When
  using casts via uintptr_t, it is easy to cast to the wrong type or strip
  qualifiers such as const.
- If the alignment argument is a constant value, clang can check that it is
  a power-of-two and within the range of the type. Since the semantics of
  these builtins is well defined compared to arbitrary bit-manipulation,
  it is possible to add a UBSAN checker that the run-time value is a valid
  power-of-two. I intend to add this as a follow-up to this change.
- The builtins avoid int-to-pointer casts both in C and LLVM IR.
  In the future (i.e. once most optimizations handle it), we could use the new
  llvm.ptrmask intrinsic to avoid the ptrtoint instruction that would normally
  be generated.
- They can be used to round up/down to the next aligned value for both
  integers and pointers without requiring two separate macros.
- In many projects the alignment operations are already wrapped in macros (e.g.
  roundup2 and rounddown2 in FreeBSD), so by replacing the macro implementation
  with a builtin call, we get improved diagnostics for many call-sites while
  only having to change a few lines.
- Finally, the builtins also emit assume_aligned metadata when used on pointers.
  This can improve code generation compared to the uintptr_t casts.

[1] In our CHERI compiler we have compilation mode where all pointers are
implemented as capabilities (essentially unforgeable 128-bit fat pointers).
In our original model, casts from uintptr_t (which is a 128-bit capability)
to an integer value returned the "offset" of the capability (i.e. the
difference between the virtual address and the base of the allocation).
This causes problems for cases such as checking the alignment: for example, the
expression `((uintptr_t)ptr & 63) == 0` is generally used to check if the
pointer is aligned to a multiple of 64 bytes. The problem with offsets is that
any pointer to the beginning of an allocation will have an offset of zero, so
this check always succeeds in that case (even if the address is not correctly
aligned). The same issues also exist when aligning up or down. Using the
alignment builtins ensures that the address is used instead of the offset. While
I have since changed the default C semantics to return the address instead of
the offset when casting, this offset compilation mode can still be used by
passing a command-line flag.

Reviewers: rsmith, aaron.ballman, theraven, fhahn, lebedev.ri, nlopes, aqjune
Reviewed By: aaron.ballman, lebedev.ri
Differential Revision: https://reviews.llvm.org/D71499
2020-01-09 21:48:29 +00:00
Simon Tatham 06d07ec4a3 [Clang] Handle target-specific builtins returning aggregates.
Summary:
A few of the ARM MVE builtins directly return a structure type. This
causes an assertion failure at code-gen time if you try to assign the
result of the builtin to a variable, because the `RValue` created in
`EmitBuiltinExpr` from the `llvm::Value` produced by codegen is always
made by `RValue::get()`, which creates a non-aggregate `RValue` that
will fail an assertion when `AggExprEmitter::withReturnValueSlot` calls
`Src.getAggregatePointer()`. A similar failure occurs if you try to use
the struct return value directly to extract one field, e.g.
`vld2q(address).val[0]`.
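
For example (my sketch; assumes an MVE-enabled target), both uses below
previously crashed code-gen:

    #include <arm_mve.h>
    uint32x4_t first_of_pair(const uint32_t *p) {
      uint32x4x2_t pair = vld2q(p); // builtin returning a structure
      return pair.val[0];           // extracting one field directly
    }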

The existing code-gen tests for those MVE builtins pass the returned
structure type directly to the C `return` statement, which apparently
managed to avoid that particular code path, so we didn't notice the
crash.

Now `EmitBuiltinExpr` checks the evaluation kind of the builtin's return
value, and does the necessary handling for aggregate returns. I've added
two extra test cases, both of which crashed before this change.

Reviewers: dmgreen, rjmccall

Reviewed By: rjmccall

Subscribers: kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D72271
2020-01-09 17:28:37 +00:00
serge-sans-paille cee4a1c957 Improve support of GNU mempcpy
- Lower to the memcpy intrinsic
- Raise warnings when size/bounds are known
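
For reference, the GNU extension being handled (my sketch; assumes
glibc, hence the _GNU_SOURCE define):

    #define _GNU_SOURCE
    #include <string.h>
    void *copy_and_advance(void *dst, const void *src, size_t n) {
      // mempcpy is memcpy that returns dst + n; it now lowers to the
      // memcpy intrinsic plus a pointer adjustment.
      return mempcpy(dst, src, n);
    }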

Differential Revision: https://reviews.llvm.org/D71374
2020-01-09 17:31:00 +01:00
Fangrui Song 6904cd9486 Add Triple::isX86()
Reviewed By: craig.topper, skan

Differential Revision: https://reviews.llvm.org/D72247
2020-01-06 15:51:02 -08:00
Kevin P. Neal 89d6c288ba [SystemZ] Use FNeg in s390x clang builtins
The s390x builtins are still using FSub instead of FNeg. Correct that.
2020-01-02 12:14:43 -05:00
Craig Topper 5e5a1d2790 [CodeGen] Emit conj/conjf/conjl libcalls as fneg instructions if possible.
We already recognize the __builtin versions of these, so we might as well
recognize the libcall version.
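
For example (my sketch):

    #include <complex.h>
    double complex conjugate(double complex z) {
      // The libcall form is now treated like __builtin_conj and can be
      // emitted as an fneg of the imaginary part instead of a call.
      return conj(z);
    }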

Differential Revision: https://reviews.llvm.org/D72028
2019-12-31 10:41:00 -08:00
Craig Topper 70f8dd4cf6 [CodeGen] Use IRBuilder::CreateFNeg for __builtin_conj
This replaces the fsub -0.0 idiom with an fneg instruction. We didn't seem to have a test that showed the current codegen: just some tests for constant folding, and a test that was only checking the declare lines for libcalls. The latter just checked that we did not have a declare for @conj when using __builtin_conj.

Differential Revision: https://reviews.llvm.org/D72012
2019-12-30 13:25:23 -08:00
Kevin P. Neal 0293b5d671 [NFC] Remove some dead code from CGBuiltin.cpp. 2019-12-24 09:38:34 -05:00
Tim Northover 85cb560b8a ConstrainedFP: use API compatible with opaque pointers.
This just updates an IRBuilder interface to take Functions instead of
Values so the type can be derived, and fixes some callsites in Clang to
call the updated API.
2019-12-19 21:50:47 +00:00
Thomas Lively 71eb8023d8 [WebAssembly] Add avgr_u intrinsics and require nuw in patterns
Summary:
The vector pattern `(a + b + 1) / 2` was previously selected to an
avgr_u instruction regardless of nuw flags, but this is incorrect in
the case where either addition may have an unsigned wrap. This CL
changes the existing pattern to require both adds to have nuw flags
and adds builtin functions and intrinsics for the avgr_u instructions
because the corrected pattern is not representable in C.
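
A scalar illustration (my sketch) of why the nuw flags matter on narrow
lanes:

    #include <stdint.h>
    uint8_t avg_wrapped(uint8_t a, uint8_t b) {
      // If the sum wraps at 8 bits (e.g. a = b = 255, 511 -> 255), this
      // yields 127, not the rounded average 255 that avgr_u computes.
      return (uint8_t)(a + b + 1) / 2;
    }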

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71648
2019-12-18 15:31:38 -08:00
Thomas Lively 3a93756dfb [WebAssembly] Replace SIMD int min/max builtins with patterns
Summary:
The instructions were originally implemented via builtins and
intrinsics so users would have to explicitly opt in to using
them. This was useful while we were validating whether these instructions
should have been merged into the spec proposal. Now that they have
been, we can use normal codegen patterns, so the intrinsics and
builtins are no longer useful.
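
The kind of portable code that should now select these instructions via
normal pattern matching (my sketch; assumes -msimd128 and
vectorization):

    #include <stdint.h>
    void vec_min(int8_t *restrict out, const int8_t *a,
                 const int8_t *b, int n) {
      for (int i = 0; i < n; ++i)
        out[i] = a[i] < b[i] ? a[i] : b[i]; // compare+select -> i8x16.min_s
    }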

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71500
2019-12-16 11:48:49 -08:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7 MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
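
For instance (my sketch; assumes an MVE-enabled target), the boundary
case handled at codegen time:

    #include <arm_mve.h>
    uint16x8_t shift_out_all(uint16x8_t v) {
      // An immediate equal to the lane size is legal in ACLE; clang
      // emits a zero vector here instead of an IR lshr by 16.
      return vshrq_n_u16(v, 16);
    }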

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
Kevin P. Neal 6515c524b0 [FPEnv] clang support for constrained FP builtins
Change the IRBuilder and clang so that constrained FP intrinsics will be
emitted for builtins when appropriate. Only non-target-specific builtins
are affected in this patch.
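
For example (my sketch; assumes strict FP semantics are requested):

    double fused(double a, double b, double c) {
      // Under strict FP this should now emit
      // @llvm.experimental.constrained.fma instead of @llvm.fma.
      return __builtin_fma(a, b, c);
    }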

Differential Revision: https://reviews.llvm.org/D70256
2019-12-10 13:09:12 -05:00
Guillaume Chatelet 1b2842bf90 [Alignment][NFC] CreateMemSet use MaybeAlign
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71213
2019-12-10 15:17:44 +01:00
Eric Christopher 9c6b7f68b8 Revert "[ARM][MVE] Add intrinsics for immediate shifts."
and two follow-on commits: one warning fix and one functionality fix.

As it's breaking at least the lto bot:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio

This reverts commits:

 8d70f3c933
 ff4dceef92
 d97b3e3e65
2019-12-09 16:47:38 -08:00
Reid Kleckner 9803178a78 Avoid Attr.h includes, CodeGen edition
This saves around 20 includes of Attr.h. Not much.
2019-12-09 16:17:18 -08:00
Haojian Wu ff4dceef92 Fix the compiler warnings: "-Winconsistent-missing-override", "-Wunused-variable"
for d97b3e3e65
2019-12-09 17:09:07 +01:00
Simon Tatham d97b3e3e65 [ARM][MVE] Add intrinsics for immediate shifts.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-09 15:44:09 +00:00
Akira Hatanaka f139ae3d93 [NFC] Pass a reference to CodeGenFunction to methods of LValue and
AggValueSlot

This reapplies 8a5b7c3570 after fixing a null
dereference bug in CGOpenMPRuntime::emitUserDefinedMapper.

Original commit message:

This is needed for the pointer authentication work we plan to do in the
near future.

a63a81bd99/clang/docs/PointerAuthentication.rst
2019-12-03 15:22:13 -08:00
Akira Hatanaka 9f37c0e703 Revert "[NFC] Pass a reference to CodeGenFunction to methods of LValue and"
This reverts commit 8a5b7c3570. This seems
to have broken UBSan because of a null dereference.
2019-12-03 13:08:01 -08:00
Akira Hatanaka 8a5b7c3570 [NFC] Pass a reference to CodeGenFunction to methods of LValue and
AggValueSlot

This is needed for the pointer authentication work we plan to do in the
near future.

a63a81bd99/clang/docs/PointerAuthentication.rst
2019-12-03 11:30:09 -08:00
Victor Campos dcf11c5e86 [ARM][AArch64] Complex addition Neon intrinsics for Armv8.3-A
Summary:
Add support for vcadd_* family of intrinsics. This set of intrinsics is
available in Armv8.3-A.

The fp16 versions require the FP16 extension, which has been available
(opt-in) since Armv8.2-A.
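
One of the new intrinsics in use (my sketch; assumes an Armv8.3-A
target):

    #include <arm_neon.h>
    float32x4_t complex_add(float32x4_t a, float32x4_t b) {
      // Complex addition with the second operand rotated by 90 degrees.
      return vcaddq_rot90_f32(a, b);
    }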

Reviewers: t.p.northover

Reviewed By: t.p.northover

Subscribers: t.p.northover, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70862
2019-12-02 14:38:39 +00:00
David Green 9f15fcc271 [ARM] Replace arm_neon_vqadds with sadd_sat
This replaces the A32 NEON vqadds, vqaddu, vqsubs and vqsubu intrinsics
with the target independent sadd_sat, uadd_sat, ssub_sat and usub_sat.
This helps generate vqadds from standard IR nodes, which might be
produced from the vectoriser. The old variants are removed in the
process.
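
The user-facing intrinsics are unchanged; only the IR underneath
differs (my sketch):

    #include <arm_neon.h>
    int32x4_t saturating_add(int32x4_t a, int32x4_t b) {
      // Now emitted via @llvm.sadd.sat rather than @llvm.arm.neon.vqadds.
      return vqaddq_s32(a, b);
    }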

Differential Revision: https://reviews.llvm.org/D69350
2019-11-27 13:32:29 +00:00
Tim Northover e23d6f3184 NeonEmitter: remove special case on casting polymorphic builtins.
For some reason we were not casting a fairly obscure class of builtin calls we
expected to be polymorphic to vectors of char. It worked because the only
affected intrinsics weren't actually polymorphic after all, but the
special case was unnecessarily complicated.
2019-11-20 13:20:02 +00:00
Simon Tatham 4a4dd85e5a [ARM,MVE] Add intrinsics for vector comparisons.
This adds the `vcmp` family of ACLE MVE intrinsics: vector/vector,
vector/scalar, and the predicated forms of both. All are represented
using standard existing IR: vector/scalar comparisons are represented
by making a vector out of the scalar first, and predicated forms are
represented by taking the bitwise AND of the input predicate and the
output of the comparison. Existing LLVM-side tests demonstrate that
ISel will pattern-match all of that back down to single MVE VCMPs.
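
For example (my sketch; assumes an MVE-enabled target), a vector/scalar
comparison returning a predicate:

    #include <arm_mve.h>
    mve_pred16_t lanes_equal(uint32x4_t a, uint32_t b) {
      // The scalar is splatted into a vector, compared, and the <4 x i1>
      // result converted to the user-facing mve_pred16_t format.
      return vcmpeqq(a, b);
    }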

The idiom of handling a vector/scalar operation by generating IR to
expand the scalar into a second vector is going to be needed for a lot
of MVE intrinsics, so to make that easy, I've provided a helper
function that automatically works out the element count.

The comparison intrinsics are the first ones that have to //return// a
predicate, in the user-facing `mve_pred16_t` format. This means we
have to use the `arm_mve_pred_v2i` low-level intrinsic to convert it
back from the logical `<n x i1>` form used in IR. I've done that
explicitly in the code gen specification for the builtins, because it
happens much more rarely in the ACLE API than passing a Predicate as
input, so it didn't seem worth automating in MveEmitter.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70297
2019-11-18 10:39:30 +00:00
Simon Tatham a12f588ebb [ARM,MVE] Add intrinsics for contiguous load/stores.
This patch adds the ACLE intrinsics for all the MVE load and store
instructions not already handled by D69791. These ones don't need new
IR intrinsics, because they can be implemented in terms of standard
LLVM IR constructions.

Some of the load and store instructions access less than 128 bits of
memory, sign/zero extending each value to a wider vector lane on load
or truncating it on store. These are represented in IR by a load of a
shorter vector followed by a zext/sext, and conversely, a trunc
followed by a short store. Existing ISel patterns already recognize
those combinations and turn them into the right MVE instructions.
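
For instance (my sketch; assumes an MVE-enabled target), a widening
load:

    #include <arm_mve.h>
    int32x4_t load_bytes(const int8_t *p) {
      // Loads four i8 lanes and sign-extends each to i32: emitted as a
      // short vector load plus sext, which ISel folds into VLDRB.S32.
      return vldrbq_s32(p);
    }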

The predicated forms of all these instructions are represented in the
same way, except that the ordinary load/store operation is replaced
with the existing intrinsics @llvm.masked.{load,store}. These are
currently only code-generated as predicated MVE load/store
instructions if you give LLVM the `-enable-arm-maskedldst` option; so
I've done that in the LLVM codegen test. When we make that the
default, that option can be removed.

In the Tablegen backend, I've had to add a handful of extra support
features:

* We need to be able to make clang::Address objects out of a
  pointer and an alignment (previously we only needed these when the
  user passed us an existing one).

* We can now specify vector types that aren't 128 bits wide (for use
  in those intermediate values in IR), the parametrized type system
  can make one starting from two existing vector types (using the lane
  count of one and the element type of the other).

* I've added support for code generation of pointer casts, and for
  specifying LLVM types as operands to IRBuilder operations (for zext
  and sext, though I think they'll come in useful again).

* Now not all IR construction operations need to be specified as
  Builder.CreateFoo; some don't involve a Builder at all, and one
  passes it as a parameter to a tiny static helper function in
  CGBuiltin.cpp.

Reviewers: ostannard, MarkMurrayARM, dmgreen

Subscribers: kristof.beyls, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D70088
2019-11-13 12:47:00 +00:00
Tim Northover 44e5879f0f AArch64: add arm64_32 support to Clang. 2019-11-12 12:45:18 +00:00
Thomas Lively 935c84c3c2 [WebAssembly] Add experimental SIMD dot product instruction
Summary:
This instruction is not merged to the spec proposal, but we need it to
be implemented in the toolchain to experiment with it. It is available
only on an opt-in basis through a clang builtin.

Defined in https://github.com/WebAssembly/simd/pull/127.

Depends on D69696.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69697
2019-11-01 10:45:48 -07:00
Thomas Lively a07019a275 [WebAssembly] SIMD integer min and max instructions
Summary:
Introduces a clang builtins and LLVM intrinsics representing integer
min/max instructions. These instructions have not been merged to the
SIMD spec proposal yet, so they are currently opt-in only via builtins
and not produced by general pattern matching. If these instructions
are accepted into the spec proposal the builtins and intrinsics will
be replaced with normal pattern matching.

Defined in https://github.com/WebAssembly/simd/pull/27.

Reviewers: aheejin

Reviewed By: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69696
2019-10-31 20:22:11 -07:00
vhscampos f6e11a36c4 [ARM][AArch64] Implement __cls, __clsl and __clsll intrinsics from ACLE
Summary:
Writing support for three ACLE functions:
  unsigned int __cls(uint32_t x)
  unsigned int __clsl(unsigned long x)
  unsigned int __clsll(uint64_t x)

CLS stands for "Count number of leading sign bits".

In AArch64, these two intrinsics can be translated into the 'cls'
instruction directly. In AArch32, on the other hand, this functionality
is achieved by implementing it in terms of clz (count number of leading
zeros).
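
For example (my sketch):

    #include <arm_acle.h>
    #include <stdint.h>
    unsigned sign_bits(uint32_t x) {
      // Leading bits equal to the sign bit, excluding the sign bit
      // itself: __cls(0) and __cls(0xFFFFFFFF) both return 31.
      return __cls(x);
    }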

Reviewers: compnerd

Reviewed By: compnerd

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D69250
2019-10-28 11:06:58 +00:00
Michael Liao 45787e5682 Fix compilation warning. NFC. 2019-10-25 01:06:52 -04:00
Jinsong Ji 9671d1dc17 [clang] Fixup clang -Werror,-Wcovered-switch-default build failures
llvm/clang/lib/CodeGen/CGBuiltin.cpp:6877:3: error: default label in
switch which covers all enumeration values
[-Werror,-Wcovered-switch-default]

Similar to
https://reviews.llvm.org/rG7b3de1e811972b874d91554642ccb2ef5b32eed6
2019-10-24 22:49:17 +00:00
Simon Tatham 08074cc965 [clang,ARM] Initial ACLE intrinsics for MVE.
This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.

Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.
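
For example (my sketch; assumes an MVE-enabled target):

    #include <arm_mve.h>
    uint32x4_t add_vectors(uint32x4_t a, uint32x4_t b) {
      // Plain vaddq resolves to vaddq_u32 through ordinary clang
      // overload resolution before code-gen sees it.
      return vaddq(a, b);
    }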

Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.

So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.

Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.

(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D67161
2019-10-24 16:33:13 +01:00
Simon Pilgrim 729a2f6c2b CGBuiltin - silence static analyzer getAs<> null dereference warnings. NFCI.
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use castAs<> directly, and if not, an assert will fire for us.

llvm-svn: 374987
2019-10-16 10:38:32 +00:00
Thomas Lively 232fd99d9e [WebAssembly] Trapping fptoint builtins and intrinsics
Summary:
The WebAssembly backend lowers fptoint instructions to a code sequence
that checks for overflow to avoid traps because fptoint is supposed to
be speculatable. These new builtins and intrinsics give users a way to
depend on the trapping semantics of the underlying instructions and
avoid the extra code generated normally.

Patch by coffee and tlively.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68902

llvm-svn: 374856
2019-10-15 01:11:51 +00:00
Joerg Sonnenberger 529f4ed401 Improve __builtin_constant_p lowering
__builtin_constant_p used to be short-cut evaluated to false when
building with -O0. This is undesirable as it means that constant folding
in the front-end can give different results than folding in the back-end.
It can also create conditional branches on constant conditions that don't
get folded away. With the pending improvements to the llvm.is.constant
handling on the LLVM side, the short-cut is no longer useful.
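
For example (my sketch):

    int choose(int x) {
      // Previously folded to the non-constant branch in the front-end
      // at -O0; now @llvm.is.constant is emitted and folded consistently
      // later in the pipeline.
      return __builtin_constant_p(x * 2) ? 1 : 2;
    }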

Adjust various codegen tests to not depend on the short-cut or the
backend optimisations.

Differential Revision: https://reviews.llvm.org/D67638

llvm-svn: 374742
2019-10-13 22:33:46 +00:00
Erich Keane f759395994 Reland r374450 with Richard Smith's comments and test fixed.
The behavior from the original patch has changed, since we're no longer
allowing LLVM to just ignore the alignment.  Instead, we're just
assuming the maximum possible alignment.

Differential Revision: https://reviews.llvm.org/D68824

llvm-svn: 374562
2019-10-11 14:59:44 +00:00
Nico Weber b556085d81 Revert 374450 "Fix __builtin_assume_aligned with too large values."
The test fails on Windows, with

  error: 'warning' diagnostics expected but not seen:
    File builtin-assume-aligned.c Line 62: requested alignment
        must be 268435456 bytes or smaller; assumption ignored
  error: 'warning' diagnostics seen but not expected:
    File builtin-assume-aligned.c Line 62: requested alignment
        must be 8192 bytes or smaller; assumption ignored

llvm-svn: 374456
2019-10-10 21:34:32 +00:00
Erich Keane 31e454c1ec Fix __builtin_assume_aligned with too large values.
Code to handle __builtin_assume_aligned was allowing larger values, but
would convert this to unsigned along the way. This patch removes the
EmitAssumeAligned overloads that take unsigned to do away with this
problem.

Additionally, it adds a warning that values greater than 1 << 29 are
ignored by LLVM.

Differential Revision: https://reviews.llvm.org/D68824

llvm-svn: 374450
2019-10-10 21:08:28 +00:00
Thomas Lively 3419e90dc1 [WebAssembly] Add builtin and intrinsic for v8x16.swizzle
Summary:
This clang builtin and corresponding LLVM intrinsic are necessary to
expose the exact semantics of the underlying WebAssembly instruction
to users. LLVM produces a poison value if the dynamic swizzle indices
are greater than the vector size, but the WebAssembly instruction sets
the corresponding output lane to zero. Users who depend on this
behavior can safely use this builtin.

Depends on D68527.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68531

llvm-svn: 374189
2019-10-09 17:45:47 +00:00
Yonghong Song 05e46979d2 [BPF] do compile-once run-everywhere relocation for bitfields
A bpf specific clang intrinsic is introduced:
   u32 __builtin_preserve_field_info(member_access, info_kind)
Depending on info_kind, different information will
be returned to the program. A relocation is also
recorded for this builtin so that bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to get certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include relocation kind.
Currently supported relocation kinds are
 enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
 };
for __builtin_preserve_field_info. The old
access offset relocation is covered by
    FIELD_BYTE_OFFSET = 0.

An example:
struct s {
    int a;
    int b1:9;
    int b2:4;
};
enum {
    FIELD_BYTE_OFFSET = 0,
    FIELD_BYTE_SIZE,
    FIELD_EXISTENCE,
    FIELD_SIGNEDNESS,
    FIELD_LSHIFT_U64,
    FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
 #ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
 #endif
 #else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
 #endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocation generated for field_read where bpf_probe_read() is
used to access argument data on little endian mode:
        r3 = r1
        r1 = 0
        r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
        r3 += r1
        r1 = r10
        r1 += -8
        r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
        call bpf_probe_read
        r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r2
        r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
        r0 = r1
        r0 >>= r2
        r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
        if r3 == 0 goto LBB0_2
        r1 s>>= r2
        r0 = r1
LBB0_2:
        exit

Compare to the above code between relocations FIELD_LSHIFT_U64 and
FIELD_LSHIFT_U64, the code with big endian mode has four more
instructions.
        r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
        r6 += r1
        r6 += -64
        r6 <<= 32
        r6 >>= 32
        r1 = *(u64 *)(r10 - 8)
        r1 <<= r6
        r2 = 60   <=== relocation (FIELD_RSHIFT_U64)

The code and relocation generated when using direct load.
        r2 = 0
        r3 = 4
        r4 = 4
        if r4 s> 3 goto LBB0_3
        if r4 == 1 goto LBB0_5
        if r4 == 2 goto LBB0_6
        goto LBB0_9
LBB0_6:                                 # %sw.bb1
        r1 += r3
        r2 = *(u16 *)(r1 + 0)
        goto LBB0_9
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
        if r4 == 8 goto LBB0_8
        goto LBB0_9
LBB0_8:                                 # %sw.bb9
        r1 += r3
        r2 = *(u64 *)(r1 + 0)
        goto LBB0_9
LBB0_5:                                 # %sw.bb
        r1 += r3
        r2 = *(u8 *)(r1 + 0)
        goto LBB0_9
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51
        r2 <<= r1
        r1 = 60
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

Considering that the verifier is able to do limited constant
propagation following branches, the following is the
code actually traversed.
        r2 = 0
        r3 = 4   <=== relocation
        r4 = 4   <=== relocation
        if r4 s> 3 goto LBB0_3
LBB0_3:                                 # %entry
        if r4 == 4 goto LBB0_7
LBB0_7:                                 # %sw.bb5
        r1 += r3
        r2 = *(u32 *)(r1 + 0)
LBB0_9:                                 # %sw.epilog
        r1 = 51   <=== relocation
        r2 <<= r1
        r1 = 60   <=== relocation
        r0 = r2
        r0 >>= r1
        r3 = 1
        if r3 == 0 goto LBB0_11
        r2 s>>= r1
        r0 = r2
LBB0_11:                                # %sw.epilog
        exit

For the native load case, the load size is calculated to be the
same as the load width LLVM would otherwise use to load
the value from which the bitfield value is then extracted.

Differential Revision: https://reviews.llvm.org/D67980

llvm-svn: 374099
2019-10-08 18:23:17 +00:00
Richard Smith 772e266fbf Properly handle instantiation-dependent array bounds.
We previously failed to treat an array with an instantiation-dependent
but not value-dependent bound as being an instantiation-dependent type.
We now track the array bound expression as part of a constant array type
if it's an instantiation-dependent expression.

llvm-svn: 373685
2019-10-04 01:25:59 +00:00
Guillaume Chatelet d400d45150 [Alignment][NFC] Remove StoreInst::setAlignment(unsigned)
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet, bollu, jdoerfert

Subscribers: hiraditya, asbirlea, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68268

llvm-svn: 373595
2019-10-03 13:17:21 +00:00
Guillaume Chatelet ab11b9188d [Alignment][NFC] Remove AllocaInst::setAlignment(unsigned)
Summary:
This patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jholewinski, arsenm, jvesely, nhaehnle, eraman, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D68141

llvm-svn: 373207
2019-09-30 13:34:44 +00:00
Thomas Lively ae530c5c80 [WebAssembly] Narrowing and widening SIMD ops
Summary:
Implements target-specific LLVM intrinsics and clang builtins for
these new SIMD operations, as described at https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#integer-to-integer-narrowing.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67425

llvm-svn: 371906
2019-09-13 22:54:41 +00:00
Thomas Lively d0d9317061 [WebAssembly] Add SIMD QFMA/QFMS
Summary:
Adds clang builtins and LLVM intrinsics for these experimental
instructions. They are not implemented in engines yet, but that is ok
because the user must opt into using them by calling the builtins.

Reviewers: aheejin, dschuff

Reviewed By: aheejin

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D67020

llvm-svn: 370556
2019-08-31 00:12:29 +00:00
Matt Arsenault acd0a53c02 Builtins: Start adding half versions of math builtins
The implementation of the OpenCL builtin library currently uses 2
different hacks to get to the corresponding IR intrinsics from the
source. This will allow removal of those.

This is the set that is currently used (minus a few vector ones).
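
A sketch of the shape of these builtins (my example; the exact f16
builtin name is my assumption):

    _Float16 half_sqrt(_Float16 x) {
      // Half version mapping directly to the IR intrinsic, avoiding the
      // library's previous hacks (assumed name __builtin_sqrtf16).
      return __builtin_sqrtf16(x);
    }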

llvm-svn: 367973
2019-08-06 03:28:37 +00:00
David Major 027bb52790 [COFF][ARM64] Reorder handling of aarch64 MSVC builtins
In `CodeGenFunction::EmitAArch64BuiltinExpr()`, bulk move all of the aarch64 MSVC-builtin cases to an earlier point in the function (the `// Handle non-overloaded intrinsics first` switch block) in order to avoid an unreachable in `GetNeonType()`. The NEON type-overloading logic is not appropriate for the Windows builtins.

Fixes https://llvm.org/pr42775

Differential Revision: https://reviews.llvm.org/D65403

llvm-svn: 367323
2019-07-30 15:32:49 +00:00
JF Bastien dbc0a5df8d Allow prefetching from non-zero address spaces
Summary:
This is useful for targets which have prefetch instructions for non-default address spaces.

<rdar://problem/42662136>

Subscribers: nemanjai, javed.absar, hiraditya, kbarton, jkorous, dexonsmith, cfe-commits, llvm-commits, RKSimon, hfinkel, t.p.northover, craig.topper, anemet

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65254

llvm-svn: 367032
2019-07-25 16:11:57 +00:00
Christudasan Devadasan 8c5e6fa657 Updated the signature for some stack related intrinsics (CLANG)
Modified the intrinsics
int_addressofreturnaddress,
int_frameaddress & int_sponentry.
This commit depends on the changes in rL366679

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D64563

llvm-svn: 366683
2019-07-22 12:50:30 +00:00
Guanzhong Chen 5204f7611f [WebAssembly] Compute and export TLS block alignment
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.

Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.

The expected usage has now changed to:

    __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                             __builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton

Reviewed By: tlively

Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D65028

llvm-svn: 366624
2019-07-19 23:34:16 +00:00
Guanzhong Chen 801fa8e6b9 [WebAssembly] Implement __builtin_wasm_tls_base intrinsic
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.

Reviewers: tlively, aheejin, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64900

llvm-svn: 366475
2019-07-18 17:53:22 +00:00
Matt Arsenault e56865d40c AMDGPU: Add some missing builtins
llvm-svn: 366286
2019-07-17 00:01:03 +00:00
Guanzhong Chen 42bba4b852 [WebAssembly] Implement thread-local storage (local-exec model)
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.

`.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.

`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.

`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.

To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns
the size of the thread-local storage for the current function.

The expected usage is to run something like the following upon thread startup:

    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));

Reviewers: tlively, aheejin, kripken, sbc100

Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D64537

llvm-svn: 366272
2019-07-16 22:00:45 +00:00
Kyrylo Tkachov eb72138340 [AArch64] Implement __jcvt intrinsic from Armv8.3-A
The __jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined.

This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
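
For example (my sketch; assumes -march=armv8.3-a or later):

    #include <arm_acle.h>
    #include <stdint.h>
    int32_t js_convert(double d) {
      // Maps to the FJCVTZS instruction when __ARM_FEATURE_JCVT is
      // defined.
      return __jcvt(d);
    }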

make check-all didn't show any new failures.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics

Differential Revision: https://reviews.llvm.org/D64495

llvm-svn: 366197
2019-07-16 09:27:39 +00:00
Rui Ueyama 49a3ad21d6 Fix parameter name comments using clang-tidy. NFC.
This patch applies clang-tidy's bugprone-argument-comment tool
to LLVM, clang and lld source trees. Here is how I created this
patch:

$ git clone https://github.com/llvm/llvm-project.git
$ cd llvm-project
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
    -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
    -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
    -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
$ ninja
$ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
    -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
    ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}

llvm-svn: 366177
2019-07-16 04:46:31 +00:00
Ulrich Weigand b98bf60ef7 [SystemZ] Add support for new cpu architecture - arch13
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.

This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.

Note: No currently available Z system supports the arch13
architecture.  Once new systems become available, the
official system name will be added as supported -march name.

llvm-svn: 365933
2019-07-12 18:14:51 +00:00
Benjamin Kramer 3b5e60b695 [CodeGen] NVPTX: Switch from atomic.load.add.f32 to atomicrmw fadd
llvm-svn: 365798
2019-07-11 17:44:11 +00:00
Craig Topper caf6b71ab2 [X86] Change the IR sequence for _mm_storeh_pi and _mm_storel_pi to perform the store as a <2 x float> instead of i64.
This is similar to what we do for loadl_pi and loadh_pi.

llvm-svn: 365669
2019-07-10 17:11:29 +00:00
Pengfei Wang a50bbfc470 [NFC] [X86] Fix scan-build complaining
Summary:
Remove unused variable. This fixes bug:
https://bugs.llvm.org/show_bug.cgi?id=42526

Signed-off-by: pengfei <pengfei.wang@intel.com>

Reviewers: RKSimon, xiangzhangllvm, craig.topper

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64389

llvm-svn: 365473
2019-07-09 12:41:12 +00:00
Yonghong Song 048493f882 [BPF] Preserve debuginfo array/union/struct type/access index
For background of BPF CO-RE project, please refer to
  http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
adjustable on struct/union layout change so the same
program can run on multiple kernels with adjustment
before loading based on native kernel structures.

In order to do this, we need to keep track of GEP (getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard,
as various optimizations may have tweaked the GEP; and once a
union is replaced by a structure, it is impossible to track the
field index for union member accesses.

Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced:
  addr = preserve_array_access_index(base, index, dimension)
  addr = preserve_union_access_index(base, di_index)
  addr = preserve_struct_access_index(base, gep_index, di_index)
here,
  base: the base pointer for the array/union/struct access.
  index: the last access index for array, the same for IR/DebugInfo layout.
  dimension: the array dimension.
  gep_index: the access index based on IR layout.
  di_index: the access index based on user/debuginfo types.

If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
  base = __builtin_preserve_access_index(base)
such that user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a buy-in will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().

For example, for the following example,
  $ cat test.c
  struct sk_buff {
     int i;
     int b1:1;
     int b2:2;
     union {
       struct {
         int o1;
         int o2;
       } o;
       struct {
         char flags;
         char dev_id;
       } dev;
       int netid;
     } u[10];
  };

  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;

  #define _(x) (__builtin_preserve_access_index(x))

  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
    test.c >& log

The generated IR looks like below:
  ...
  define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
    %2 = alloca %struct.sk_buff*, align 8
    %3 = alloca i8, align 1
    store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
    call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
    call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
    %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
    %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
    %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
         %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
    %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
         [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
    %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
         %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
    %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
    %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
         %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
    %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
    %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
    %13 = sext i8 %12 to i32, !dbg !54
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
    ret i32 %13, !dbg !57
  }

  !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
  !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
  !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)

Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.

For &ctx->u[5].dev.dev_id,
  . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
  . The "%7 = ..." represents array subscript "5".
  . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
  . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.

Basically, traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access index
can be achieved.

The intrinsics also contain enough information to regenerate codes for IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.

Signed-off-by: Yonghong Song <yhs@fb.com>

Differential Revision: https://reviews.llvm.org/D61809

llvm-svn: 365438
2019-07-09 04:21:50 +00:00
Yonghong Song e085b40e9c Revert "[BPF] Preserve debuginfo array/union/struct type/access index"
This reverts commit r365435.

Forgot to add the Differential Revision link. Will add it to the
commit message and resubmit.

llvm-svn: 365436
2019-07-09 04:15:12 +00:00
Yonghong Song f21eeafcd9 [BPF] Preserve debuginfo array/union/struct type/access index
For background of BPF CO-RE project, please refer to
  http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
adjustable on struct/union layout change so the same
program can run on multiple kernels with adjustment
before loading based on native kernel structures.

In order to do this, we need to keep track of GEP (getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard
as various optimization may have tweaked GEP and also
union is replaced by structure it is impossible to track
fieldindex for union member accesses.

Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced.
  addr = preserve_array_access_index(base, index, dimension)
  addr = preserve_union_access_index(base, di_index)
  addr = preserve_struct_access_index(base, gep_index, di_index)
here,
  base: the base pointer for the array/union/struct access.
  index: the last access index for the array; it is the same for the IR and DebugInfo layouts.
  dimension: the array dimension.
  gep_index: the access index based on IR layout.
  di_index: the access index based on user/debuginfo types.

If these intrinsics are used blindly, i.e., transforming all GEPs
to these intrinsics and later reducing them back to GEPs, we have
seen up to 7% more instructions generated. To avoid such overhead,
a clang builtin is proposed:
  base = __builtin_preserve_access_index(base)
so that the user wraps to-be-relocated GEPs in this builtin
and the preserve_*_access_index intrinsics only apply to
those GEPs. Such a builtin prevents performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().

For example, given the following program:
  $ cat test.c
  struct sk_buff {
     int i;
     int b1:1;
     int b2:2;
     union {
       struct {
         int o1;
         int o2;
       } o;
       struct {
         char flags;
         char dev_id;
       } dev;
       int netid;
     } u[10];
  };

  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;

  #define _(x) (__builtin_preserve_access_index(x))

  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
    test.c >& log

The generated IR looks like the following:
  ...
  define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
    %2 = alloca %struct.sk_buff*, align 8
    %3 = alloca i8, align 1
    store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
    call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
    call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
    call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
    %4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
    %5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
    %6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
         %struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
    %7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
         [10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
    %8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
         %union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
    %9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
    %10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
         %struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
    %11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
    %12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
    %13 = sext i8 %12 to i32, !dbg !54
    call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
    ret i32 %13, !dbg !57
  }

  !19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
  !26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
  !34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)

Note that the @llvm.preserve.{struct,union}.access.index calls have the metadata
llvm.preserve.access.index attached, which provides the struct/union debuginfo type information.

For &ctx->u[5].dev.dev_id,
  . The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
  . The "%7 = ..." represents array subscript "5".
  . The "%8 = ..." represents union member "dev" with index 1 for DI layout.
  . The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.

Basically, by traversing the use-def chain recursively for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access
indices can be recovered.

The intrinsics also contain enough information to regenerate code for the IR layout.
For the array and structure intrinsics, the proper GEP can be constructed.
For the union intrinsics, replacing all uses of "addr" with "base" should be enough.

Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 365435
2019-07-09 04:04:21 +00:00
Fangrui Song 7264a474b7 Change std::{lower,upper}_bound to llvm::{lower,upper}_bound or llvm::partition_point. NFC
llvm-svn: 365006
2019-07-03 08:13:17 +00:00
Stanislav Mekhanoshin 8a8131a3f6 [AMDGPU] gfx1010 wave32 clang support
Differential Revision: https://reviews.llvm.org/D63209

llvm-svn: 363341
2019-06-13 23:47:59 +00:00
Pengfei Wang 244062eece [X86] Enable intrinsics that convert float and bf16 data to each other
Scalar version :
_mm_cvtsbh_ss , _mm_cvtness_sbh

Vector version:
_mm512_cvtpbh_ps , _mm256_cvtpbh_ps
_mm512_maskz_cvtpbh_ps , _mm256_maskz_cvtpbh_ps
_mm512_mask_cvtpbh_ps , _mm256_mask_cvtpbh_ps

Patch by Shengchen Kan (skan)

Differential Revision: https://reviews.llvm.org/D62363

llvm-svn: 363018
2019-06-11 01:17:28 +00:00
Tim Northover c46827c7ed LLVM IR: Generate new-style byval-with-Type from Clang
LLVM IR recently added a Type parameter to the byval Attribute, so that
when pointers become opaque and no longer have an element type the
information will still be present in IR.

For now the Type parameter is optional (which is why Clang didn't need
this change at the time), but it will become mandatory soon.

llvm-svn: 362652
2019-06-05 21:12:14 +00:00
Pengfei Wang cc3629d545 [X86] Add VP2INTERSECT instructions
Support Intel AVX512 VP2INTERSECT instructions in clang

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D62367

llvm-svn: 362196
2019-05-31 06:09:35 +00:00
Adhemerval Zanella 1468991073 [clang] Handle lrint/llrint builtins
As with other floating-point rounding builtins that can be optimized
when built with -fno-math-errno, this patch adds support for lrint
and llrint.  It currently only optimizes for the AArch64 backend.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D62019

llvm-svn: 361878
2019-05-28 21:16:04 +00:00
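
A minimal usage sketch (illustrative, not from the commit; assumes -fno-math-errno on AArch64):

  #include <math.h>

  /* With -fno-math-errno these calls can lower to llvm.lrint/llvm.llrint
     instead of libm calls. */
  long to_long(double x) { return lrint(x); }         /* or __builtin_lrint(x) */
  long long to_llong(float x) { return llrintf(x); }  /* or __builtin_llrintf(x) */
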
Dmitri Gribenko 39192043bb Delete default constructors, copy constructors, move constructors, copy assignment, move assignment operators on Expr, Stmt and Decl
Reviewers: ilya-biryukov, rsmith

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D62187

llvm-svn: 361468
2019-05-23 09:22:43 +00:00
Simon Pilgrim 823458f9b8 [CGBuiltin] dumpRecord - remove unused field offset. NFCI.
llvm-svn: 361238
2019-05-21 10:48:42 +00:00
Craig Topper af7a188453 [Intrinsics] Merge lround.i32 and lround.i64 into a single intrinsic with overloaded result type. Make result type for llvm.llround overloaded instead of fixing to i64
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.

Differential Revision: https://reviews.llvm.org/D62026

llvm-svn: 361169
2019-05-20 16:27:09 +00:00
Craig Topper 20040db9a6 [X86] Stop implicitly enabling avx512vl when avx512bf16 is enabled.
Previously we were doing this so that the 256 bit selectw builtin could be used in the implementation of the 512->256 bit conversion intrinsic.

After this commit we now use a masked convert builtin that will emit the intrinsic call and the 256-bit select from custom code in CGBuiltin. Then the header only needs to call that one intrinsic.

llvm-svn: 360924
2019-05-16 18:28:17 +00:00
Adhemerval Zanella 0d9dcd7bf0 [clang] Handle lround/llround builtins
As with other floating-point rounding builtins that can be optimized
when built with -fno-math-errno, this patch adds support for lround
and llround.  It currently only optimizes for the AArch64 backend.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D61392

llvm-svn: 360896
2019-05-16 13:43:25 +00:00
Martin Storsjo 7037a13679 [AArch64] Add __builtin_sponentry, for calling setjmp in MinGW
In MinGW, setjmp isn't expanded as a builtin in the compiler (like it
is for MSVC), but manually hooked up as calls to the right underlying
functions in headers. Using the actual CRT's real setjmp/longjmp
functions requires this intrinsic. (Currently this is worked around by
using MinGW specific reimplementations of setjmp/longjmp on aarch64.)

Differential Revision: https://reviews.llvm.org/D61592

llvm-svn: 360082
2019-05-06 21:19:07 +00:00
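
A hedged sketch of the header hookup this intrinsic enables (the macro shape is illustrative, not the actual MinGW header):

  /* __builtin_sponentry() yields the stack pointer at function entry,
     which the MS CRT's two-argument _setjmp expects, e.g.
       #define setjmp(env) _setjmp((env), __builtin_sponentry())  */
  void *sp_at_entry(void) { return __builtin_sponentry(); }
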
Luo, Yuanke 844f662932 Enable intrinsics of AVX512_BF16, which are supported for BFLOAT16 in Cooper Lake
Summary:
1. Enable infrastructure of AVX512_BF16, which is supported for BFLOAT16 in Cooper Lake;
2. Enable intrinsics for VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions, which are Vector Neural Network Instructions supporting BFLOAT16 inputs and conversion instructions from IEEE single precision.
For more details about the BF16 intrinsics, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

Patch by LiuTianle

Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: mgorny, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60552

llvm-svn: 360018
2019-05-06 08:25:11 +00:00
Richard Smith 31cfb311c5 Reinstate r359059, reverted in r359361, with a fix to properly prevent
us emitting the operand of __builtin_constant_p if it has side-effects.

Original commit message:

Fix interactions between __builtin_constant_p and constexpr to match
current trunk GCC.

GCC permits information from outside the operand of
__builtin_constant_p (but in the same constant evaluation context) to be
used within that operand; clang now does so too. A few other minor
deviations from GCC's behavior showed up in my testing and are also
fixed (matching GCC):
  * Clang now supports nullptr_t as the argument type for
    __builtin_constant_p
  * Clang now returns true from __builtin_constant_p if called with a
    null pointer
  * Clang now returns true from __builtin_constant_p if called with an
    integer cast to pointer type

llvm-svn: 359367
2019-04-27 02:58:17 +00:00
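
Two of the newly matching GCC behaviors, as a quick illustrative sketch:

  /* Both initializers are constant-evaluated and now yield 1, matching GCC. */
  int on_null = __builtin_constant_p((void *)0);
  int on_cast = __builtin_constant_p((int *)42);
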
Javed Absar 18b0c40bc5 [AArch64] Add support for MTE intrinsics
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
Reviewed By: Tim Northover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485

llvm-svn: 359348
2019-04-26 21:08:11 +00:00
Artem Belevich 5fe85a003f [CUDA] Implemented _[bi]mma* builtins.
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.

Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g. as used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)

Differential Revision: https://reviews.llvm.org/D60279

llvm-svn: 359248
2019-04-25 22:28:09 +00:00
Diogo N. Sampaio eb312ddfdf [Aarch64] Add v8.2-a half precision element extract intrinsics
Summary:
Implements the intrinsics defined in the ACLE to extract half-precision fp scalar elements from float16x4_t and float16x8_t vector types.
a.k.a:
vduph_lane_f16
vduph_laneq_f16

Reviewers: pablooliveira, olista01, LukeGeeson, DavidSpickett

Reviewed By: DavidSpickett

Subscribers: DavidSpickett, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60272

llvm-svn: 358276
2019-04-12 10:43:48 +00:00
JF Bastien ef202c308b Variable auto-init: also auto-init alloca
Summary:
alloca isn’t auto-init’d right now because it’s a different path in clang than
all the other stuff we support (it’s a builtin, not an expression).
Interestingly, alloca doesn’t have a type (as opposed to even a VLA), so we can
really only initialize it with memset.

<rdar://problem/49794007>

Subscribers: jkorous, dexonsmith, cfe-commits, rjmccall, glider, kees, kcc, pcc

Tags: #clang

Differential Revision: https://reviews.llvm.org/D60548

llvm-svn: 358243
2019-04-12 00:11:27 +00:00
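
A small sketch of the case this covers (illustrative; assumes -ftrivial-auto-var-init=pattern or =zero):

  int consume(void *p);

  int f(unsigned n) {
    /* alloca carries no type, so auto-init emits a memset over n bytes. */
    void *p = __builtin_alloca(n);
    return consume(p);
  }
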
Alexey Sotkin 1b01f9728f [OpenCL] Re-fix invalid address space generation for clk_event_t arguments of enqueue_kernel builtin function
Summary:
https://reviews.llvm.org/D53809 fixed the wrong address space (assert in debug build)
generated for the event_ret argument. But exactly the same problem exists for the
event_wait_list argument. This patch should fix both.

Reviewers: Anastasia, yaxunl

Reviewed By:  Anastasia

Subscribers: kristina, ebevhan, cfe-commits

Differential Revision: https://reviews.llvm.org/D59985

llvm-svn: 358151
2019-04-11 06:18:17 +00:00
Vedant Kumar 37b0f9ad95 [os_log] Mark os_log_helper `nounwind`
Allow the optimizer to remove unnecessary EH cleanups surrounding calls
to os_log_helper, to save some code size.

As a follow-up, it might be worthwhile to add a BasicNoexcept exception
spec to os_log_helper, and to then teach CGCall to emit direct calls for
callees which can't throw. This could save some compile-time.

Differential Revision: https://reviews.llvm.org/D60108

llvm-svn: 357501
2019-04-02 17:42:38 +00:00
Reid Kleckner 73253bdefc [MS] Make __iso_volatile_* available on all targets
Future versions of MSVC make these intrinsics available on x86 & x64,
according to:
http://lists.llvm.org/pipermail/cfe-dev/2019-March/061711.html

The purpose of these builtins is to emit plain, non-atomic, volatile
stores when /volatile:ms (-cc1 -fms-volatile) is enabled.

llvm-svn: 357220
2019-03-28 22:59:09 +00:00
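
An illustrative sketch of these intrinsics (signatures as in MSVC's intrin.h; under -fms-volatile they emit plain, non-atomic volatile accesses):

  int load32(const volatile int *p) { return __iso_volatile_load32(p); }
  void store32(volatile int *p, int v) { __iso_volatile_store32(p, v); }
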
Amara Emerson c10b24691a [AArch64] Split the neon.addp intrinsic into integer and fp variants.
This is the result of discussions on the list about how to deal with intrinsics
which require codegen to disambiguate them via only the integer/fp overloads.
It causes problems for GlobalISel as some of that information is lost during
translation, while with other operations like IR instructions the information is
encoded into the instruction opcode.

This patch changes clang to emit the new faddp intrinsic if the vector operands
to the builtin have FP element types. LLVM IR AutoUpgrade has been taught to
upgrade existing calls to aarch64.neon.addp with fp vector arguments, and
we remove the workarounds introduced for GlobalISel in r355865.

This is a more permanent solution to PR40968.

Differential Revision: https://reviews.llvm.org/D59655

llvm-svn: 356722
2019-03-21 22:31:37 +00:00
Heejin Ahn 7e66a50bb4 [WebAssembly] Use rethrow intrinsic in the rethrow block
Summary:
Because in wasm we merge all catch clauses into one big catchpad, in
case none of the types in catch handlers matches after we test against
each of them, we should unwind to the next EH enclosing scope. For this,
we should NOT use a call to `__cxa_rethrow` but rather a call to our own
rethrow intrinsic, because what we're trying to do here is just to
transfer the control flow into the next enclosing EH pad (or the
caller). Calls to `__cxa_rethrow` should only be used after a call to
`__cxa_begin_catch`.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D59353

llvm-svn: 356317
2019-03-16 05:39:12 +00:00
Erik Pilkington 02886e5476 Revert "Add a new attribute, fortify_stdlib"
This reverts commit r353765. After talking with our c stdlib folks, we decided
to use the existing pass_object_size attribute to implement _FORTIFY_SOURCE
wrappers, like Bionic does (I didn't realize that pass_object_size could be used
for this purpose). Sorry for the flip/flop, and thanks to James Y. Knight for
pointing this out to me.

llvm-svn: 356103
2019-03-13 21:37:01 +00:00
Kristina Brooks 103799c060 Fix accidentally used hard tabs. NFC
Big sorry. This undoes the indentation mess I made
in r354751.

llvm-svn: 354752
2019-02-24 18:06:10 +00:00
Kristina Brooks 716cbfb464 Wrap code for builtin_assume_aligned at 80 col. NFC
Minor style fix to avoid going over 80 cols in handling
of case for Builtin::BI__builtin_assume_aligned. NFC.

llvm-svn: 354751
2019-02-24 17:57:33 +00:00
Thomas Lively de7a0a1526 [WebAssembly] Bulk memory intrinsics and builtins
Summary:
Implements the LLVM intrinsics and Clang builtins for
memory.init and data.drop.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D57736

llvm-svn: 353983
2019-02-13 22:11:16 +00:00
Nico Weber acf81a7c14 Re-enable the test disabled in r353836 and hopefully make it pass in gcc builds
Argument evaluation order is different between gcc and clang, so pull out
the Builder calls to make the generated IR independent of the host compiler's
argument evaluation order.  Thanks to rnk for reminding me of this clang/gcc
difference.

llvm-svn: 353969
2019-02-13 19:04:26 +00:00
Erik Pilkington e3cd735ea6 Add a new attribute, fortify_stdlib
This attribute applies to declarations of C stdlib functions
(sprintf, memcpy...) that have known fortified variants
(__sprintf_chk, __memcpy_chk, ...). When applied, clang will emit
calls to the fortified variant functions instead of calls to the
defaults.

In GCC, this is done by adding gnu_inline-style wrapper functions,
but that doesn't work for us for variadic functions because we don't
support __builtin_va_arg_pack (and have no intention to).

This attribute takes two arguments, the first is 'type' argument
passed through to __builtin_object_size, and the second is a flag
argument that gets passed through to the variadic checking variants.

rdar://47905754

Differential revision: https://reviews.llvm.org/D57918

llvm-svn: 353765
2019-02-11 23:21:39 +00:00
James Y Knight 751fe286dc [opaque pointer types] Cleanup CGBuilder's Create*GEP.
The various EltSize, Offset, DataLayout, and StructLayout arguments
are all computable from the Address's element type and the DataLayout
which the CGBuilder already has access to.

After having previously asserted that the computed values are the same
as those passed in, now remove the redundant arguments from
CGBuilder's Create*GEP functions.

Differential Revision: https://reviews.llvm.org/D57767

llvm-svn: 353629
2019-02-09 22:22:28 +00:00
Volodymyr Sapsai 1570571ded [CodeGen][ObjC] Fix assert on calling `__builtin_constant_p` with ObjC objects.
When we are calling `__builtin_constant_p` with ObjC objects of
different classes, we hit the assertion

> Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file include/llvm/Support/Casting.h, line 254.

It happens because LLVM types for `ObjCInterfaceType` are opaque and
have no name (see `CodeGenTypes::ConvertType`). As a result, for
different ObjC classes we have different `is_constant` intrinsics with
the same name `llvm.is.constant.p0s_s`. When we try to reuse an
intrinsic with the same name, we fail because of a type mismatch.

Fix by bitcasting `ObjCObjectPointerType` to `id` prior to passing as an
argument to `__builtin_constant_p`. This results in using intrinsic
`llvm.is.constant.p0i8` and correct types.

rdar://problem/47499250

Reviewers: rjmccall, ahatanak, void

Reviewed By: void, ahatanak

Subscribers: ddunbar, jkorous, hans, dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D57427

llvm-svn: 353577
2019-02-08 23:02:13 +00:00
Eli Friedman 3189d5f48c [COFF, ARM64] Fix types for _ReadStatusReg, _WriteStatusReg
r344765 added those intrinsics, but used the wrong types.

Patch by Mike Hommey

Differential Revision: https://reviews.llvm.org/D57636

llvm-svn: 353493
2019-02-08 01:17:49 +00:00
Tom Tan dcb9e08fae [COFF, ARM64] Add ARM64 support for MS intrinsic _fastfail
The MSDN document was also updated to reflect this, but it will probably take a few days to show up at the link below.

https://docs.microsoft.com/en-us/cpp/intrinsics/fastfail

Differential Revision: https://reviews.llvm.org/D57631

llvm-svn: 353337
2019-02-06 20:08:26 +00:00
James Y Knight 76f787424d [opaque pointer types] More trivial changes to pass FunctionType to CallInst.
Change various functions to use FunctionCallee or Function*.

Pass function type through __builtin_dump_struct's dumpRecord helper.

llvm-svn: 353199
2019-02-05 19:17:50 +00:00
James Y Knight 9871db064d [opaque pointer types] Pass function types for runtime function calls.
Emit{Nounwind,}RuntimeCall{,OrInvoke} have been modified to take a
FunctionCallee as an argument, and CreateRuntimeFunction has been
modified to return a FunctionCallee. All callers have been updated.

Additionally, CreateBuiltinFunction is removed, as it was redundant
with CreateRuntimeFunction after some previous changes.

Differential Revision: https://reviews.llvm.org/D57668

llvm-svn: 353184
2019-02-05 16:42:33 +00:00
James Y Knight 8799caee8d [opaque pointer types] Trivial changes towards CallInst requiring
explicit function types.

llvm-svn: 353009
2019-02-03 21:53:49 +00:00
Erik Pilkington 9c3b588db9 Add a new builtin: __builtin_dynamic_object_size
This builtin has the same UI as __builtin_object_size, but has the
potential to be evaluated dynamically. It is meant to be used as a
drop-in replacement for libraries that use __builtin_object_size when
a dynamic checking mode is enabled. For instance,
__builtin_object_size fails to provide any extra checking in the
following function:

  void f(size_t alloc) {
    char* p = malloc(alloc);
    strcpy(p, "foobar"); // expands to __builtin___strcpy_chk(p, "foobar", __builtin_object_size(p, 0))
  }

This is an overflow if alloc < 7, but because LLVM can't fold the
object size intrinsic statically, it folds __builtin_object_size to
-1. With __builtin_dynamic_object_size, alloc is passed through to
__builtin___strcpy_chk.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56760

llvm-svn: 352665
2019-01-30 20:34:53 +00:00
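
A minimal usage sketch (illustrative, not from the commit):

  #include <stddef.h>

  /* Unlike __builtin_object_size, this may be evaluated at run time
     instead of folding to -1 when the size isn't statically known. */
  size_t remaining(char *p) { return __builtin_dynamic_object_size(p, 0); }
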
Erik Pilkington 600e9deacf Add a 'dynamic' parameter to the objectsize intrinsic
This is meant to be used with clang's __builtin_dynamic_object_size.
When 'true' is passed to this parameter, the intrinsic has the
potential to be folded into instructions that will be evaluated
at run time. When 'false', the objectsize intrinsic behaviour is
unchanged.

rdar://32212419

Differential revision: https://reviews.llvm.org/D56761

llvm-svn: 352664
2019-01-30 20:34:35 +00:00
James Y Knight 3933addd30 Cleanup: replace uses of CallSite with CallBase.
llvm-svn: 352595
2019-01-30 02:54:28 +00:00
Matt Arsenault b72888647b AMDGPU: Add ds append/consume builtins
llvm-svn: 352443
2019-01-28 23:59:18 +00:00
Craig Topper 07b6d3de1b [X86] Add new variadic avx512 compress/expand intrinsics that use vXi1 types for the mask argument.
Custom lower the builtins to these intrinsics. This enables the middle end to optimize out bitcasts for the masks.

llvm-svn: 352344
2019-01-28 07:03:10 +00:00
Craig Topper bd7884ed79 [X86] Custom codegen 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics.
Summary:
The 512-bit cvt(u)qq2tops, cvt(u)qqtopd, and cvt(u)dqtops intrinsics all have the possibility of taking an explicit rounding mode argument. If the rounding mode is CUR_DIRECTION we'd like to emit a sitofp/uitofp instruction and a select like we do for 256-bit intrinsics.

For cvt(u)qqtopd and cvt(u)dqtops we do this when the form of the software intrinsics that doesn't take a rounding mode argument is used. This is done by using convertvector in the header with the select builtin. But if the explicit rounding mode form of the intrinsic is used and CUR_DIRECTION is passed, we don't do this. We shouldn't have this inconsistency.

For cvt(u)qqtops nothing is done because we can't use the select builtin in the header without avx512vl. So we need to use custom codegen for this.

Even when the rounding mode isn't CUR_DIRECTION we should also use select in IR for consistency. And it will remove another scalar integer mask from our intrinsics.

To accomplish all of these goals I've taken a slightly unusual approach. I've added two new X86 specific intrinsics for sitofp/uitofp with rounding. These intrinsics are variadic on the input and output type so we only need 2 instead of 6. This avoids the need for a switch to map them in CGBuiltin.cpp. We just need to check signed vs unsigned. I believe other targets also use variadic intrinsics like this.

So if the rounding mode is CUR_DIRECTION we'll use an sitofp/uitofp instruction. Otherwise we'll use one of the new intrinsics. After that we'll emit a select instruction if needed.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56998

llvm-svn: 352267
2019-01-26 02:42:01 +00:00
Simon Pilgrim a7bcd72c0a [X86] Replace VPCOM/VPCOMU with generic integer comparisons (clang)
These intrinsics can always be replaced with generic integer comparisons without any regression in codegen, even for -O0/-fast-isel cases.

Noticed while cleaning up vector integer comparison costs for PR40376.

A future commit will remove/autoupgrade the existing VPCOM/VPCOMU llvm intrinsics.

llvm-svn: 351687
2019-01-20 16:40:33 +00:00
Chandler Carruth 2946cd7010 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Anton Korobeynikov 81cff31ccf CodeGen: Cast llvm.flt.rounds result to match __builtin_flt_rounds
llvm.flt.rounds returns an i32, but the builtin returns an int.
On targets where int is not 32 bits, clang tries to bitcast the result, causing an assertion failure.

The patch enables newlib build for msp430.

Patch by Edward Jones!

Differential Revision: https://reviews.llvm.org/D24461

llvm-svn: 351449
2019-01-17 15:21:55 +00:00
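
A quick sketch of the builtin in question (illustrative):

  /* Returns the current rounding mode as an int (1 == round to nearest).
     On msp430, int is 16 bits, so the i32 result of llvm.flt.rounds must
     be truncated rather than bitcast. */
  int rounding_mode(void) { return __builtin_flt_rounds(); }
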
Craig Topper 015585abb2 [X86] Add custom emission for the avx512 scatter builtins to convert from scalar integer to vXi1 for the mask arguments to the intrinsics.
llvm-svn: 351408
2019-01-17 00:34:19 +00:00
Craig Topper 931779761e Recommit r351160 "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
V8 has been fixed now.

llvm-svn: 351391
2019-01-16 22:56:25 +00:00
Craig Topper bb5b06603b [X86] Add versions of the avx512 gather intrinsics that take the mask as a vXi1 vector instead of a scalar
We need to custom handle these so we can turn the scalar mask into a vXi1 vector.

Differential Revision: https://reviews.llvm.org/D56530

llvm-svn: 351390
2019-01-16 22:34:33 +00:00
Benjamin Kramer 9c53890833 Revert "[X86] Make _xgetbv/_xsetbv on non-windows platforms"
This reverts commit r351160. Breaks building v8.

llvm-svn: 351210
2019-01-15 17:23:36 +00:00
Roman Lebedev bd1c087019 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

This is a second commit, the original one was r351105,
which was mass-reverted in r351159 because 2 compiler-rt tests were failing.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351177
2019-01-15 09:44:25 +00:00
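
One of the assume-aligned variants now sanitized, as an illustrative sketch (assumes -fsanitize=alignment):

  /* If p is not actually 32-byte aligned, the sanitizer reports it
     instead of letting the optimizer silently assume it. */
  void *assume32(void *p) { return __builtin_assume_aligned(p, 32); }
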
Craig Topper 69aed7c364 [X86] Make _xgetbv/_xsetbv on non-windows platforms
Summary:
This patch attempts to redo what was tried in r278783, but was reverted.

These intrinsics should be available on non-Windows platforms behind an "xsave" feature check. But on Windows platforms they shouldn't have a feature check, since that's how MSVC behaves.

To accomplish this I've added an MS builtin with no feature check and a normal gcc builtin with a feature check. When _MSC_VER is not defined, _xgetbv/_xsetbv will be macros pointing to the gcc builtin name.

I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin.

I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it.

Reviewers: rnk, RKSimon, spatel

Reviewed By: rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D56686

llvm-svn: 351160
2019-01-15 05:03:18 +00:00
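
A minimal usage sketch (illustrative; assumes a non-Windows target compiled with -mxsave):

  #include <immintrin.h>

  /* Read extended control register 0 (XCR0). */
  unsigned long long read_xcr0(void) { return _xgetbv(0); }
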
Vlad Tsyrklevich 86e68fda3b Revert alignment assumptions changes
Revert r351104-6, r351109, r351110, r351119, r351134, and r351153. These
changes fail on the sanitizer bots.

llvm-svn: 351159
2019-01-15 03:38:02 +00:00
Roman Lebedev 7892c37455 [clang][UBSan] Sanitization for alignment assumptions.
Summary:
UB isn't nice. It's cool and powerful, but not nice.
Having a way to detect it is nice though.
[[ https://wg21.link/p1007r3 | P1007R3: std::assume_aligned ]] / http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1007r2.pdf says:
```
We propose to add this functionality via a library function instead of a core language attribute.
...
If the pointer passed in is not aligned to at least N bytes, calling assume_aligned results in undefined behaviour.
```

This differential teaches clang to sanitize all the various variants of this assume-aligned attribute.

Requires D54588 for LLVM IRBuilder changes.
The compiler-rt part is D54590.

Reviewers: ABataev, craig.topper, vsk, rsmith, rnk, #sanitizers, erichkeane, filcab, rjmccall

Reviewed By: rjmccall

Subscribers: chandlerc, ldionne, EricWF, mclow.lists, cfe-commits, bkramer

Tags: #sanitizers

Differential Revision: https://reviews.llvm.org/D54589

llvm-svn: 351105
2019-01-14 19:09:27 +00:00
Dan Gohman 51532a524e [WebAssembly] Remove old builtins
This removes the old grow_memory and mem.grow-style builtins, leaving just
the memory.grow-style builtins.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351089
2019-01-14 18:28:10 +00:00
Craig Topper 49488407aa [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351036
2019-01-14 08:46:51 +00:00
Craig Topper 689b3b71af [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
We'll do the scalar<->vXi1 conversions with bitcasts in IR.

Fixes PR40258

llvm-svn: 351029
2019-01-14 00:03:55 +00:00
Craig Topper cd9e232a4d Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
The MSVC limit hit in AutoUpgrade.cpp has been worked around for now.

llvm-svn: 350568
2019-01-07 21:00:41 +00:00
Craig Topper 33c9088783 Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins."
Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp

llvm-svn: 350563
2019-01-07 19:39:25 +00:00
Craig Topper e34f2bb807 [X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins.
Differential Revision: https://reviews.llvm.org/D56365

llvm-svn: 350555
2019-01-07 19:10:22 +00:00
Haibo Huang 303b2333e4 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading it. This makes it impossible to use these functions in ifunc resolvers, because at that point GOT entries have not yet been relocated. This change makes __cpu_model dso local.

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 349825
2018-12-20 21:33:59 +00:00
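
A sketch of the ifunc pattern this change unblocks (function names are illustrative):

  __attribute__((target("avx2"))) static void impl_avx2(void) {}
  static void impl_generic(void) {}

  /* The resolver runs before GOT relocation, so reading __cpu_model
     must not go through the GOT. */
  static void (*resolve_foo(void))(void) {
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") ? impl_avx2 : impl_generic;
  }
  void foo(void) __attribute__((ifunc("resolve_foo")));
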
Simon Pilgrim 4597379227 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (clang)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55938

Differential Revision: https://reviews.llvm.org/D55937

llvm-svn: 349796
2018-12-20 19:01:13 +00:00
Simon Pilgrim 313dc85ce0 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (clang)
This emits SADD_SAT/SSUB_SAT generic intrinsics for the SSE signed saturated math intrinsics.

LLVM counterpart: https://reviews.llvm.org/D55894

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349743
2018-12-20 11:53:45 +00:00
Simon Pilgrim a7b30b4a58 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (clang)
Sibling patch to D55855, this emits UADD_SAT/USUB_SAT generic intrinsics for the SSE saturated math intrinsics instead of expanding to an IR code sequence that could be difficult to reassemble.

Differential Revision: https://reviews.llvm.org/D55879

llvm-svn: 349631
2018-12-19 14:43:47 +00:00
Vedant Kumar 77dfca88b2 [CodeGen] Handle mixed-width ops in mixed-sign mul-with-overflow lowering
The special lowering for __builtin_mul_overflow introduced in r320902
fixed an ICE seen when passing mixed-sign operands to the builtin.

This patch extends the special lowering to cover mixed-width, mixed-sign
operands. In a few common scenarios, calls to muloti4 will no longer be
emitted.

This should address the latest comments in PR34920 and work around the
link failure seen in:

  https://bugzilla.redhat.com/show_bug.cgi?id=1657544

Testing:
- check-clang
- A/B output comparison with: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081

Differential Revision: https://reviews.llvm.org/D55843

llvm-svn: 349542
2018-12-18 21:05:03 +00:00
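
A sketch of a mixed-width, mixed-sign case covered by the extended lowering (illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  /* No __muloti4 libcall is needed here; returns true on overflow. */
  bool checked_mul(int64_t a, uint32_t b, int64_t *res) {
    return __builtin_mul_overflow(a, b, res);
  }
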
Eric Fiselier 261875054e [Clang] Add __builtin_launder
Summary:
This patch adds `__builtin_launder`, which is required to implement `std::launder`. Additionally, GCC provides `__builtin_launder`, so this brings Clang in line with GCC.

I'm not exactly sure what magic `__builtin_launder` requires, but  based on previous discussions this patch applies a `@llvm.invariant.group.barrier`. As noted in previous discussions, this may not be enough to correctly handle vtables.

Reviewers: rnk, majnemer, rsmith

Reviewed By: rsmith

Subscribers: kristina, Romain-Geissler-1A, erichkeane, amharc, jroelofs, cfe-commits, Prazek

Differential Revision: https://reviews.llvm.org/D40218

llvm-svn: 349195
2018-12-14 21:11:28 +00:00
Craig Topper 1f2b181689 [Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h
intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature.

For MS compatibility we should always have these builtins available regardless of whether X86 is the target or the CPU supports the lzcnt instruction. The backends should be able to gracefully fall back to something supported, even if it's just shifts and bit ops.

Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled.

This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available.

Should fix PR40014

Differential Revision: https://reviews.llvm.org/D55677

llvm-svn: 349098
2018-12-14 00:21:02 +00:00
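
A quick illustrative sketch of the MS-compatible builtins (available regardless of the lzcnt target feature):

  unsigned int demo(void) {
    unsigned int a = __lzcnt16((unsigned short)0x00ff); /* 8  */
    unsigned int b = __lzcnt(1u);                       /* 31 */
    unsigned int c = (unsigned int)__lzcnt64(1ull);     /* 63 */
    return a + b + c;
  }
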
Haibo Huang e177082972 Revert "Declares __cpu_model as dso local"
This reverts r348978

llvm-svn: 348982
2018-12-12 22:39:51 +00:00
Haibo Huang 6b22f59207 Declares __cpu_model as dso local
__builtin_cpu_supports and __builtin_cpu_is use information in __cpu_model to decide cpu features. Before this change, __cpu_model was not declared as dso local, so the generated code looks up its address in the GOT when reading it. This makes it impossible to use these functions in ifunc resolvers, because at that point GOT entries have not yet been relocated. This change makes __cpu_model dso local.

Differential Revision: https://reviews.llvm.org/D53850

llvm-svn: 348978
2018-12-12 22:04:12 +00:00
Raphael Isemann b23ccecbb0 Misc typo fixes in ./lib folder
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned`

Reviewers: teemperor

Reviewed By: teemperor

Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits

Differential Revision: https://reviews.llvm.org/D55475

llvm-svn: 348755
2018-12-10 12:37:46 +00:00
Craig Topper 6d7a7ef9eb [X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc.
The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required the adx feature.

This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if it's not a burden.

llvm-svn: 348738
2018-12-10 06:07:59 +00:00
Fangrui Song 407659ab0a Revert "Revert r347417 "Re-Reinstate 347294 with a fix for the failures.""
It seems the two failing tests can be simply fixed after r348037

Fix 3 cases in Analysis/builtin-functions.cpp
Delete the bad CodeGen/builtin-constant-p.c for now

llvm-svn: 348053
2018-11-30 23:41:18 +00:00
Fangrui Song f5d3335d75 Revert r347417 "Re-Reinstate 347294 with a fix for the failures."
Kept the "indirect_builtin_constant_p" test case in test/SemaCXX/constant-expression-cxx1y.cpp
while we are investigating why the following snippet fails:

  extern char extern_var;
  struct { int a; } a = {__builtin_constant_p(extern_var)};

llvm-svn: 348039
2018-11-30 21:26:09 +00:00
Hans Wennborg 48ee4ad325 Re-commit r347417 "Re-Reinstate 347294 with a fix for the failures."
This was reverted in r347656 due to me thinking it caused a miscompile of
Chromium. Turns out it was the Chromium code that was broken.

llvm-svn: 347756
2018-11-28 14:04:12 +00:00