6306 Commits

Author SHA1 Message Date
Thomas Lively 3f738d1f5e Reland "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c8385a352 with a typing fix
to an instruction selection pattern.
2020-10-15 19:32:34 +00:00
Thomas Lively 7c8385a352 Revert "[WebAssembly] v128.load{8,16,32,64}_lane instructions"
This reverts commit 7c6bfd90ab.
2020-10-15 15:49:36 +00:00
Thomas Lively 7c6bfd90ab [WebAssembly] v128.load{8,16,32,64}_lane instructions
Prototype the newly proposed load_lane instructions, as specified in
https://github.com/WebAssembly/simd/pull/350. Since these instructions are not
available to origin trial users on Chrome stable, make them opt-in by only
selecting them from intrinsics rather than normal ISel patterns. Since we only
need rough prototypes to measure performance right now, this commit does not
implement all the load and store patterns that would be necessary to make full
use of the offset immediate. However, the full suite of offset tests is included
to make it easy to track improvements in the future.

Since these are the first instructions to have a memarg immediate as well as an
additional immediate, the disassembler needed some additional hacks to be able
to parse them correctly. Making that code more principled is left as future
work.
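
As a rough illustration of what a load_lane instruction does, the sketch below models the 16-bit variant in portable C. The type `v128_model` and the function `load16_lane_model` are hypothetical names for this sketch, not part of LLVM or the SIMD proposal; in the real instruction the offset and the lane index are immediates, while here they are ordinary arguments.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit vector type, used only for this model. */
typedef struct { uint8_t bytes[16]; } v128_model;

/* Model of a 16-bit lane load: read one 16-bit element from mem + offset
   and replace lane `lane` (0..7) of vec; all other lanes are kept.
   The lane index is masked here; the real instruction takes an immediate. */
v128_model load16_lane_model(const uint8_t *mem, uint32_t offset,
                             v128_model vec, unsigned lane) {
    memcpy(&vec.bytes[(lane & 7u) * 2u], mem + offset, 2);
    return vec;
}
```

The two separate inputs, a memory offset and a lane index, correspond to the "memarg immediate as well as an additional immediate" mentioned below.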

Differential Revision: https://reviews.llvm.org/D89366
2020-10-15 15:33:10 +00:00
Simon Pilgrim d7fa9030d4 [CodeGen][X86] Emit fshl/fshr ir intrinsics for shiftleft128/shiftright128 ms intrinsics
Now that funnel shift handling is pretty good, we can use the intrinsics directly and avoid a lot of zext/trunc issues.

https://godbolt.org/z/YqhnnM
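
For reference, the funnel-shift semantics can be sketched in portable C as below. The helper names (`fshl64`, `fshr64`, `shiftleft128_model`, `shiftright128_model`) are made up for this sketch; it assumes the ms intrinsics take (LowPart, HighPart, Shift) and use the shift amount modulo 64.

```c
#include <stdint.h>

/* fshl: concatenate hi:lo, shift left by amt mod 64, return the upper 64 bits. */
uint64_t fshl64(uint64_t hi, uint64_t lo, unsigned amt) {
    amt &= 63u;
    if (amt == 0) return hi;            /* avoid the undefined shift by 64 */
    return (hi << amt) | (lo >> (64u - amt));
}

/* fshr: concatenate hi:lo, shift right by amt mod 64, return the lower 64 bits. */
uint64_t fshr64(uint64_t hi, uint64_t lo, unsigned amt) {
    amt &= 63u;
    if (amt == 0) return lo;
    return (hi << (64u - amt)) | (lo >> amt);
}

/* Models of the ms intrinsics expressed through the funnel shifts. */
uint64_t shiftleft128_model(uint64_t low, uint64_t high, unsigned char shift) {
    return fshl64(high, low, shift);
}

uint64_t shiftright128_model(uint64_t low, uint64_t high, unsigned char shift) {
    return fshr64(high, low, shift);
}
```

Working on the two 64-bit halves directly is what avoids widening to a 128-bit integer and truncating the result again.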

Differential Revision: https://reviews.llvm.org/D89405
2020-10-15 10:22:41 +01:00
Simon Pilgrim b967b9a711 [CodeGen] Move x86 specific ms intrinsic tests into x86 target subfolder. NFCI. 2020-10-14 17:37:26 +01:00
Jonas Paulsson 625fa47617 Revert "[clang] Improve handling of physical registers in inline assembly operands."
This reverts commit c78da03778.

Temporarily reverted due to https://bugs.llvm.org/show_bug.cgi?id=47837.
2020-10-14 08:42:51 +02:00
Liu, Chen3 bd05afcb3f [X86][NFC] Fix RUN line bug in the testcase
The testcase added in D78699 doesn't work because of the wrong RUN line in the
testcase.

Differential Revision: https://reviews.llvm.org/D89361
2020-10-14 12:40:34 +08:00
Jonas Paulsson c78da03778 [clang] Improve handling of physical registers in inline assembly operands.
Change EmitAsmStmt() to

- Not tie physregs with the "+r" constraint, but instead add the hard
  register as an input constraint. This makes "+r" and "=r":"r" look the same
  in the output.

  Background: Macro intensive user code may contain inline assembly
  statements with multiple operands constrained to the same physreg. Such a
  case (with the operand constraints "+r" : "r") currently triggers the
  TwoAddressInstructionPass assertion against any extra use of a tied
  register. Furthermore, TwoAddress will insert a COPY to that physreg even
  though isel has already done so (for the non-tied use), which may lead to a
  second redundant instruction currently. A simple fix for this is to not
  emit tied physreg uses in the first place for the "+r" constraint, which is
  what this patch does.

- Give an error on multiple outputs to the same physical register.

  This should be reported and this is also what GCC does.

Review: Ulrich Weigand, Aaron Ballman, Jennifer Yu, Craig Topper

Differential Revision: https://reviews.llvm.org/D87279
2020-10-13 15:09:52 +02:00
Ties Stuij 208987844f [ARM] Follow AAPCS standard for volatile bit-fields access width
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declarative type. In such case:
```
struct S1 {
  short a : 1;
};
```
should be accessed using load and stores of the width
(sizeof(short)), where now the compiler does only load
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with C and C++ object models by creating
data race conditions that are not part of the bit-field,
e.g.
```
struct S2 {
  short a;
  int  b : 16;
};
```
Accessing `S2.b` would also access `S2.a`.

The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```

I've updated the patch D16586 to follow such behavior by verifying that we
only change volatile bit-field access when:
 - it won't overlap with any other non-bit-field member
 - we only access memory inside the bounds of the record
 - avoid overlapping zero-length bit-fields.
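
The first two conditions can be sketched as a small check over byte ranges. This is only an illustration under simplifying assumptions, not the code from the patch: `byte_range` and `can_use_container_access` are hypothetical names, and the zero-length bit-field condition is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Half-open byte range [begin, end) inside a record. */
typedef struct { size_t begin, end; } byte_range;

static bool ranges_overlap(byte_range a, byte_range b) {
    return a.begin < b.end && b.begin < a.end;
}

/* Hypothetical check: may a volatile bit-field be accessed through the full
   container of its declared type? The container must stay inside the record
   and must not overlap any non-bit-field member. */
bool can_use_container_access(byte_range container, size_t record_size,
                              const byte_range *non_bitfield_members,
                              size_t member_count) {
    if (container.end > record_size)
        return false;
    for (size_t i = 0; i < member_count; ++i) {
        if (ranges_overlap(container, non_bitfield_members[i]))
            return false;
    }
    return true;
}
```

For `S2` above, assuming `a` occupies bytes 0 to 2 and the `int` container of `b` spans bytes 0 to 4, the check fails, so the narrower access is kept.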

Regarding the number of memory accesses, that should be preserved, that will
be implemented by D67399.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D72932
2020-10-13 10:31:48 +01:00
Simon Pilgrim 6c23cbc560 [X86] Convert integer _mm_reduce_* intrinsics to emit llvm.reduction intrinsics (PR47506)
Emit the equivalent integer reduction intrinsics in IR instead of expanding to shuffle+arithmetic sequences.

The fadd/fmul reductions might be trickier as they assume a similar bisection reduction while the generic intrinsics assume a sequential reduction (intel docs are ambiguous on the correct approach) - I'm not sure if we want to always tag them with reassoc? Anyway, that issue can wait until a separate fp patch along with the fmin/fmax reductions.
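
To see why the order matters for floating point, here is a small sketch comparing a sequential sum with a pairwise (bisection-style) sum. The function names are made up for this example, and the exact pairing used by any particular intrinsic is not claimed here; the point is only that the two orders can round differently.

```c
#include <stddef.h>

/* Sequential reduction: ((v0 + v1) + v2) + ... */
float reduce_add_sequential(const float *v, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += v[i];
    return acc;
}

/* Pairwise reduction: split in halves, reduce each half, add the results. */
float reduce_add_pairwise(const float *v, size_t n) {
    if (n == 0) return 0.0f;
    if (n == 1) return v[0];
    size_t half = n / 2;
    float lo = reduce_add_pairwise(v, half);
    float hi = reduce_add_pairwise(v + half, n - half);
    return lo + hi;
}
```

With the input `{1e8f, 1.0f, -1e8f, 1.0f}` under IEEE single-precision arithmetic (no fast-math), the sequential order yields 1.0f while the pairwise order yields 0.0f, because 1.0f is absorbed when added to a value of magnitude 1e8. Integer addition has no such issue, which is why the integer reductions are the easy case.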

Differential Revision: https://reviews.llvm.org/D87604
2020-10-13 09:28:39 +01:00
Wang, Pengfei 412cdcf2ed [X86] Add HRESET instruction.
For more details about these instructions, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89102
2020-10-13 08:47:26 +08:00
Fangrui Song 012dd42e02 [X86] Support -march=x86-64-v[234]
PR47686. These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

GCC 11 will support these levels.

Note, -mtune=x86-64-v[234] are invalid and __builtin_cpu_is cannot be
used on them.

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D89197
2020-10-12 10:29:46 -07:00
Roman Lebedev 544a6aa267 [InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and the main source of such new casts being introduced,
as far as i can tell, is this transform, which, ironically,
tries to reduce the count of casts.

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% less `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However, just on RawSpeed, where i know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% less `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of +14.34% in total.

To me this does seem like the step in the right direction,
we end up with strictly less `IntToPtr`, but strictly more `PtrToInt`,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
Thomas Lively d8f58bf53a [WebAssembly] Prototype i16x8.q15mulr_sat_s
This saturating, rounding, Q-format multiplication instruction is proposed in
https://github.com/WebAssembly/simd/pull/365.
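
A scalar model of one lane may help when reading the tests. This is a sketch assuming the per-lane definition is `sat_s16((a * b + 0x4000) >> 15)`; `q15mulr_sat_s_lane` is a hypothetical helper name, and the code relies on `>>` being an arithmetic shift for negative values, which holds on mainstream compilers.

```c
#include <stdint.h>

/* One lane of a saturating, rounding Q15 multiply:
   multiply in 32 bits, add half an LSB, shift back to Q15, saturate. */
int16_t q15mulr_sat_s_lane(int16_t a, int16_t b) {
    int32_t rounded = ((int32_t)a * (int32_t)b + 0x4000) >> 15;
    if (rounded > INT16_MAX) return INT16_MAX;
    if (rounded < INT16_MIN) return INT16_MIN;
    return (int16_t)rounded;
}
```

The only input pair that actually saturates is `-32768 * -32768`, which would otherwise produce 32768 (that is, +1.0 in Q15, which is not representable).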

Differential Revision: https://reviews.llvm.org/D88968
2020-10-09 21:17:53 +00:00
Scott Linder 40cef5a00e [clang] Add a test for CGDebugInfo treatment of blocks
There doesn't seem to be a direct test of this, and I'm planning to make
future changes which will affect it.

I'm not particularly familiar with the blocks extension, so suggestions
for better tests are welcome.

Differential Revision: https://reviews.llvm.org/D88754
2020-10-09 19:03:21 +00:00
Liu, Chen3 26cfb6e562 [X86] Passing union type through register
For example:

  union M256 {
    double d;
    __m256 m;
  };
  extern void foo1(union M256 A);
  union M256 m1;
  void test() {
    foo1(m1);
  }

clang will pass m1 through stack which does not follow the ABI.

Differential Revision: https://reviews.llvm.org/D78699
2020-10-09 11:24:29 +08:00
Arthur Eubanks afff74e5c2 [HWAsan][NewPM] Handle hwasan like other sanitizers
Move it as an EP callback (-O[123]) or in addSanitizersAtO0.

This makes it not run in ThinLTO pre-link (like the other sanitizers),
so don't check LTO runs in hwasan-new-pm.c. Changing its position also
seems to change the generated IR. I think we just need to make sure the
pass runs.

Reviewed By: leonardchan

Differential Revision: https://reviews.llvm.org/D88936
2020-10-08 14:43:21 -07:00
David Green a15bd0bfc2 [AIX] Add REQUIRES for powerpc test. NFC 2020-10-08 18:40:09 +01:00
diggerlin 92bca12843 [AIX] add new option -mignore-xcoff-visibility
SUMMARY:

The IBM compiler xlclang has an option -fnovisibility which suppresses visibility. For more details see: https://www.ibm.com/support/knowledgecenter/SSGH3R_16.1.0/com.ibm.xlcpp161.aix.doc/compiler_ref/opt_visibility.html.

We need to add the option -mignore-xcoff-visibility for compatibility with the IBM AIX OS (as the option is enabled by default in AIX). With this option llvm does not emit any visibility attribute to the ASM or XCOFF object file.

The option only works on the AIX OS; on other OSes, using the option reports an unsupported option error.

In AIX OS:

1.1 The option -mignore-xcoff-visibility is enabled by default if neither -fvisibility=* nor -mignore-xcoff-visibility is given explicitly in the clang command.

1.2 If -fvisibility=* is given explicitly but -mignore-xcoff-visibility is not, clang generates visibility attributes.

1.3 If both -fvisibility=* and -mignore-xcoff-visibility are given explicitly, the option "-mignore-xcoff-visibility" wins and the visibility attribute is not emitted.

The option -mignore-xcoff-visibility has no effect on the visibility attribute when compiling with the -emit-llvm option to generate LLVM IR.
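
The three AIX cases reduce to a small decision function. The sketch below is illustrative only; `aix_emits_visibility` is a hypothetical name and this is not the driver code from the patch. It describes ASM/XCOFF output on AIX, not -emit-llvm output.

```c
#include <stdbool.h>

/* Returns true if visibility attributes are emitted on AIX, given which
   options appear explicitly on the clang command line. */
bool aix_emits_visibility(bool has_fvisibility,
                          bool has_mignore_xcoff_visibility) {
    if (has_mignore_xcoff_visibility)
        return false;           /* case 1.3: the explicit ignore option wins */
    return has_fvisibility;     /* case 1.2 emits; case 1.1 ignores by default */
}
```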

Reviewer: daltenty,Jason Liu

Differential Revision: https://reviews.llvm.org/D87451
2020-10-08 09:34:58 -04:00
Simon Pilgrim 42d91438ad [CodeGen][X86] Cleanup labels on some sse/avx intrinsics tests. NFCI.
Add some missing CHECK-LABEL lines.

Remove leading '@' so it'll be possible to match against c and c++ builds in a future patch.
2020-10-07 19:33:14 +01:00
Fanbo Meng 9908ee5670 [SystemZ][z/OS] Add test of zero length bitfield type size larger than target zero length bitfield boundary
Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88963
2020-10-07 11:34:13 -04:00
Fanbo Meng 43cd0a98d1 [SystemZ][z/OS] Set default alignment rules for z/OS target
Update RUN line to fix lit failure

Differential Revision: https://reviews.llvm.org/D88845
2020-10-06 14:21:21 -04:00
Fanbo Meng c781dc74a8 [SystemZ][z/OS] Set default alignment rules for z/OS target
Set the default alignment control variables for z/OS target and add test case for alignment rules on z/OS.

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D88845
2020-10-06 13:16:15 -04:00
David Spickett f0a78bdfdc [AArch64] Correct parameter type for unsigned Neon scalar shift intrinsics
In the following intrinsics the shift amount
(parameter 2) should be signed.

vqshlb_u8 vqshlh_u16  vqshls_u32  vqshld_u64
vqrshlb_u8 vqrshlh_u16 vqrshls_u32 vqrshld_u64
vshld_u64
vrshld_u64

See https://developer.arm.com/documentation/ihi0073/latest
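
The reason the shift amount has to be signed is that a negative amount shifts right. A simplified scalar model of the non-saturating, non-rounding `vshld_u64` case is sketched below; `vshld_u64_model` is a hypothetical name, and the model assumes the amount is taken from the signed low byte of the second operand.

```c
#include <stdint.h>

/* Simplified model: positive amounts shift left, negative amounts shift
   right, and shifting by 64 or more in either direction gives 0. */
uint64_t vshld_u64_model(uint64_t value, int64_t shift) {
    int amount = (int)(shift & 0xFF);   /* low byte of the shift operand */
    if (amount >= 128)
        amount -= 256;                  /* reinterpret the byte as signed */
    if (amount >= 0)
        return amount >= 64 ? 0 : value << amount;
    return -amount >= 64 ? 0 : value >> -amount;
}
```

With an unsigned parameter type, a call such as `vshld_u64_model(8, -1)` would not express the intended right shift in the source.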

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D88013
2020-10-06 11:34:58 +01:00
Roman Lebedev e00f189d39 [InstCombine] Revert rL226781 "Teach InstCombine to canonicalize loads which are only ever stored to always use a legal integer type if one is available." (PR47592)
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)

This canonicalization seems dubious.

Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.

I think it's pretty obvious that it is an undesirable outcome,
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-op, and are no longer eager to look past them.
Which e.g. means that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.

As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without the https://bugs.llvm.org/show_bug.cgi?id=47592 at least)
And we can't recover the situation post-inlining in instcombine.

So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means, this fold isn't helpful in exposing the passes
that are otherwise unaware of these patterns it produces.

Thusly, i propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
that canonicalization was trying to solve, so i'm not sure
how this plays out in the larger picture.

On vanilla llvm test-suite + RawSpeed, this results in
increase of asm instructions and final object size by ~+0.05%
decreases final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and of inttoptr casts by -25.59% (-6919, *sic*).
Overall, there's -0.04% less IR blocks, -0.39% instructions.

See https://bugs.llvm.org/show_bug.cgi?id=47592

Differential Revision: https://reviews.llvm.org/D88789
2020-10-06 00:00:30 +03:00
Yuanfang Chen 2c94d88e07 [NewPM] collapsing nested pass managers of the same type
This is one of the reasons for extra invalidations in D84959. In
practice, I don't think we have use cases needing this. This simplifies
the pipeline a bit and prunes corner cases when considering
invalidations.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D85676
2020-10-04 15:57:13 -07:00
Craig Topper a02b449bb1 [X86] Sync AESENC/DEC Key Locker builtins with gcc.
For the wide builtins, pass a single input and output pointer to
the builtins. Emit the GEPs and input loads from CGBuiltin.
2020-10-04 12:09:41 -07:00
Craig Topper 230c57b0bd [X86] Synchronize the encodekey builtins with gcc. Don't assume void* is 16 byte aligned.
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.

The cast from void* to __m128i* caused the IR generation to assume
the pointer was aligned.

Instead make the builtin take a single void*, emit i8* GEPs to
adjust then cast to <2 x i64>* and perform a store with align of 1.
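
The same idea at the C level: when the caller only gives an untyped pointer, write through byte offsets without assuming alignment. The sketch below is a portable analogy, not the builtin's implementation; `store_block16` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Write the index-th 16-byte block into an untyped output buffer.
   memcpy makes no alignment assumption about `out`, unlike a store
   through a pointer cast to a 16-byte-aligned vector type. */
void store_block16(void *out, size_t index, const uint8_t block[16]) {
    memcpy((unsigned char *)out + 16 * index, block, 16);
}
```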
2020-10-04 12:09:35 -07:00
Roman Lebedev aaae13d0c2 [NFC][clang][codegen] Autogenerate a few ARM SVE tests that are being affected by an upcoming patch 2020-10-04 19:54:09 +03:00
Esme-Yi e3475f5b91 [PowerPC] Add builtins for xvtdiv(dp|sp) and xvtsqrt(dp|sp).
Summary: This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
int vec_test_swdiv(vector double v1, vector double v2);
int vec_test_swdivs(vector float v1, vector float v2);
int vec_test_swsqrt(vector double v1);
int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.

Reviewed By: steven.zhang, amyk

Differential Revision: https://reviews.llvm.org/D88278
2020-10-04 16:24:20 +00:00
Arthur Eubanks eb55735073 Reland [AlwaysInliner] Update BFI when inlining
Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88324
2020-10-02 10:46:57 -07:00
Sanjay Patel 149f5b573c [APFloat] convert SNaN to QNaN in convert() and raise Invalid signal
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.
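
The failure mode can be shown on raw bit patterns. The sketch below is not APFloat code; the helper names are hypothetical, the input is assumed to already be a binary64 NaN, and the usual (non-legacy-MIPS) encoding is assumed, where the top fraction bit is the quiet bit.

```c
#include <stdint.h>

#define F32_EXP_MASK  0x7F800000u  /* all-ones exponent of binary32 */
#define F32_QUIET_BIT 0x00400000u  /* top fraction bit of binary32 */

/* Naive narrowing of a binary64 NaN: keep the sign, force the all-ones
   exponent, and keep only the top 23 of the 52 fraction bits. */
uint32_t narrow_nan_naive(uint64_t bits64) {
    uint32_t sign = (uint32_t)(bits64 >> 63) << 31;
    uint32_t frac = (uint32_t)((bits64 >> 29) & 0x007FFFFFu);
    return sign | F32_EXP_MASK | frac;
}

/* Narrowing that always sets the quiet bit, so the result stays a NaN. */
uint32_t narrow_nan_quieting(uint64_t bits64) {
    return narrow_nan_naive(bits64) | F32_QUIET_BIT;
}
```

For the signaling NaN `0x7FF0000000000001`, whose payload lives only in the low fraction bits, the naive version returns `0x7F800000`, which is the encoding of +Infinity, while the quieting version returns `0x7FC00000`, a quiet NaN.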

See D88664 for a clang fix needed to allow this change.

Differential Revision: https://reviews.llvm.org/D88238
2020-10-01 14:37:38 -04:00
Sanjay Patel 81921ebc43 [CodeGen] improve coverage for float (32-bit) type of NAN; NFC
Goes with D88238
2020-09-30 15:10:25 -04:00
Sanjay Patel 187686bea3 [CodeGen] add test for NAN creation; NFC
This goes with the APFloat change proposed in
D88238.
This is copied from the MIPS-specific test in
builtin-nan-legacy.c to verify that the normal
behavior is correct on other targets without the
complication of an inverted quiet bit.
2020-09-30 13:22:12 -04:00
Xiangling Liao 3a7487f903 [FE] Use preferred alignment instead of ABI alignment for complete object when applicable
On some targets, preferred alignment is larger than ABI alignment in some cases. For example,
on AIX we have special power alignment rules which would cause that. Previously, to support
those cases, we added a “PreferredAlignment” field in the `RecordLayout` to store the AIX
special alignment values in “PreferredAlignment” as the community suggested.

However, that patch alone is not enough. There are places in Clang where `PreferredAlignment`
should have been used instead of ABI-specified alignment. This patch is aimed at fixing those
spots.

Differential Revision: https://reviews.llvm.org/D86790
2020-09-30 10:48:28 -04:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Richard Smith 1c604a9f5f Recognize setjmp and friends as builtins even if jmp_buf is not declared yet.
This happens in glibc's headers. It's important that we recognize these
functions so that we can mark them as returns_twice.
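
A minimal illustration of why `setjmp` is special: one call site returns twice. The function name `setjmp_return_count` is made up for this sketch.

```c
#include <setjmp.h>

/* Counts how many times control comes back from the single setjmp call. */
int setjmp_return_count(void) {
    jmp_buf env;
    volatile int returns = 0;   /* volatile: modified between setjmp and longjmp */

    if (setjmp(env) == 0) {
        returns++;              /* first return, directly from setjmp */
        longjmp(env, 1);        /* jumps back; setjmp now returns 1 */
    }
    returns++;                  /* reached only after the second return */
    return returns;
}
```

The function returns 2. A compiler that does not know the callee can return twice might keep `returns` in a register across the call and produce a different result, which is what the returns_twice marking guards against.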

Differential Revision: https://reviews.llvm.org/D88518
2020-09-29 15:53:17 -07:00
Fangrui Song 3681be876f Add -fprofile-update={atomic,prefer-atomic,single}
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic is for
best efforts (some targets do not support atomics)) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at driver level.
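
At the source level, the difference between the two update modes resembles the difference between the two increments below. This is an analogy using C11 atomics with made-up helper names, not the instrumentation code that the compiler emits.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t atomic_counter;  /* updated with an atomic add */
static uint64_t plain_counter;           /* updated with load, add, store */

/* Safe when several threads hit the same counter concurrently. */
void bump_atomic(void) {
    atomic_fetch_add_explicit(&atomic_counter, 1, memory_order_relaxed);
}

/* Cheaper, but concurrent updates can be lost. */
void bump_plain(void) {
    plain_counter++;
}

uint64_t read_atomic(void) {
    return atomic_load_explicit(&atomic_counter, memory_order_relaxed);
}

uint64_t read_plain(void) {
    return plain_counter;
}
```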

GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but it has performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307). So we don't follow suit.

Differential Revision: https://reviews.llvm.org/D87737
2020-09-29 10:43:23 -07:00
Tres Popp eb9f7c28e5 Revert "OpaquePtr: Add type to sret attribute"
This reverts commit 55c4ff91bd.

Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
2020-09-29 10:31:04 +02:00
Yonghong Song 54d9f743c8 BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, the compiler may transform the following code
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(buf, buf_size, p2);
  }
if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This will remove the CSE possibility
described above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence, code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.

This patch introduces a passthrough builtin
to prevent such optimization. Inline assembly appears to have more
impact on optimization, e.g., inlining. Using passthrough has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
  - report fatal error if any reloc global in PHI nodes
  - remove all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid builtin->reloc_global transformation so the test is still
    able to check correctness for clang generated IR.
  - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
    before "llc" command since "opt" is needed to call newly-placed
    builtin->reloc_global transformation. Add target triple in the IR
    file since "opt" requires it.
  - Since target triple is added in IR file, if a test may produce
    different results for different endianness, two tests will be
    created, one for bpfeb and another for bpfel, e.g., some tests
    for relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to
    old codes. This is because for the structure in the test,
    new code returns struct layout alignment 4 while old code
    is 8. Align 8 is more precise and permits double load. With align 4,
    the new mechanism uses 4-byte load, so generating different
    relocations.
  - test intrinsic-transforms.ll is removed. This is used to test
    cse on intrinsics so we do not lose metadata. Now metadata is attached
    to global and not instruction, it won't get lost with cse.

Differential Revision: https://reviews.llvm.org/D87153
2020-09-28 16:56:22 -07:00
Craig Topper 288c5776c9 [X86] Use inlineasm flag output for the _bittest* intrinsics.
Instead of explicitly emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.

If the flag can't be used directly, the backend will emit a setcc.

Differential Revision: https://reviews.llvm.org/D87888
2020-09-28 13:33:22 -07:00
Baptiste Saleil 0156914275 [PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.

It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.

This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulators in their unprimed form and
allows the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.

Differential Revision: https://reviews.llvm.org/D84968
2020-09-28 14:39:37 -05:00
Michael Liao 5dbf80cad9 [clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for OpenCL only.
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option
  and `correctly-rounded-divide-sqrt-fp-math` should be added for OpenCL
  at most.

Differential revision: https://reviews.llvm.org/D88303
2020-09-28 11:40:32 -04:00
Florian Hahn 915310bf14 Revert "[DSE] Switch to MemorySSA-backed DSE by default."
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.

This patch temporarily switches back to the legacy DSE
implementation, while we investigate.

This reverts commit 9d172c8e9c.
2020-09-26 18:35:27 +01:00
Matt Arsenault 55c4ff91bd OpaquePtr: Add type to sret attribute
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the test IR tests to include the type before this can
be mandatory.
2020-09-25 14:07:30 -04:00
Chris Bowler f330d9f163 [PPC] [AIX] Implement calling convention IR for C99 complex types on AIX
Add AIX calling convention logic to Clang for C99 complex types on AIX

Differential Revision: https://reviews.llvm.org/D88130
2020-09-25 07:43:31 -04:00
Momchil Velikov a88c722e68 [AArch64] PAC/BTI code generation for LLVM generated functions
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions, generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.

This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:

* "sign-return-address", with non-zero value means generate code to
  sign return addresses (PAC-RET), zero value means disable PAC-RET.

* "sign-return-address-all", with non-zero value means enable PAC-RET
  for all functions, zero value means enable PAC-RET only for
  functions, which spill LR.

* "sign-return-address-with-bkey", with non-zero value means use B-key
  for signing, zero value means use A-key.

This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value 0) in order to be able to check for conflicts when combining
module attributes during LTO.

Module-level attributes are overridden by function level attributes.
All the decision making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo, there shouldn't be
any places left, other than AArch64FunctionInfo, which directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will-be handled by a separate patch.
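
How the three module-level values combine for a function without its own attributes can be sketched as follows. The types and the function `pac_for_function` are hypothetical and only mirror the description above; they are not the AArch64FunctionInfo code, and function-level overrides are left out.

```c
#include <stdbool.h>

/* Values of the three module-level attributes described above. */
typedef struct {
    int sign_return_address;
    int sign_return_address_all;
    int sign_return_address_with_bkey;
} pac_module_flags;

typedef enum { PAC_NONE, PAC_A_KEY, PAC_B_KEY } pac_decision;

/* Decide PAC-RET for a function that carries no function-level attributes. */
pac_decision pac_for_function(pac_module_flags m, bool spills_lr) {
    if (m.sign_return_address == 0)
        return PAC_NONE;                    /* PAC-RET disabled */
    if (m.sign_return_address_all == 0 && !spills_lr)
        return PAC_NONE;                    /* only functions that spill LR */
    return m.sign_return_address_with_bkey != 0 ? PAC_B_KEY : PAC_A_KEY;
}
```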

Differential Revision: https://reviews.llvm.org/D85649
2020-09-25 11:47:14 +01:00
Chris Bowler 64b8a633a8 [NFC] [PPC] Add PowerPC expected IR tests for C99 complex
Adding this test so that I can extend it in a follow on patch with
expected IR for AIX when I implement complex handling in
AIXABIInfo.

Reviewed By: daltenty, ZarkoCA

Differential Revision: https://reviews.llvm.org/D88105
2020-09-24 23:28:40 -04:00
Ian Levesque 6f7fbdd285 [xray] Function coverage groups
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.
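
One way to picture the grouping is a stable hash of the function name taken modulo the number of groups. The sketch below assumes that scheme purely for illustration; the hash (FNV-1a) and the helper names are stand-ins chosen for this example, not necessarily what the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a, used here only as a stand-in for a stable name hash. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Map a function name to one of group_count logical groups. */
uint32_t xray_group_of(const char *name, uint32_t group_count) {
    assert(group_count > 0);
    return fnv1a(name) % group_count;
}

/* Instrument the function only if it falls into the selected group. */
bool xray_should_instrument(const char *name, uint32_t group_count,
                            uint32_t selected_group) {
    return xray_group_of(name, group_count) == selected_group;
}
```

Because each name maps to exactly one group, cycling the selected group from 0 to N-1 across builds or deployments eventually covers every function.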

Differential Revision: https://reviews.llvm.org/D87953
2020-09-24 22:09:53 -04:00
Amy Kwan 6b136b19cb [Power10] Implement custom codegen for the vec_replace_elt and vec_replace_unaligned builtins.
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.

These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, converting the float to an integer would be done via
fptoui in the IR. This is incorrect as fptoui truncates the value and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
bitcast instead as bitcasts do not truncate.
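
The difference between a value conversion and a bitcast is easy to see in plain C. The helper names below are made up for this sketch, and it assumes `float` is a 32-bit IEEE-754 type.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Bitcast: reinterpret the 32 bits of the float unchanged. */
uint32_t float_to_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Value conversion (what fptoui does): the fractional part is discarded. */
uint32_t float_to_uint_value(float value) {
    return (uint32_t)value;
}
```

For 1.5f the bitcast gives `0x3FC00000`, while the value conversion gives 1, so only the bitcast preserves the element being inserted into the vector.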

Differential Revision: https://reviews.llvm.org/D83500
2020-09-23 22:55:25 -05:00