Commit Graph

387121 Commits

Author SHA1 Message Date
Jay Foad 16d707e656 [AMDGPU] Fix v_swap_b32 formation on physical registers
As explained in the comments, matchSwap matches:

// mov t, x
// mov x, y
// mov y, t

and turns it into:

// mov t, x (t is potentially dead and move eliminated)
// v_swap_b32 x, y

On physical registers we don't have full use-def chains so the check
for T being live-out was not working properly with subregs/superregs.

Differential Revision: https://reviews.llvm.org/D101546
2021-04-29 20:53:40 +01:00
Alexey Bataev 12c51f2358 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 12:48:00 -07:00
Alexey Bataev 6e859f3cd4 Revert "[COST] Improve shuffle kind detection if shuffle mask is provided."
This reverts commit 9239932221 to fix
a compiler crash on mask checks.
2021-04-29 12:40:33 -07:00
Jez Ng 07884152ec [lld-macho] Remove stray file 2021-04-29 15:33:23 -04:00
Sriraman Tallam a64411916c Basic block sections for functions with implicit-section-name attribute
Functions can have section names set via #pragma or section attributes,
basic block sections should be correctly named for such functions.

With #pragma, the expectation is that all functions in that file are placed
in the same section in the final binary. Basic block sections should be
correctly named with the unique flag set so that the final binary has all the
basic blocks of the function in that named section. This patch fixes the bug
by calling getExplictSectionGlobal when implicit-section-name attribute is set
to make sure the function's basic blocks get the correct section name.

Differential Revision: https://reviews.llvm.org/D101311
2021-04-29 12:29:34 -07:00
Jez Ng d9c8ffa958 [lld-macho][nfc] Clean up header.s test
I don't think it's super worthwhile to test the dylib headers outputs of
all the different archs when x86_64 is the only one that has interesting
behavior.

Motivated by my upcoming addition of arm32...
2021-04-29 15:11:23 -04:00
Jez Ng 7e115da5df [lld-macho] Make everything PIE by default
Modern versions of macOS (>= 10.7) and in general all modern Mach-O
target archs want PIEs by default. ld64 defaults to PIE for iOS >= 4.3,
as well as for all versions of watchOS and simulators. Basically all the
platforms LLD is likely to target want PIE. So instead of cluttering LLD's
code with legacy version checks, I think it's simpler to just default to
PIE for everything.

Note that `-no_pie` still works, so users can still opt out of it.

Reviewed By: #lld-macho, thakis, MaskRay

Differential Revision: https://reviews.llvm.org/D101513
2021-04-29 15:11:23 -04:00
Aart Bik a6d92a9711 [mlir][sparse] migrate sparse operations into new sparse tensor dialect
This is the very first step toward removing the glue and clutter from linalg and
replace it with proper sparse tensor types. This revision migrates the LinalgSparseOps
into SparseTensorOps of a sparse tensor dialect. This also provides a new home for
sparse tensor related transformation.

NOTE: the actual replacement with sparse tensor types (and removal of linalg glue/clutter)
will follow but I am trying to keep the amount of changes per revision manageable.

Reviewed By: bixia

Differential Revision: https://reviews.llvm.org/D101488
2021-04-29 12:09:10 -07:00
Sanjay Patel 0f8b6686ac [InstCombine] narrow popcount with zext operand
https://llvm.org/PR50141
2021-04-29 15:07:16 -04:00
Sanjay Patel b142e9d1c5 [InstCombine] add tests for popcount with zext operand; NFC
PR50141
2021-04-29 15:07:16 -04:00
Tim Northover c1b7460b5b Revert "RegAlloc: do not consider liveins to EH-pad successors as liveout."
Some liveins *can* come from this block (e.g. any SSA value except the call),
it's only the ones that produce `landingpad` values that can't and I didn't
think it through properly.
2021-04-29 20:00:07 +01:00
Petar Avramovic c34900e133 AMDGPU/GlobalISel: Fix selection of image intrinsics with unused return
When atomic image intrinsic return value is unused, register class for
destination of a sub-register copy of return value ends up not being set.
This copy then hits 'Register class not set' assert later.
If return value has uses, register class is determined by use instruction.
Fix is to not create sub-register copy when image intrinsic destination has
no uses because it would be deleted by dead-mi-elimination later anyway.

Differential Revision: https://reviews.llvm.org/D101448
2021-04-29 20:56:03 +02:00
Dan Liew 2d42b2ee7b [ASan] Rename `-fsanitize-address-destructor-kind=` to drop the `-kind` suffix.
Renaming the option is based on discussions in https://reviews.llvm.org/D101122.

It is normally not a good idea to rename driver flags but this flag is
new enough and obscure enough that it is very unlikely to have adopters.

While we're here also drop the `<kind>` metavar. It's not necessary and
is actually inconsistent with the documentation in
`clang/docs/ClangCommandLineReference.rst`.

Differential Revision: https://reviews.llvm.org/D101491
2021-04-29 11:55:42 -07:00
Tim Northover 438a63e13b RegAlloc: do not consider liveins to EH-pad successors as liveout.
These registers get defined by the runtime, not the block being allocated, and
treating them as preassigned in RegAllocFast adds extra pressure, sometimes
enough to make the function unallocatable.
2021-04-29 19:34:49 +01:00
Victor Huang ae3377c553 [AIX][TLS] Add ASM portion changes to support TLSGD relocations to XCOFF objects
- Add new variantKinds for the symbol's variable offset and region handle
- Print the proper relocation specifier @gd in the asm streamer when emitting
  the TC Entry for the variable offset for the symbol
- Fix the switch section failure between the TC Entry of variable offset and
  region handle
- Put .__tls_get_addr symbol in the ProgramCodeSects with XTY_ER property

Reviewed by: sfertile

Differential Revision: https://reviews.llvm.org/D100956
2021-04-29 13:18:59 -05:00
Roman Lebedev cc63203908
[SimplifyCFG] Common code sinking: fix application of profitability check
The profitability check is: we don't want to create more than a single PHI
per instruction sunk. We need to create the PHI unless we'll sink
all of it's would-be incoming values.

But there is a caveat there.
This profitability check doesn't converge on the first iteration!
If we first decide that we want to sink 10 instructions,
but then determine that 5'th one is unprofitable to sink,
that may result in us not sinking some instructions that
resulted in determining that some other instruction
we've determined to be profitable to sink becoming unprofitable.

So we need to iterate until we converge, as in determine
that all leftover instructions are profitable to sink.

But, the direct approach of just re-iterating seems dumb,
because in the worst case we'd find that the last instruction
is unprofitable, which would result in revisiting instructions
many many times.

Instead, i think we can get away with just two passes - forward and backward.
However then it isn't obvious what is the most performant way to update
InstructionsToSink.
2021-04-29 21:11:40 +03:00
Sam Clegg a6f406480a [lld][WebAssembly] Add `--export-if-defined`
Unlike the existing `--export` option this will not causes errors
or warnings if the specified symbol is not defined.

See: https://github.com/emscripten-core/emscripten/issues/13736

Differential Revision: https://reviews.llvm.org/D99887
2021-04-29 10:58:45 -07:00
Mark de Wever 9393060f90 [libc++] Fixes std::to_chars for bases != 10.
While working on D70631, Microsoft's unit tests discovered an issue.
Our `std::to_chars` implementation for bases != 10 uses the range
`[first,last)` as temporary buffer. This violates the contract for
to_chars:
[charconv.to.chars]/1 http://eel.is/c++draft/charconv#to.chars-1
`to_chars_result to_chars(char* first, char* last, see below value, int base = 10);`
"If the member ec of the return value is such that the value is equal to
the value of a value-initialized errc, the conversion was successful and
the member ptr is the one-past-the-end pointer of the characters
written."

Our implementation modifies the range `[member ptr, last)`, which causes
Microsoft's test to fail. Their test verifies the buffer
`[member ptr, last)` is unchanged. (The test is only done when the
conversion is successful.)

While looking at the code I noticed the performance for bases != 10 also
is suboptimal. This is tracked in D97705.

This patch fixes the issue and adds a benchmark. This benchmark will be
used as baseline for D97705.

Reviewed By: #libc, Quuxplusone, zoecarver

Differential Revision: https://reviews.llvm.org/D100722
2021-04-29 19:56:28 +02:00
Petr Hosek ba631240ae [CMake] Set correct CXX_FLAGS for relative-vtables variants
We overrite CXX_FLAGS to enable relative vtables, but doing so
overwrites generic Fuchsia CXX_FLAGS leading to a build failure
on Windows.

Differential Revision: https://reviews.llvm.org/D101551
2021-04-29 10:34:37 -07:00
Raphael Isemann a76df78470 [lldb] Make the NSSet formatter faster and less prone to infinite recursion
Right now to get the 'NSSet *` pointer value we first derefence it and then take
the address of the result.

Beside being inefficient this potentially can cause an infinite recursion if the
`pointer` value we get is a pointer of a type that the TypeSystem can't
derefence. If the pointer is for example some form of `void *` that the dynamic
type resolution can't resolve to an actual type, then the `Derefence` call goes
back to asking the formatters how to reference it. If the NSSet formatter then
checks if it's an NSSet variation under the hood then we just end infinitely
often recursion.

In practice this seems to happen with some form of Builtin.RawPointer we get
from a NSDictionary in Swift.

FWIW, no other formatter is doing the same deref->addressOf as here and there
doesn't seem to be any specific reason to do so in the git history (it's just
part of the initial formatter commit)

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D101537
2021-04-29 19:13:43 +02:00
LLVM GN Syncbot 5fbea82692 [gn build] Port df323ba445 2021-04-29 16:59:58 +00:00
Benjamin Kramer df323ba445 Revert "[X86] Support AMX fast register allocation"
This reverts commit 3b8ec86fd5.

Revert "[X86] Refine AMX fast register allocation"

This reverts commit c3f95e9197.

This pass breaks using LLVM in a multi-threaded environment by
introducing global state.
2021-04-29 18:56:33 +02:00
Vitaly Buka ea7618684c Revert "[scudo] Use require_constant_initialization"
This reverts commit 7ad4dee3e7.
2021-04-29 09:55:54 -07:00
Sanjay Patel 1089158c5a [ConstantFolding] propagate poison through vector reduction intrinsics 2021-04-29 12:54:20 -04:00
Martin Storsjö 203096adfc [libcxx] [test] Include more libraries that normally are linked automatically
As the libcxx tests link with -nostdlib, libraries that normally
are added by default by the compiler driver has to be added
manually.

The "oldnames" library is automatically added when driving linking
with clang-cl. When linking with the plain clang driver, as the
libcxx tests do, the clang driver does the same but only since Clang
12.0). But when linking with -nostdlib, like the libcxx tests do,
the driver defaults aren't added at all, and we need to specify the
defaults manually.

This allows removing a TODO from the Windows CI setup; it turns out
that upgrading to Clang 12.0 didn't help here as expected, sorry about
that mixup.

Differential Revision: https://reviews.llvm.org/D101434
2021-04-29 19:54:07 +03:00
Vitaly Buka 7ad4dee3e7 [scudo] Use require_constant_initialization
Attribute guaranties safe static initialization of globals.

Differential Revision: https://reviews.llvm.org/D101514
2021-04-29 09:47:59 -07:00
Craig Topper dcdda2bdf2 [RISCV] Teach DAG combine to fold (and (select_cc lhs, rhs, cc, -1, c), x) -> (select_cc lhs, rhs, cc, x, (and, x, c))
Similar for or/xor with 0 in place of -1.

This is the canonical form produced by InstCombine for something like `c ? x & y : x;` Since we have to use control flow to expand select we'll usually end up with a mv in basic block. By folding this we may be able to pull the and/or/xor into the block instead and avoid a mv instruction.

The code here is based on code from ARM that uses this to create predicated instructions. I'm doing it on SELECT_CC so it happens late, but we could do it on select earlier which is what ARM does. I'm not sure if we lose any combine opportunities if we do it earlier.

I left out add and sub because this can separate sext.w from the add/sub. It also made a conditional i64 addition/subtraction on RV32 worse. I guess both of those would be fixed by doing this earlier on select.

The select-binop-identity.ll test has not been commited yet, but I made the diff show the changes to it.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D101485
2021-04-29 09:43:51 -07:00
Craig Topper 60216adef1 [RISCV] Add test cases for D101485. NFC 2021-04-29 09:43:51 -07:00
Alexey Bataev 9239932221 [COST] Improve shuffle kind detection if shuffle mask is provided.
Added an extra analysis for better choosing of shuffle kind in
getShuffleCost functions for better cost estimation if mask was
provided.

Differential Revision: https://reviews.llvm.org/D100865
2021-04-29 09:42:56 -07:00
Fangrui Song 47a686d5cb [unittest] Fix Frontend/OpenMPIRBuilderTest.cpp -Wsign-compare after D89671 2021-04-29 09:37:58 -07:00
Fangrui Song b7c6697813 [DebugInfo] Add tests that we emit .eh_frame instead of .debug_frame
Add tests which can catch the issue in 0ce723cb22
(If any function needs CFISection::EH, the module should use CFISection::EH).

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D101339
2021-04-29 09:35:48 -07:00
Sanjay Patel 678018138d [ConstProp] add tests for vector reductions of poison; NFC 2021-04-29 12:20:59 -04:00
Sanjay Patel 71597d40e8 [ConstantFolding] refactor helper for vector reductions; NFC
We should handle other cases (undef/poison), so reduce
the duplication of repeated switches.
2021-04-29 12:09:22 -04:00
Sanjay Patel f4b1272d3d [ADT] fix typo in code block comment; NFC 2021-04-29 12:09:22 -04:00
Anirudh Prasad ded0a70aeb [AsmParser][SystemZ][z/OS] Reject "Dot" as current PC on z/OS
- Currently, the "." (Dot) character, when not identifying an Identifier or a Constant, refers to the current PC (Program Counter)
- However, in z/OS, for the HLASM dialect, it strictly accepts only the "*" as the current PC (Support for this will be put up in a follow-up patch)
- The changes in this patch allow individual platforms to choose whether they would like to use the "." (Dot) character as a marker for the current PC or not.
- It is achieved by introducing a new field in MCAsmInfo.h called `DotIsPC` (similar to `DollarIsPC`)

Reviewed By: abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D100975
2021-04-29 11:58:54 -04:00
Fangrui Song c9b1bd1012 [ELF] Support .rela.eh_frame with unordered r_offset values
GNU ld -r can create .rela.eh_frame with unordered r_offset values.
(With LLD, we can craft such a case by reordering sections in .eh_frame.)
This is currently unsupported and will trigger
`assert(pieces[i].inputOff <= off ...` in `OffsetGetter::get`
(the content is corrupted in a -DLLVM_ENABLE_ASSERTIONS=off build).
This patch supports this case.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D101116
2021-04-29 08:51:09 -07:00
Craig Topper 0c330afdfa [RISCV] Enable SPLAT_VECTOR for fixed vXi64 types on RV32.
This replaces D98479.

This allows type legalization to form SPLAT_VECTOR_PARTS so we don't
lose the splattedness when the scalar type is split.

I'm handling SPLAT_VECTOR_PARTS for fixed vectors separately so
we can continue using non-VL nodes for scalable vectors.

I limited to RV32+vXi64 because DAGCombiner::visitBUILD_VECTOR likes
to form SPLAT_VECTOR before seeing if it can replace the BUILD_VECTOR
with other operations. Especially interesting is a splat BUILD_VECTOR of
the extract_vector_elt which can become a splat shuffle, but won't if
we form SPLAT_VECTOR first. We either need to reorder visitBUILD_VECTOR
or add visitSPLAT_VECTOR.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100803
2021-04-29 08:20:09 -07:00
Craig Topper 25391cec3a [RISCV] Teach computeKnownBits that vsetvli returns number less than 2^31.
This seems like a reasonable upper bound on VL. WG discussions for
the V spec would probably allow us to use 2^16 as an upper bound
on VLEN, but this is good enough for now.

This allows us to remove sext and zext if user happens to assign
the size_t result into an int and then uses it as a VL intrinsic
argument which is size_t.

Reviewed By: frasercrmck, rogfer01, arcbbb

Differential Revision: https://reviews.llvm.org/D101472
2021-04-29 08:07:59 -07:00
Sander de Smalen 51d648c119 Revert "[LV] Calculate max feasible scalable VF."
Temporarily reverting this patch due to some unexpected issue found
by one of the PPC buildbots.

This reverts commit 584e9b6e4b.
2021-04-29 16:04:37 +01:00
Jay Foad 1ecddddbec [AMDGPU] Add a v_swap_b32 test case to be fixed 2021-04-29 16:03:15 +01:00
Chirag Khandelwal c204106188 [Clang][OpenMP] Frontend work for sections - D89671
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.

Co-author: @anchu-rajendran

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D91054
2021-04-29 19:52:27 +05:30
David Zarzycki 3eb2be67b9 Unbreak no-asserts testing 2021-04-29 10:01:37 -04:00
Anastasia Stulova 1ed6e87ab0 [OpenCL][Docs] Misc updates to C++ for OpenCL and offline compilation
Differential Revision: https://reviews.llvm.org/D101092
2021-04-29 14:15:07 +01:00
Chirag Khandelwal fbd3548d1c [LLVM][OpenMP] Adding support for OpenMP sections construct in OpenMPIRBuilder
This patch adds section support in the OpenMP IRBuilder module, along with a test for the same.

Reviewed By: fghanim

Differential Revision: https://reviews.llvm.org/D89671
2021-04-29 18:39:49 +05:30
Anastasia Stulova 8fb0d6df11 [OpenCL][Docs] Describe extension for legacy atomics with generic addr space.
This extension is primarily targeting SPIR-V compilations flow
as the IR translation is the same between 1.x and 2.x atomics.

Differential Revision: https://reviews.llvm.org/D101089
2021-04-29 14:02:34 +01:00
Florian Hahn a0e1313c23
[VPlan] Add getVPSingleValue helper.
As suggested in D99294, this adds a getVPSingleValue helper to use for
recipes that are guaranteed to define a single value. This replaces uses
of getVPValue() which used to default to I = 0.
2021-04-29 13:37:38 +01:00
Arnamoy Bhattacharyya 79f7d3b7b1 [flang][OpenMP] Add semantic checks for strict nesting inside `teams` construct. 2021-04-29 08:35:20 -04:00
Alex Zinenko 28ab7ff2d7 [mlir] fix shared-lib build 2021-04-29 13:27:41 +02:00
Bradley Smith 354604a2a7 [AArch64][SVE] Use SIMD variant of INSR when scalar is the result of a vector extract
At the intrinsic layer the sve.insr operation takes a scalar. When this
scalar is an integer we are forcing a data transition between GPRs and
ZPRs that is potentially costly.

Often the integer scalar is the result of a vector extract, when
performing a reduction for example. In such cases we should keep all
data within the ZPRs.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D101169
2021-04-29 12:17:42 +01:00
Bradley Smith 89085bcc86 [AArch64][SVE] Convert svdup(vec, SV_VL1, elm) to insertelement(vec, elm, 0)
By converting the SVE intrinsic to a normal LLVM insertelement we give
the code generator a better chance to remove transitions between GPRs
and VPRs

Co-authored-by: Paul Walker <paul.walker@arm.com>

Depends on D101302

Differential Revision: https://reviews.llvm.org/D101167
2021-04-29 12:17:42 +01:00