Commit Graph

382143 Commits

Author SHA1 Message Date
Roman Lebedev b46c085d2b
[NFCI] SCEVExpander: emit intrinsics for integral {u,s}{min,max} SCEV expressions
These intrinsics, not the icmp+select, are the canonical form nowadays,
so we might as well directly emit them.

This should not cause any regressions, but if it does,
then they would need to be fixed regardless.

Note that this doesn't deal with `SCEVExpander::isHighCostExpansion()`,
but that is a pessimization, not a correctness issue.

Additionally, the non-intrinsic form has issues with undef,
see https://reviews.llvm.org/D88287#2587863
2021-03-06 21:52:46 +03:00
Sean Fertile f0904a6208 [PowerPC][AIX] Handle variadic vector call operands.
This patch adds support for passing vector call operands to variadic
functions. Fixed arguments shadow GPRs and stack space even
when they are passed in vector registers, while arguments passed through
the ellipsis are passed in properly aligned GPRs if available and on the
stack once all GPR argument registers are consumed.

Differential Revision: https://reviews.llvm.org/D97956
2021-03-06 13:49:55 -05:00
Ta-Wei Tu 8a003861a3 [NPM] Add -enable-loopinterchange option to NPM
We have the `enable-loopinterchange` option in the legacy pass manager but not in NPM.
Add `LoopInterchange` pass to the optimization pipeline (at the same position as before)
when `enable-loopinterchange` is turned on.

Reviewed By: aeubanks, fhahn

Differential Revision: https://reviews.llvm.org/D98116
2021-03-07 02:39:28 +08:00
Elia Geretto b46c89892f [XRay][compiler-rt][x86_64] Fix CFI directives in assembly trampolines
This patch modifies the x86_64 XRay trampolines to fix the CFI information
generated by the assembler. One of the main issues in correcting the CFI
directives is the `ALIGNED_CALL_RAX` macro, which makes the CFA dependent on
the alignment of the stack. However, this macro is not really necessary because
some additional assumptions can be made on the alignment of the stack when the
trampolines are called. The code has been written as if the stack is guaranteed
to be 8-byte aligned; however, it is instead guaranteed to be misaligned by 8
bytes with respect to a 16-byte alignment. For this reason, always moving the
stack pointer by 8 bytes is sufficient to restore the appropriate alignment.

Trampolines that are called from within a function as a result of the builtins
`__xray_typedevent` and `__xray_customevent` are necessarily called with the
stack properly aligned so, in this case too, `ALIGNED_CALL_RAX` can be
eliminated.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49060

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D96785
2021-03-06 10:38:27 -08:00
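
The alignment argument in the XRay trampoline commit above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the usual x86-64 System V convention that the stack pointer is 16-byte aligned immediately before a `call` and that the `call` pushes an 8-byte return address. The helper names are hypothetical and are not part of compiler-rt.

```python
RETURN_ADDRESS_SIZE = 8   # bytes pushed by `call` on x86-64 (assumption stated above)
STACK_ALIGNMENT = 16      # assumed alignment of rsp just before the `call`


def rsp_at_trampoline_entry(rsp_before_call: int) -> int:
    """Stack pointer seen by the callee right after `call` (the stack grows down)."""
    return rsp_before_call - RETURN_ADDRESS_SIZE


def misalignment(rsp: int) -> int:
    """How far rsp sits above the nearest lower 16-byte boundary."""
    return rsp % STACK_ALIGNMENT


def realign_by_fixed_adjustment(rsp: int) -> int:
    """Move the stack pointer down by a constant 8 bytes, with no runtime check."""
    return rsp - 8


if __name__ == "__main__":
    entry = rsp_at_trampoline_entry(0x7FFF_FFFF_E000)
    print(misalignment(entry), misalignment(realign_by_fixed_adjustment(entry)))
```

Under these assumptions the entry stack pointer is always misaligned by exactly 8 bytes, so a constant 8-byte adjustment suffices and no alignment-dependent macro is needed.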
Fangrui Song ca747e48af [sanitizer] Restrict clock_gettime workaround to glibc
The hackery is due to glibc clock_gettime crashing from preinit_array (D40679).
32-bit musl architectures do not define `__NR_clock_gettime` so the code causes a compile error.

Tested on Alpine Linux x86-64 (musl) and FreeBSD x86-64.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D96925
2021-03-06 10:32:27 -08:00
William S. Moses d163e75c81 [Attributor] Enable heap-to-stack of any size
Enable Attributor's heap-to-stack to lower unbounded allocations given a max size of -1

Differential Revision: https://reviews.llvm.org/D97873
2021-03-06 12:57:32 -05:00
Philip Reames 9c139c50c9 [tests] Update an autogen test for format change 2021-03-06 09:49:27 -08:00
Philip Reames 5db2735af9 [gvn] Handle simple phi equivalence cases
GVN basically doesn't handle phi nodes at all. This is for a reason - we can't value number their inputs since the predecessor blocks have probably not been visited yet.

However, it also creates a significant pass ordering problem. As it stands, instcombine and simplifycfg end up implementing CSE of phi nodes. This means that for any series of CSE opportunities intermixed with phi nodes, we end up having to alternate instcombine/simplifycfg and gvn to make progress.

This patch handles the simplest case by simply preprocessing the phi instructions in a block, and CSEing them if they are syntactically identical. This turns out to be powerful enough to handle many cases in a single invocation of GVN since blocks which use the cse'd phi results are visited after the block containing the phi. If there's a CSE opportunity in one of the phi predecessors required to recognize the phi CSE opportunity, that will require a second iteration on the function. (Still within a single run of gvn though.)

Compile time wise, this could go either way. On one hand, we're potentially causing GVN to iterate over the function more. On the other, we're cutting down on iterations between two passes and potentially shrinking the IR aggressively. So, a bit unclear what to expect.

Note that this does still rely on instcombine to canonicalize block order of the phis, but that's a one time transformation independent of the values incoming to the phi.

Differential Revision: https://reviews.llvm.org/D98080
2021-03-06 09:31:12 -08:00
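
As a rough illustration of the phi CSE idea in the GVN commit above, here is a minimal sketch over a toy representation. It is not LLVM's implementation: the tuple encoding of a phi and the function names are assumptions made for this example. Two phis are merged only when their incoming (value, block) lists match in the same order.

```python
from typing import Dict, List, Tuple

# Toy model: a phi is (result_name, ((incoming_value, predecessor_block), ...)).
Incoming = Tuple[Tuple[str, str], ...]
Phi = Tuple[str, Incoming]


def cse_identical_phis(phis: List[Phi]) -> Dict[str, str]:
    """Map each redundant phi to the first phi with the same incoming list.

    Only syntactic identity is checked, so phis whose incoming lists are
    ordered differently are left alone.
    """
    leader_for_operands: Dict[Incoming, str] = {}
    replacements: Dict[str, str] = {}
    for name, incoming in phis:
        leader = leader_for_operands.setdefault(incoming, name)
        if leader != name:
            replacements[name] = leader
    return replacements


def rewrite_incoming(phis: List[Phi], value_map: Dict[str, str]) -> List[Phi]:
    """Apply value replacements found elsewhere (e.g. in a predecessor block)."""
    return [
        (name, tuple((value_map.get(value, value), block) for value, block in incoming))
        for name, incoming in phis
    ]


if __name__ == "__main__":
    block_phis = [
        ("%p1", (("%x", "bb1"), ("%y", "bb2"))),
        ("%p2", (("%x", "bb1"), ("%y", "bb2"))),  # identical to %p1
        ("%p3", (("%y", "bb2"), ("%x", "bb1"))),  # same pairs, different order
    ]
    print(cse_identical_phis(block_phis))
```

In this toy model, `%p3` is not merged because its incoming list is ordered differently, which is why a canonical block order matters. `rewrite_incoming` mimics the second-iteration case: once a value in a predecessor has been replaced, two phis may become identical on the next pass.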
Martin Storsjö 15fdd536f9 [libcxx] [test] Fix path.itr/iterator.pass.cpp for windows
Differential Revision: https://reviews.llvm.org/D98107
2021-03-06 19:27:14 +02:00
Philip Reames 06a8a867d1 [rs4gc/tests] Remove use of internal debug flags
As a pragmatic tradeoff, the ease of updating the tests outweighs the slightly easier to understand test conditions. Where relevant, debug output was converted to comments to help human understanding.
2021-03-06 09:20:02 -08:00
Philip Reames c6ec563f02 [rs4gc] autogen a bunch of tests for ease of update 2021-03-06 09:04:00 -08:00
Philip Reames 8fe59ba51e [rs4gc] track the original value in the state used for base pointer rewriting
I'd originally intended to build on this for another purpose and have decided not to, but at a minimum, the stronger asserts are useful.
2021-03-06 08:46:15 -08:00
Philip Reames 6334952ff0 [rs4gc] minor code style improvement 2021-03-06 08:46:15 -08:00
Vy Nguyen 70c0dbf151 [lld-macho][NFC] Replace config param with a global in hasCompatVersion() helper.
Differential Revision: https://reviews.llvm.org/D98115
2021-03-06 11:32:51 -05:00
Nikita Popov f278734bf1 [Loads] Restructure getAvailableLoadStore implementation (NFC)
Separate out some conditions with early exits, to make it easier to
support additional cases.
2021-03-06 16:58:11 +01:00
Nikita Popov 1c59bf4d4d [InstCombine] Add tests for non-trivial store to load forward (NFC)
Examples of things we mostly don't handle.
2021-03-06 16:58:11 +01:00
KareemErgawy-TomTom 3fb384d50e [MLIR][SPIRV] Rename `spv.selection` to `spv.mlir.selection`.
To unify the naming scheme across all ops in the SPIR-V dialect, we are
moving from spv.camelCase to spv.CamelCase everywhere. For ops that
don't have a SPIR-V spec counterpart, we use spv.mlir.snake_case.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D98014
2021-03-06 16:05:31 +01:00
Yaxun (Sam) Liu 34d1a5c7b1 [HIP] Support Spack packages
Spack is a package management tool extensively used by the HPC community.
As ROCm packages are built with Spack by the HPC community, we need to teach
the clang driver to detect ROCm installations built by Spack.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D97340
2021-03-06 08:41:37 -05:00
Lei Zhang bb6f5c8314 [mlir][spirv] Convert tensor.extract for very small tensors
Normally tensors will be stored in buffers before converting to SPIR-V,
given that is how a large amount of data is sent to the GPU. However,
SPIR-V supports converting from tensors directly too. This is for the
cases where the tensor just contains a small number of elements and it
makes sense to directly inline them as a small data array in the shader.
To handle this, internally the conversion might create new local
variables. SPIR-V consumers in GPU drivers may or may not optimize that
away. So this has implications for register pressure. Therefore, a
threshold is used to control when the patterns should kick in.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D98052
2021-03-06 08:03:36 -05:00
Alexey Lapshin cf7cdaff64 [X86][VARARG] Avoid spilling xmm registers for va_start.
This review is extracted from D69372.
It fixes the bug https://bugs.llvm.org/show_bug.cgi?id=42219.

In noimplicitfloat mode, the compiler mustn't generate
floating-point code unless it was asked directly to do so.
This rule currently does not hold for variadic function arguments.
Though the compiler correctly guards the block of code that copies xmm vararg
parameters with a check for %al, it does not protect spills of xmm registers.
Thus, such spills are generated in unprotected areas and could break code
that does not expect floating-point data. The problem happens in -O0
optimization mode. At this optimization level, FastRegisterAllocator is used,
which spills virtual registers at basic block boundaries.
The register allocator does not protect spills with additional control-flow modifications.
Thus, to resolve the problem, it is suggested to not copy incoming physical
registers into virtual registers. Instead, store the incoming physical xmm registers
directly into memory.

Differential Revision: https://reviews.llvm.org/D80163
2021-03-06 15:25:47 +03:00
Nikita Popov edf7004851 [ConstantFold] Handle vectors in ConstantFoldLoadThroughBitcast()
There seems to be an impedance mismatch between what the type
system considers an aggregate (structs and arrays) and what
constants consider an aggregate (structs, arrays and vectors).

Rather than adjusting the type check, simply drop it entirely,
as getAggregateElement() is well-defined for non-aggregates: It
simply returns null in that case.
2021-03-06 12:17:56 +01:00
Nikita Popov be58465591 [GVN] Regenerate test checks (NFC) 2021-03-06 12:11:16 +01:00
David Zarzycki f4059cc352 Partially revert "[runtimes] Use add_lit_testsuite to register lit testsuites"
This partially reverts commit e1173c8794
until we find out why libcxx tests are failing under runtimes build.
2021-03-06 06:06:55 -05:00
Juneyoung Lee 7ae191f59f [LangRef] dos2unix (NFC) 2021-03-06 18:44:40 +09:00
Nikita Popov a917fb89dc [LVI] Simplify and generalize handling of clamp patterns
Instead of handling a number of special cases for selects, handle
this generally when inferring ranges from conditions. We already
infer ranges from `x + C pred C2` to `x`, so doing the same for
`x pred C2` to `x + C` is straightforward.
2021-03-06 10:42:41 +01:00
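
To make the range inference in the LVI commit above concrete, here is a small sketch of deriving a range for `x + C` from the condition `x <u C2`. It uses a hand-rolled half-open wrapped interval on 8-bit values. The bit width, the tuple representation and the function names are assumptions for illustration and do not reflect LLVM's `ConstantRange` API.

```python
BITS = 8            # illustrative bit width
MOD = 1 << BITS


def range_of_x_ult(c2: int):
    """Half-open range [lo, hi) of x implied by the unsigned condition x <u c2."""
    if not 0 < c2 < MOD:
        raise ValueError("this sketch only handles 0 < c2 < 2**BITS")
    return (0, c2)


def add_constant(rng, c: int):
    """Shift a range by a constant, wrapping modulo 2**BITS."""
    lo, hi = rng
    return ((lo + c) % MOD, (hi + c) % MOD)


def contains(rng, value: int) -> bool:
    """Membership test that also handles ranges wrapping past 2**BITS - 1."""
    lo, hi = rng
    if lo < hi:
        return lo <= value < hi
    return value >= lo or value < hi


if __name__ == "__main__":
    # Given x <u 10, the value x + 5 must lie in [5, 15).
    print(add_constant(range_of_x_ult(10), 5))
    # Given x <u 10, the value x + 250 wraps around: [250, 4).
    print(add_constant(range_of_x_ult(10), 250))
```

Because adding a constant is a bijection on fixed-width integers, the shifted range is exact in this model: a value lies in it if and only if it equals `x + C` for some `x` satisfying the condition.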
Nikita Popov 906deaa0d9 [CVP] Add additional tests for clamp patterns (NFC)
These are the same as the existing tests, but using different
predicates that are not handled by the current code.
2021-03-06 10:42:40 +01:00
Raul Tambre 10a7289649 [runtimes] Fix crosscompiling after a7cad6680b (D97451)
It moved the logic for CMake target arguments into llvm_ExternalProject_Add().
No handling was added for CMAKE_CROSSCOMPILING, which has a separate set of compiler_args.
This broke crosscompiling, as now the runtimes builds defaulted to the compiler's default.

I've also added passing of CMAKE_ASM_COMPILER, which was missing before although we were passing the triple for it.

Reviewed By: zero9178

Differential Revision: https://reviews.llvm.org/D97855
2021-03-06 11:35:14 +02:00
Nikita Popov b42be01788 [LVI] Pass offset by reference (NFC)
Instead of by pointer. This allows us to use offsets that are not
materialized in the IR.
2021-03-06 10:24:44 +01:00
Nikita Popov 019ae8220f [CVP] Fix tests for clamp patterns (NFC)
These tests didn't test the pattern they were supposed to, because
%a instead of %add was used in the select (which turned this into
a normal min/max).

Noticed this when commenting out the clamp handling code did not
result in any test failures...
2021-03-06 10:24:44 +01:00
Jay Foad 99682bc039 Revert "Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030""
This reverts commit e58d68fcd0.

This reinstates commit fc28f600e5
with a fix to initialize HasShaderCyclesRegister. See
https://reviews.llvm.org/D97928.
2021-03-06 09:00:01 +00:00
Aleksandr Platonov c4efd04f18 [clangd] Use URIs instead of paths in the index file list
Without this patch, the file list of the preamble index contains URIs, but the file lists of other indexes contain file paths.
This makes `indexedFiles()` always return `IndexContents::None` for the preamble index, because the current implementation expects file paths inside the file list of the index.

This patch fixes this problem and also helps to avoid a lot of URI-to-path conversions when merging indexes.

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D97535
2021-03-06 10:47:05 +03:00
Martin Storsjö 714644a36c [libcxx] [test] Move the is_<platform> functions down to subclasses
If cross testing (and manually specifying a LIBCXX_TARGET_INFO in the
cmake configuration, as the default is to match the build platform),
we want the accessors for querying the target platform, is_windows,
is_darwin, to return the right value depending on which target info
class is used, not based on what platform is running the build and
driving the tests.

When LIBCXX_TARGET_INFO isn't defined, the right target info class
is chosen automatically based on the platform one is running on, so
this shouldn't make any practical difference for such setups.

Differential Revision: https://reviews.llvm.org/D98045
2021-03-06 08:52:34 +02:00
Martin Storsjö ebe6d3be0f [clang] Don't default to a specifically shared libunwind on mingw with a g++ driver
For MinGW targets, we distinguish between an explicitly shared unwinder
library (requested via -shared-libgcc), an explicitly static one
(requested via -static-libgcc or -static) and the default case (which
just passes -lunwind to the linker, which will pick either shared or
static depending on what's available, with the normal linker logic).

This makes the implicit default case (as added in D79995) actually work as
it was intended, when using the g++ driver (which is the main use case for
libunwind as far as I know).

Differential Revision: https://reviews.llvm.org/D98023
2021-03-06 08:50:46 +02:00
Martin Storsjö 002dd47bdd [clang] Fix typos in the default logic for CLANG_DEFAULT_UNWINDLIB
CLANG_DEFAULT_RTLIB had a typo, and libunwind isn't a valid
option for it.

This keeps the actual behaviour from before, defaulting to none if
using compiler-rt as rtlib.

Differential Revision: https://reviews.llvm.org/D98022
2021-03-06 08:50:46 +02:00
Fangrui Song 2d922de3af [MC][RISCV] Support .reloc *, BFD_RELOC_{NONE,32,64}, *
BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way to indicate a dependency between two sections.
2021-03-05 21:45:11 -08:00
Fangrui Song 59ff9315fd [MC][ARM] Support .reloc *, BFD_RELOC_{NONE,8,16,32}, *
BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way to indicate a dependency between two sections.
2021-03-05 21:39:16 -08:00
Fangrui Song e4398bcdff [MC][test] Fix reloc-directive-elf-*.s 2021-03-05 21:37:29 -08:00
Mehdi Amini f8fe6d9f3f Use gen-dialect-doc instead of gen-op-doc for the Builtin dialect
This is fixing the missing title and menu entry on the MLIR website.
2021-03-06 05:32:46 +00:00
Fangrui Song 3110187f1f [MC][PowerPC] Support .reloc *, BFD_RELOC_{NONE,16,32,64}, *
BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way to indicate a dependency between two sections.
2021-03-05 21:31:45 -08:00
Fangrui Song aceea45d87 [MC][AArch64] Support .reloc *, BFD_RELOC_{NONE,16,32,64}, *
BFD_RELOC_NONE is useful for ld --gc-sections: it provides a generic way to indicate a dependency between two sections.
2021-03-05 21:31:08 -08:00
Fangrui Song 4f7562d52f [MC][X86] Support .reloc *, BFD_RELOC_{NONE,8,16,32,64}, *
The names are unfortunate, but BFD_RELOC_NONE provides a generic way to indicate
a dependency between two sections, which is useful for ld --gc-sections.
See https://sourceware.org/bugzilla/show_bug.cgi?id=27530
2021-03-05 21:31:05 -08:00
Vitaly Buka 56ed64dfa9 [sanitizer] Don't expect ABORTING in print-module-map
ABORTING message is inconsistent across sanitizers.

Another followup for D98089
2021-03-05 19:22:34 -08:00
Christopher Di Bella c744332793 [libcxx] adds std::ranges::swap, std::swappable, and std::swappable_with
Implements parts of:
    - P0898R3 Standard Library Concepts
    - P1754 Rename concepts to standard_case for C++20, while we still can

Depends on D96742

Differential Revision: https://reviews.llvm.org/D97162
2021-03-05 19:03:57 -08:00
Jianzhou Zhao 469d5462fa [dfsan] Re-enable origin tracking test cases 2021-03-06 02:41:56 +00:00
Mitch Phillips e58d68fcd0 Revert "[AMDGPU] Restore the s_memtime instruction in gfx1030"
Broke the ASan/MSan buildbots. See more comments in the original patch,
https://reviews.llvm.org/D97928.

Build failure at http://lab.llvm.org:8011/#/builders/5/builds/5327

This reverts commit fc28f600e5.
2021-03-05 18:24:59 -08:00
Matheus Izvekov 71e6e82746 [clang] Fix constrained decltype(auto) deduction
Prior to this fix, constrained decltype(auto) behaved exactly the same
as constrained regular auto.
This fixes it so that it deduces like decltype(auto).

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: rsmith

Differential Revision: https://reviews.llvm.org/D98087
2021-03-05 18:20:09 -08:00
Vitaly Buka 2fcd872d8a [dfsan] Remove dfsan_get_origin from done_abilist.txt
Followup for D95835
2021-03-05 17:59:39 -08:00
Shoaib Meenai 9a2a167b6c [DirectoryWatcher] Increase timeout to make test less flaky
We've observed this test being significantly flaky on our Mac CI
machines when we're running the full check-clang suite. It fails because
the wait_for condition isn't met within 3 seconds. We believe it's
because our CI machines are somewhat underpowered and pretty heavily
loaded when we're running the full check-clang suite.

I ran some experiments on increasing the timeout. I ran the full
check-clang suite 100 times with each timeout value and recorded how
many flaky failures we encountered in these tests. The results are:

3 second timeout (baseline): 20 failures
10 second timeout: 14 failures
20 second timeout: 4 failures
30 second timeout: 2 failures
40 second timeout: 1 failure
50 second timeout: 0 failures
60 second timeout: 0 failures

I ran another set of 100 tests for the 50 second timeout and observed
one flaky failure. By contrast, I ended up running check-clang 500 times
for the 60 second timeout and didn't observe a single flaky failure.
That's how the 60 second timeout value used in this patch was derived.

While a 60 second timeout might seem high, keep in mind that:
- This is a timeout, not a sleep; the test should require much less time
  in the vast majority of instances, especially on more powerful machines.
- The long timeout is most likely to occur when other tests are also
  running at the same time, so the latency of the timeout will also be
  masked by the latency of the other tests.

See https://reviews.llvm.org/D58418?id=200123#inline-554211 for where
this timeout was originally introduced and the possibility of raising it
if it wasn't enough was discussed.

Reviewed By: plotfi

Differential Revision: https://reviews.llvm.org/D97878
2021-03-05 17:49:14 -08:00
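
The "timeout, not a sleep" point in the DirectoryWatcher commit above can be illustrated with a small sketch. Python's `threading.Condition.wait_for` stands in here for the wait_for condition mentioned in the commit message. The function name and the delays are hypothetical, and this is not the test's actual code.

```python
import threading
import time


def wait_until_ready(delay_until_ready: float, timeout: float):
    """Wait for a flag set by another thread; return (condition_met, seconds_waited)."""
    cond = threading.Condition()
    state = {"ready": False}

    def producer():
        # Simulates the watched event arriving after some delay.
        with cond:
            state["ready"] = True
            cond.notify_all()

    timer = threading.Timer(delay_until_ready, producer)
    start = time.monotonic()
    timer.start()
    with cond:
        # Returns as soon as the predicate holds, or once the timeout expires.
        met = cond.wait_for(lambda: state["ready"], timeout=timeout)
    elapsed = time.monotonic() - start
    timer.join()
    return met, elapsed


if __name__ == "__main__":
    met, elapsed = wait_until_ready(delay_until_ready=0.05, timeout=60.0)
    print(met, f"{elapsed:.2f}s")
```

With a 60 second timeout, the wait still returns shortly after the condition is met. The full timeout is only spent when the condition never becomes true, which is the failure case the larger value is meant to make rare.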
Vitaly Buka 1c5f083128 [NFC] Fix module map test
Followup for D98089
2021-03-05 17:23:19 -08:00
Jianzhou Zhao d02e0ba070 [dfsan] Disable origin test cases temporarily 2021-03-06 01:12:54 +00:00