Commit Graph

380488 Commits

Author SHA1 Message Date
Eric Schweitz a88991d782 [flang][fir][NFC] run clang-format
cleanup post-merge
2021-02-19 12:07:13 -08:00
Sanjay Patel d79063129c [Verifier] remove dead code for saturating intrinsics; NFC
Test coverage shows that we assert with the string from the
tablegen defs file for these intrinsics, so these cases
should never be live.
2021-02-19 14:58:25 -05:00
Sanjay Patel 38730b0029 [Verifier] add tests for saturating intrinsics; NFC
As noted in D96904, we don't have direct tests for these malformed ops.
2021-02-19 14:58:25 -05:00
Martin Storsjö f4f5fb9151 [libcxx] Make generic_*string return paths with forward slashes on windows
This matches what MS STL returns; in std::filesystem, forward slashes
are considered generic dir separators that are valid on all platforms.

Differential Revision: https://reviews.llvm.org/D91181
2021-02-19 21:49:51 +02:00
Haowei Wu 784c7debb2 [elfabi] Fix a bug when .dynsym contains no non-local symbol
This patch fixed a bug when elbabi was supplied with a tbe file
contains no non-local symbol. Before this patch, it wrote 0 to
sh_info of the .dynsym section, making the ELF stub file invalid.
This patch fixed this issue.

Differential Revision: https://reviews.llvm.org/D96930
2021-02-19 11:36:53 -08:00
zoecarver dbc89028d7 [libcxx] Fix LWG 2875: shared_ptr::shared_ptr(Y*, D, […]) constructors should be constrained.
Fixes LWG issue 2875.

Differential Revision: https://reviews.llvm.org/D81414
2021-02-19 11:11:39 -08:00
Martin Storsjö 513463fd26 [libcxx] Have lexically_normal return the path with preferred separators
Differential Revision: https://reviews.llvm.org/D91179
2021-02-19 21:06:54 +02:00
Sanjay Patel 5b250a27ec [Analysis][LoopVectorize] do not form reductions of pointers
This is a fix for https://llvm.org/PR49215 either before/after
we make a verifier enhancement for vector reductions with D96904.

I'm not sure what the current thinking is for pointer math/logic
in IR. We allow icmp on pointer values. Therefore, we match min/max
patterns, so without this patch, the vectorizer could form a vector
reduction from that sequence.

But the LangRef definitions for min/max and vector reduction
intrinsics do not allow pointer types:
https://llvm.org/docs/LangRef.html#llvm-smax-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-umax-intrinsic

So we would crash/assert at some point - either in IR verification,
in the cost model, or in codegen. If we do want to allow this kind
of transform, we will need to update the LangRef and all of those
parts of the compiler.

Differential Revision: https://reviews.llvm.org/D97047
2021-02-19 14:01:57 -05:00
Michael Kruse 91c472c86c [Polly] Fix test after D96534. 2021-02-19 12:49:29 -06:00
Craig Topper e7c86f4ac4 [RISCV] Use inheritance to reduce some repeated code in tablegen. NFC
The VLX and VSX searchable tables, share the same format so we
can have a common base class for them.
2021-02-19 10:42:18 -08:00
Simon Pilgrim d7350efc40 [X86] Regenerate 2007-06-28-X86-64-isel.ll 2021-02-19 18:35:15 +00:00
Simon Pilgrim 3dae0b5703 [X86] Remove unused intrinsic declaration 2021-02-19 18:35:14 +00:00
Simon Pilgrim 6ad4bf330b [X86] Regenerate 2011-12-06-AVXVectorExtractCombine.ll 2021-02-19 18:35:14 +00:00
Craig Topper 7f5b3886e4 [RISCV] Remove unneeded indexed segment load/store vector pseudo instruction.
We had more combinations of data and index lmuls than we needed.

Also add some asserts to verify that the IndexVT and data VT have
the same element count when we isel these pseudo instructions.
2021-02-19 10:28:48 -08:00
Craig Topper d056d5decf [RISCV] Use custom isel for vector indexed load/store intrinsics.
There are many legal combinations of index and data VTs supported
for these intrinsics. This results in a lot of isel patterns in
RISCVGenDAGISel.inc.

By adding a separate table similar to what we use for segment
load/stores, we can more efficiently manually select these
intrinsics. We should also be able to reuse this table scalable
vector gather/scatter.

This reduces the llc binary size by ~56K.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D97033
2021-02-19 10:10:06 -08:00
Craig Topper dbf910f0d9 [RISCV] Prevent selecting a 0 VL to X0 for the segment load/store intrinsics.
Just like we do for isel patterns, we need to call selectVLOp
to prevent 0 from being selected to X0 by the default isel.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D97021
2021-02-19 10:07:12 -08:00
Craig Topper 98dff5e804 [RISCV] Move SHFLI matching to DAG combine. Add 32-bit support for RV64
We previously used isel patterns for this, but that used quite
a bit of space in the isel table due to OR being associative
and commutative. It also wouldn't handle shifts/ands being in
reversed order.

This generalizes the shift/and matching from GREVI to
take the expected mask table as input so we can reuse it for
SHFLI.

There is no SHFLIW instruction, but we can promote a 32-bit
SHFLI to i64 on RV64. As long as bit 4 of the control bit isn't
set, a 64-bit SHFLI will preserve 33 sign bits if the input had
at least 33 sign bits. ComputeNumSignBits has been updated to
account for that to avoid sext.w in the tests.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D96661
2021-02-19 10:07:12 -08:00
Wei Mi 4ffad1fb48 [SampleFDO] Add PromotedInsns to prevent repeated ICP.
In https://reviews.llvm.org/rG5fb65c02ca5e91e7e1a00e0efdb8edc899f3e4b9,
We use 0 count value profile to memorize which target has been promoted
and prevent repeated ICP for the same target, so we delete PromotedInsns.
However, I found the implementation in the patch has some shortcomings
to be fixed otherwise there will still be repeated ICP. So I add
PromotedInsns back temorarily. Will remove it after I get a thorough fix.
2021-02-19 10:01:49 -08:00
Artem Belevich 1a368ae3b7 [CUDA] fix builtin constraints for PTX 7.2
This fixes build issues w/ CUDA-11 introduced by https://reviews.llvm.org/D95974

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D97009
2021-02-19 09:57:21 -08:00
Luís Marques 43fa23a01f [Sanitizer][NFC] Fix typo 2021-02-19 17:46:02 +00:00
Jessica Paquette 8d3442eddb [AArch64][GlobalISel] Run redundant_sext_inreg in the post-legalizer combiner
This is to ensure that we can eliminate G_ASSERT_SEXT.

In a follow-up patch, I'm going to make CallLowering emit G_ASSERT_SEXT for
signext parameters.

Differential Revision: https://reviews.llvm.org/D96913
2021-02-19 09:34:47 -08:00
Nicolas Vasilache 0ee4bf151c [mlir] Add folding of tensor.cast -> subtensor_insert
Differential Revision: https://reviews.llvm.org/D97059
2021-02-19 17:24:16 +00:00
Geoffrey Martin-Noble 236aab0b0c [MLIR] Delete unused functions getCollapsedInitTensor and getExpandedInitTensor
These are unused since
https://reviews.llvm.org/rG81264dfbe80df08668a325a61613b64243b99c01

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D97014
2021-02-19 09:23:54 -08:00
Benjamin Kramer 59f442e6bb [LV] Fold single-use variable into assert. NFC. 2021-02-19 18:11:39 +01:00
Nikita Popov 71a8e4e7d6 [MemCopyOpt] Enable MemorySSA by default
This enables use of MemorySSA instead of MemDep in MemCpyOpt. To
allow this without significant compile-time impact, the MemCpyOpt
pass is moved directly before DSE (in the cases where this was not
already the case), which allows us to reuse the existing MemorySSA
analysis.

Unlike the MemDep-based implementation, the MemorySSA-based MemCpyOpt
can also perform simple optimizations across basic blocks.

Differential Revision: https://reviews.llvm.org/D94376
2021-02-19 18:06:25 +01:00
Matthew Malcomson c1653b8cc7 Hwasan InitPrctl check for error using internal_iserror
When adding this function in https://reviews.llvm.org/D68794 I did not
notice that internal_prctl has the API of the syscall to prctl rather
than the API of the glibc (posix) wrapper.

This means that the error return value is not necessarily -1 and that
errno is not set by the call.

For InitPrctl this means that the checks do not catch running on a
kernel *without* the required ABI (not caught since I only tested this
function correctly enables the ABI when it exists).
This commit updates the two calls which check for an error condition to
use internal_iserror. That function sets a provided integer to an
equivalent errno value and returns a boolean to indicate success or not.

Tested by running on a kernel that has this ABI and on one that does
not. Verified that running on the kernel without this ABI the current
code prints the provided error message and does not attempt to run the
program. Verified that running on the kernel with this ABI the current
code does not print an error message and turns on the ABI.
This done on an x86 kernel (where the ABI does not exist), an AArch64
kernel without this ABI, and an AArch64 kernel with this ABI.

In order to keep running the testsuite on kernels that do not provide
this new ABI we add another option to the HWASAN_OPTIONS environment
variable, this option determines whether the library kills the process
if it fails to enable the relaxed syscall ABI or not.
This new flag is `fail_without_syscall_abi`.
The check-hwasan testsuite results do not change with this patch on
either x86, AArch64 without a kernel supporting this ABI, and AArch64
with a kernel supporting this ABI.

Differential Revision: https://reviews.llvm.org/D96964
2021-02-19 16:30:56 +00:00
Philip Reames 4a5edea193 [SCEV] Use both known bits and sign bits when computing range of SCEV unknowns
When computing a range for a SCEVUnknown, today we use computeKnownBits for unsigned ranges, and computeNumSignBots for signed ranges. This means we miss opportunities to improve range results.

One common missed pattern is that we have a signed range of a value which CKB can determine is positive, but CNSB doesn't convey that information. The current range includes the negative part, and is thus double the size.

Per the removed comment, the original concern which delayed using both (after some code merging years back) was a compile time concern. CTMark results (provided by Nikita, thanks!) showed a geomean impact of about 0.1%. This doesn't seem large enough to avoid higher quality results.

Differential Revision: https://reviews.llvm.org/D96534
2021-02-19 08:29:12 -08:00
Marek Kurdej bcb5a124ae [libc++] Turn off clang-format for auto-generated version header. NFC. 2021-02-19 17:26:16 +01:00
Joel E. Denny ef8b3b5ffd [OpenMP] Fix nvptx CUDA_VERSION conversion
As mentioned in PR#49250, without this patch, ptxas for CUDA 9.1 fails
in the following two tests:

- openmp/libomptarget/test/mapping/lambda_mapping.cpp
- openmp/libomptarget/test/offloading/bug49021.cpp

The error looks like:

```
ptxas /tmp/lambda_mapping-081ea9.s, line 828; error   : Not a name of any known instruction: 'activemask'
```

The problem is that our cmake script converts CUDA version strings
incorrectly: 9.1 becomes 9100, but it should be 9010, as shown in
`getCudaVersion` in `clang/lib/Driver/ToolChains/Cuda.cpp`.  Thus,
`openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu`
inadvertently enables `activemask` because it apparently becomes
available in 9.2.  This patch fixes the conversion.

This patch does not fix the other two tests in PR#49250.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97012
2021-02-19 11:09:26 -05:00
Joel E. Denny d2147b1a87 [OpenMP] Fix always,from and delete for data absent at exit
Without this patch, there's a runtime error for those map types at
exit from an "omp target data" or at "omp target exit data", but the
spec says the list item should be ignored.

This patch tests that fix in data_absent_at_exit.c, and it also
improves other testing for data that is not fully present at exit.

Reviewed By: grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D96999
2021-02-19 11:09:26 -05:00
Mircea Trofin 82492f24ff [NFC][Regalloc] Share the VirtRegAuxInfo object with LiveRangeEdit
VirtRegAuxInfo is an extensibility point, so the register allocator's
decision on which implementation to use should be communicated to the
other users - namely, LiveRangeEdit.

Differential Revision: https://reviews.llvm.org/D96898
2021-02-19 07:44:28 -08:00
madhur13490 3c297a2564 Make fixed-abi default for AMD HSA OS
fixed-abi uses pre-defined and predictable
SGPR/VGPRs for passing arguments. This patch makes
this scheme default when HSA OS is specified in triple.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D96340
2021-02-19 15:05:25 +00:00
David Green a1c34a9d6a [ARM] Correct vector predicate type in MVE getCmpSelInstrCost 2021-02-19 14:43:51 +00:00
Jay Foad b2c7f06db1 [AMDGPU] Add some GFX9 test coverage. NFC. 2021-02-19 14:38:52 +00:00
Simon Pilgrim 5d3930bb8f [DAG] visitTRUNCATE - attempt to truncate USUBSAT
Fold trunc(usubsat(zext(x),y)) -> usubsat(x,trunc(umin(y,satlimit)))
2021-02-19 14:26:05 +00:00
Nicolas Vasilache 62f5c46eec [mlir][Linalg] NFC - Expose more options to the CodegenStrategy 2021-02-19 14:01:44 +00:00
Djordje Todorovic b6db47d7e0 [llvm-dwarfdump][locstats] Unify handling of inlined vars with no loc
The presence or absence of an inline variable (as well as formal
parameter) with only an abstract_origin ref (without DW_AT_location)
should not change the location coverage.

It means, for both:

DW_TAG_inlined_subroutine
  DW_AT_abstract_origin (0x0000004e "f")
  DW_AT_low_pc  (0x0000000000000010)
  DW_AT_high_pc (0x0000000000000013)
  DW_TAG_formal_parameter
    DW_AT_abstract_origin       (0x0000005a "b")

and,

DW_TAG_inlined_subroutine
   DW_AT_abstract_origin (0x0000004e "f")
   DW_AT_low_pc  (0x0000000000000010)
   DW_AT_high_pc (0x0000000000000013)

we should report 0% location coverage. If we add DW_AT_location,
for both cases the coverage should be improved.

Differential Revision: https://reviews.llvm.org/D96045
2021-02-19 05:38:01 -08:00
Jan Kratochvil 08331281af [lldb/Commands] Fix help text typo for 'breakpoint set' -a|--address. 2021-02-19 14:33:42 +01:00
David Green 7a5c26e99a Revert "[ARM] Expand the range of allowed post-incs in load/store optimizer"
This reverts commit 3b34b06fc5 as runtime
errors were reported.
2021-02-19 13:15:10 +00:00
Florian Hahn edc92a1c42
[LV] Remove VPCallback.
Now that all state for generated instructions is managed directly in
VPTransformState, VPCallBack is no longer needed. This patch updates the
last use of `getOrCreateScalarValue` to instead manage the value
directly in VPTransformState and removes VPCallback.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D95383
2021-02-19 12:50:41 +00:00
Kadir Cetinkaya 6329ce75da
[clangd] Expose absoluteParent helper
Will be used in other components that need ancestor traversal.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D96123
2021-02-19 13:40:21 +01:00
Simon Pilgrim c1664c5a27 [X86][SSE] Add tests for trunc(usubsat()) patterns. 2021-02-19 12:26:48 +00:00
Nico Weber 3b7580951c [gn build] Port 1a2b3536ef 2021-02-19 07:23:48 -05:00
Fraser Cormack d9531a3097 [RISCV] Address some clang-tidy warnings. NFCI. 2021-02-19 12:10:28 +00:00
Nikita Popov ac065b7a37 [LLD] Fix tests after D96993
We now need mustprogress to eliminate these calls. The code doesn't
really make sense, but that's not the point of the test...
2021-02-19 13:08:17 +01:00
Alexander Belyaev 53367b8fe1 [mlir][nfc] Fix indentation in LinalgOps.td. 2021-02-19 13:02:58 +01:00
Ron Lieberman 30c0d5b4c3 [OPENMP][AMDGCN] Improvements to print_kernel_trace (bit mask)
allow bit masking to select various trace features.
  bit 0 => Launch tracing           (stderr)
  bit 1 => timing of runtime        (stdout)
  bit 2 => detailed launch tracing  (stderr)
  bit 3 => timing goes to stdout instead of stderr

  example: LIBOMPTARGET_KERNEL_TRACE=7     does it all
           LIBOMPTARGET_KERNEL_TRACE=5     Launch + details
           LIBOMPTARGET_KERNEL_TRACE=2     timings + launch to stderr
           LIBOMPTARGET_KERNEL_TRACE=10    timings + launch to stdout

Differential Revision: https://reviews.llvm.org/D96998
2021-02-19 06:47:22 -05:00
Carl Ritson 8181dcd30f [AMDGPU] WQM/WWM: Fix marking of partial definitions
Track lanes when processing definitions for marking WQM/WWM.
If all lanes have been defined then marking can stop.
This prevents marking unnecessary instructions as WQM/WWM.

In particular this fixes a bug where values passing through
V_SET_INACTIVE would me marked as requiring WWM.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D95503
2021-02-19 20:45:24 +09:00
Nikita Popov 2f17ed294f [DCE] Don't remove non-willreturn calls
In both ADCE and BDCE (via DemandedBits) we should not remove
instructions that are not guaranteed to return. This issue was
pointed out by fhahn in the recent llvm-dev thread.

Differential Revision: https://reviews.llvm.org/D96993
2021-02-19 12:35:40 +01:00
Faris Rehman 529f71811b [flang][driver] Add debug measure-parse-tree and pre-fir-tree options
Add the following options:
* -fdebug-measure-parse-tree
* -fdebug-pre-fir-tree

Summary of changes:
- Add 2 new frontend actions: DebugMeasureParseTreeAction and DebugPreFIRTreeAction
- Add MeasurementVisitor to FrontendActions.h
- Make reportFatalSemanticErrors return true if there are any fatal errors
- Port most of the `-fdebug-pre-fir-tree` tests to use the new driver if built, otherwise use f18.

Differential Revision: https://reviews.llvm.org/D96884
2021-02-19 11:27:54 +00:00