370969 Commits

Author SHA1 Message Date
Jay Foad 32897c05ab [AMDGPU] Specify a triple to avoid codegen changes depending on host OS 2020-11-03 13:33:44 +00:00
Lei Zhang d5bf727bcd [mlir][spirv] Support for a few more decorations in (de)serialization
Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D90655
2020-11-03 08:11:19 -05:00
Sanjay Patel 9af561ec99 [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
Yaxun (Sam) Liu abd8cd9199 [CUDA][HIP] Fix linkage for -fgpu-rdc
Currently for explicit template function instantiation in CUDA/HIP device
compilation clang emits instantiated kernel with external linkage
and instantiated device function with internal linkage.

This is fine for -fno-gpu-rdc since there is only one TU.

However, this causes duplicate symbols for kernels with -fgpu-rdc if
the same instantiation happens in multiple TUs, or missing symbols
if a device function calls an explicitly instantiated template function
in a different TU.

To make explicit template function instantiation work for
-fgpu-rdc we need to follow the C++ linkage paradigm, i.e.
use weak_odr linkage.

Differential Revision: https://reviews.llvm.org/D90311
2020-11-03 08:07:19 -05:00
Roman Lebedev c009d11bda [InstCombine] Perform C-(X+C2) --> (C-C2)-X transform before using Negator
In particular, it makes it fire for C=0, because negator doesn't want
to perform that fold since in general it's not beneficial.
2020-11-03 16:06:52 +03:00
Roman Lebedev e465f9c303 [InstCombine] Negator: - (C - %x) --> %x - C (PR47997)
This relaxes one-use restriction on that `sub` fold,
since apparently the addition of Negator broke
preexisting `C-(C2-X) --> X+(C-C2)` (with C=0) fold.
2020-11-03 16:06:51 +03:00
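
The folds named in the two InstCombine commits above are identities in wrapping (modular) integer arithmetic. A minimal sketch that checks them at 8 bits; Python stands in for LLVM IR here, and all helper names are hypothetical rather than anything from InstCombine itself:

```python
BITS = 8
MASK = (1 << BITS) - 1


def wrap(value):
    """Reduce a Python int to an unsigned BITS-wide value (two's complement wraparound)."""
    return value & MASK


def sub_of_add(c, x, c2):
    """Original form: C - (X + C2)."""
    return wrap(c - wrap(x + c2))


def folded_sub_of_add(c, x, c2):
    """Folded form: (C - C2) - X."""
    return wrap(wrap(c - c2) - x)


def neg_of_sub(c, x):
    """Original form: -(C - X)."""
    return wrap(-wrap(c - x))


def folded_neg_of_sub(c, x):
    """Folded form: X - C."""
    return wrap(x - c)


def folds_hold():
    """Check both identities for every C and X, and a few representative C2 values."""
    for c in range(1 << BITS):
        for x in range(1 << BITS):
            if neg_of_sub(c, x) != folded_neg_of_sub(c, x):
                return False
            for c2 in (0, 1, 127, 128, 255):
                if sub_of_add(c, x, c2) != folded_sub_of_add(c, x, c2):
                    return False
    return True
```

With C = 0 the first fold reduces `0 - (X + C2)` to `(-C2) - X`, which is the case the commit message singles out.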
Roman Lebedev f8cf6d027b [NFC][InstCombine] Negator: add test coverage for `(?? - (%y + C))` pattern (PR47997) 2020-11-03 16:06:51 +03:00
Roman Lebedev 67be050acc [NFC][InstCombine] Negator: add test coverage for `(?? - (C - %y))` pattern (PR47997) 2020-11-03 16:06:51 +03:00
Roman Lebedev 482d65331b [NFC][InstCombine] Add test coverage for PR47997 2020-11-03 16:06:50 +03:00
Florian Hahn d68bed0fa9 [SCCP] Handle bitcast of vector constants.
Vectors where all elements have the same known constant range are treated as a
single constant range in the lattice. When bitcasting such vectors, there is a
mis-match between the width of the lattice value (single constant range) and
the original operands (vector). Go to overdefined in that case.

Fixes PR47991.
2020-11-03 12:58:39 +00:00
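
To see why the per-element range cannot simply be reused after the bitcast, here is a small sketch. It assumes a `<2 x i8>` vector bitcast to `i16` with a little-endian layout; the function names are hypothetical and this does not model LLVM's lattice classes:

```python
import struct


def bitcast_2xi8_to_i16(element0, element1):
    """Reinterpret two 8-bit elements as one 16-bit integer (little-endian layout assumed)."""
    return struct.unpack("<H", struct.pack("<BB", element0, element1))[0]


def in_range(value, lower, upper):
    """Half-open range check, mirroring a constant range [lower, upper)."""
    return lower <= value < upper


# Every element of the vector is known to lie in [1, 3).
ELEMENT_RANGE = (1, 3)

# The bitcast result is 0x0201, far outside the per-element range.
RESULT = bitcast_2xi8_to_i16(1, 2)
RESULT_STAYS_IN_ELEMENT_RANGE = in_range(RESULT, *ELEMENT_RANGE)
```

Because the single range describes 8-bit elements and the result is a 16-bit scalar, treating the result as overdefined is the conservative choice.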
David Green bd32386410 [ARM] Remove unused variable. NFC 2020-11-03 12:58:10 +00:00
Joachim Protze 71041a8b6b [OpenMP][libomptarget][Tests] fix failing test
D88149 updated `omp_get_initial_device` behavior to conform with OpenMP 5.1.
omp_get_initial_device() == omp_get_num_devices()
2020-11-03 13:15:33 +01:00
Joachim Protze b0eb19bf8a [OpenMP][OMPT][NFC] Fix flaky test
As reported by @ronlieb, the test shows intermittent fails.
The test failed, if the dependent task was already finished, when the depending
task was to be created. We have other tests to check for the dependences pair.
2020-11-03 13:15:32 +01:00
Joachim Protze e99207feb4 [OpenMP][Tool] Handle detached tasks in Archer
Since detached tasks are supported by clang and the OpenMP runtime, Archer
must expect to receive the corresponding callbacks.

This patch adds support to interpret the synchronization semantics of
omp_fulfill_event and cleans up the handling of task switches.
2020-11-03 13:15:32 +01:00
Hans Wennborg cbf25fbed5 Revert "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This caused an explosion in ICF times during linking on Windows when libfuzzer
instrumentation is enabled. For a small binary we see ICF time go from ~0 to
~10 s. For a large binary it goes from ~1 s to forever (I gave up after 30
minutes).

See comment on the code review.

> If we are going to write handler data (that is written as variable
> length data following after the unwind info in .xdata), we need to
> emit the handler data immediately, but for cases where no such
> info is going to be written, skip emitting it right away. (Unwind
> info for all remaining functions that hasn't gotten it emitted
> directly is emitted at the end.)
>
> This does slightly change the ordering of sections (triggering a
> bunch of updates to DebugInfo/COFF tests), but the change should be
> benign.
>
> This also matches GCC's assembly output, which doesn't output
> .seh_handlerdata unless it actually is needed.
>
> For ARM64, the unwind info can be packed into the runtime function
> entry itself (leaving no data in the .xdata section at all), but
> that can only be done if there's no follow-on data in the .xdata
> section. If emission of the unwind info is triggered via
> EmitWinEHHandlerData (or the .seh_handlerdata directive), which
> implicitly switches to the .xdata section, there's a chance of the
> caller wanting to pass further data there, so the packed format
> can't be used in that case.
>
> Differential Revision: https://reviews.llvm.org/D87448

This reverts commit 36c64af9d7.
2020-11-03 13:12:10 +01:00
Stefan Gränitz b397795f1a [JITLink][ELF] Implement R_X86_64_PLT32 relocations
Basic implementation for call and jmp branches with 32 bit offset. Branches to local targets produce
Branch32 edges that are resolved like regular PCRel32 relocations. Branches to external (undefined)
targets produce Branch32ToStub edges and go through a PLT entry by default. If the target happens to
get resolved within the 32 bit range from the callsite, the edge is relaxed during post-allocation
optimization. There is a test for each of these cases.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D90331
2020-11-03 12:05:54 +00:00
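
The relaxation condition described above boils down to whether the branch displacement fits in a signed 32-bit field. A rough sketch of that check, assuming the usual `target + addend - fixup_address` form with an addend of -4 for an x86-64 rel32 branch; the function names are hypothetical and JITLink's actual code may differ:

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def branch_displacement(target, fixup_address, addend=-4):
    """Displacement stored in the rel32 field (addend of -4 is an assumption)."""
    return target + addend - fixup_address


def can_relax_to_direct_branch(target, fixup_address, addend=-4):
    """True if the resolved target is reachable without going through a PLT stub."""
    displacement = branch_displacement(target, fixup_address, addend)
    return INT32_MIN <= displacement <= INT32_MAX
```

A target that ends up close to the call site passes the check and can be called directly; a target more than 2 GiB away keeps its stub.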
Hiral Oza d6a468d622 [clang-tidy] adding "--config-file=<file-path>" to specify custom config file.
Let clang-tidy read its config from the specified file.
Example:
$ clang-tidy --config-file=/some/path/myTidyConfig --list-checks --
...this will read config from '/some/path/myTidyConfig'.

ClangTidyMain.cpp reads ConfigFile into a string and then assigns the read data to 'Config', i.e. it follows the same code flow as '--config' internally.

May speed up clang-tidy runtime, since it now just looks up <file-path>
instead of searching for ".clang-tidy" in parent dir(s).

Directly specifying config path helps setting build dependencies.

Thanks to @DmitryPolukhin for the valuable suggestion. This patch now proposes
a change only in ClangTidyMain.cpp.

Reviewed By: DmitryPolukhin

Differential Revision: https://reviews.llvm.org/D89936
2020-11-03 11:59:46 +00:00
serge-sans-paille 1c068a0103 Fix 'default label in switch which covers all enumeration values' warning 2020-11-03 12:58:15 +01:00
David Green e474499402 [ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call there is no advantage of
using a low overhead loop as the LR register will need to be spilled and
reloaded around the call, and the low overhead will end up being
reverted. This teaches our hardware loop lowering that these memory
intrinsics will be calls under certain situations.

Differential Revision: https://reviews.llvm.org/D90439
2020-11-03 11:53:09 +00:00
David Green 785080e3fa [ARM] Low overhead loop memcpy lowering test. NFC 2020-11-03 11:44:50 +00:00
Sander de Smalen ba10c514c9 [AArch64][SVE] NFC: Guard all SVE tests for TypeSize warnings.
This patch adds a bunch of CHECK lines to guard against implicit
conversions of TypeSize -> uint64_t occurring in code-paths that previously
were safe for scalable vectors.
2020-11-03 11:29:36 +00:00
Alexander Bosch 5452fa6a59 [MLIR] Added test operations to replace linalg dependency for
BufferizeTests.

Summary:
Added test operations to replace the LinalgDialect dependency in tests
which use the buffer-deallocation, buffer-hoisting,
buffer-loop-hoisting, promote-buffers-to-stack,
buffer-placement-preparation-allowed-memref-resutls and
buffer-placement-preparation pass. Adapted the corresponding test cases
and TestBufferPlacement.cpp.

Differential Revision: https://reviews.llvm.org/D90037
2020-11-03 12:18:49 +01:00
Mehdi Amini 008b9d97cb Make the implicit nesting behavior of the PassManager user-controllable and default to false
This is an error-prone behavior: I frequently have ~20 min debugging sessions when I hit
an unexpected implicit nesting. This default makes the C++ API safer for users.

Depends On D90669

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D90671
2020-11-03 11:17:44 +00:00
Mehdi Amini cd7107a62b Handle the verifier at run() time in the PassManager instead of build time
This simplifies a few parts of the pass manager, but in particular we don't add as many
verifier passes as there are passes in the pipeline, and we can now enable/disable the
verifier after the fact on an already built PassManager.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D90669
2020-11-03 11:17:14 +00:00
Mehdi Amini bf523186fb Change the PrintOpStatsPass to operate on any operation instead of just ModuleOp
This allows using it on other operations, such as a GPUModule.
2020-11-03 11:15:32 +00:00
Mehdi Amini 0aaa2a4cb1 Remove mlir-c/Core.h which is superseded by the new API in mlir-c/IR.h
This header was an initial early attempt at a crude C API for bindings,
but it isn't used and is redundant with the new API. At this point it only
contributes to more confusion.

Differential Revision: https://reviews.llvm.org/D90643
2020-11-03 11:15:32 +00:00
Stephen Kelly ff02ae2139 Add test missing from previous commit 2020-11-03 11:06:52 +00:00
Simon Pilgrim 59b22e495c [AggressiveInstCombine] Generalize foldGuardedRotateToFunnelShift to generic funnel shifts
The fold currently only handles rotation patterns, but with the maturation of backend funnel shift handling we can now realistically handle all funnel shift patterns.

This should allow us to begin resolving PR46896 et al.

Differential Revision: https://reviews.llvm.org/D90625
2020-11-03 10:49:49 +00:00
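
For context, a rotate is the special case of a funnel shift in which both inputs are the same value. The sketch below models a left funnel shift at 8 bits and the kind of zero-guarded shift/or pattern a source program might write; the guarded form shown is an illustrative assumption, not a copy of the patterns the pass matches:

```python
BW = 8
MASK = (1 << BW) - 1


def fshl(a, b, c):
    """Left funnel shift: concatenate a:b, shift left by c mod BW, keep the top BW bits."""
    c %= BW
    concat = (a << BW) | b
    return ((concat << c) >> BW) & MASK


def guarded_fshl(a, b, c):
    """Guarded source pattern for c in [0, BW): the c == 0 check avoids shifting by BW."""
    if c == 0:
        return a
    return ((a << c) | (b >> (BW - c))) & MASK


def rotl(a, c):
    """Rotate left is a funnel shift with both inputs equal."""
    return fshl(a, a, c)
```

The guard exists because `b >> (BW - c)` would shift by the full bit width when `c` is zero; the funnel shift form has no such special case.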
Alexander Belyaev 9925168576 [mlir] Convert `memref_reshape` to LLVM.
https://llvm.discourse.group/t/rfc-standard-memref-cast-ops/1454/15

Differential Revision: https://reviews.llvm.org/D90377
2020-11-03 11:39:08 +01:00
Florian Hahn d9cbf39a37 [SLP] Pass VecPred argument to getCmpSelInstrCost.
Check if all compares in VL have the same predicate and pass it to
getCmpSelInstrCost, to improve cost-modeling on targets that only
support compare/select combinations for certain uniform predicates.

This leads to additional vectorization in some cases:

```
Same hash: 217 (filtered out)
Remaining: 19
Metric: SLP.NumVectorInstructions

Program                                        base    slp2    diff
 test-suite...marks/SciMark2-C/scimark2.test    11.00   26.00  136.4%
 test-suite...T2006/445.gobmk/445.gobmk.test    79.00  135.00  70.9%
 test-suite...ediabench/gsm/toast/toast.test    54.00   71.00  31.5%
 test-suite...telecomm-gsm/telecomm-gsm.test    54.00   71.00  31.5%
 test-suite...CI_Purple/SMG2000/smg2000.test   426.00  542.00  27.2%
 test-suite...ch/g721/g721encode/encode.test    30.00   24.00  -20.0%
 test-suite...000/186.crafty/186.crafty.test   116.00  138.00  19.0%
 test-suite...ications/JM/ldecod/ldecod.test   697.00  765.00   9.8%
 test-suite...6/464.h264ref/464.h264ref.test   822.00  886.00   7.8%
 test-suite...chmarks/MallocBench/gs/gs.test   154.00  162.00   5.2%
 test-suite...nsumer-lame/consumer-lame.test   621.00  651.00   4.8%
 test-suite...lications/ClamAV/clamscan.test   223.00  231.00   3.6%
 test-suite...marks/7zip/7zip-benchmark.test   680.00  695.00   2.2%
 test-suite...CFP2000/177.mesa/177.mesa.test   2121.00 2129.00  0.4%
 test-suite...:: External/Povray/povray.test   2406.00 2412.00  0.2%
 test-suite...TimberWolfMC/timberwolfmc.test   634.00  634.00   0.0%
 test-suite...CFP2006/433.milc/433.milc.test   1036.00 1036.00  0.0%
 test-suite.../Benchmarks/nbench/nbench.test   321.00  321.00   0.0%
 test-suite...ctions-flt/Reductions-flt.test    NaN      5.00   nan%
```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D90124
2020-11-03 10:16:43 +00:00
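
The check described above ("all compares in VL have the same predicate") can be pictured as follows. This is a simplified sketch with predicates as plain strings and `None` standing in for "no single predicate"; the names are hypothetical and it is not the SLP vectorizer's code:

```python
def uniform_predicate(predicates):
    """Return the shared predicate if every compare uses the same one, else None."""
    if not predicates:
        return None
    first = predicates[0]
    if all(predicate == first for predicate in predicates):
        return first
    return None
```

A cost query can then be given the specific predicate when one exists and fall back to a generic estimate otherwise.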
serge-sans-paille 3bdeb2ac2e [lld] missing doc entry for error handling script
Fix http://lab.llvm.org:8011/#/builders/69/builds/67
2020-11-03 11:16:02 +01:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceding a long multiply,
and generates a single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
serge-sans-paille cfc32267e2 Provide a hook to customize missing library error handling
Make it possible for lld users to provide a custom script that would help to
find missing libraries. A possible scenario could be:

    % clang /tmp/a.c -fuse-ld=lld -loauth -Wl,--error-handling-script=/tmp/addLibrary.py
    unable to find library -loauth
    looking for relevant packages to provides that library

        liboauth-0.9.7-4.el7.i686
        liboauth-devel-0.9.7-4.el7.i686
        liboauth-0.9.7-4.el7.x86_64
        liboauth-devel-0.9.7-4.el7.x86_64
        pix-1.6.1-3.el7.x86_64

Where addLibrary would be called with the missing library name as first argument
(in that case addLibrary.py oauth)

Differential Revision: https://reviews.llvm.org/D87758
2020-11-03 11:01:29 +01:00
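
A handler script along the lines of the `addLibrary.py` scenario above might look like the sketch below. The package table is a hypothetical stand-in for a real package-manager query, and the library name is read from the last argument so the sketch does not depend on the exact argument convention, which should be checked against the lld documentation:

```python
# Hypothetical lookup table; a real script would query the system package manager.
KNOWN_PACKAGES = {
    "oauth": ["liboauth", "liboauth-devel"],
}


def suggest_packages(library):
    """Return candidate packages that may provide the given library name."""
    return KNOWN_PACKAGES.get(library, [])


def main(argv):
    """Print suggestions for the missing library named by the last argument."""
    if len(argv) < 2:
        print("usage: addLibrary.py <library>")
        return 1
    library = argv[-1]
    print(f"unable to find library -l{library}")
    packages = suggest_packages(library)
    if packages:
        print("packages that may provide it:")
        for package in packages:
            print(f"    {package}")
    return 0
```

To use it as a standalone script, it would end with a call such as `sys.exit(main(sys.argv))` after importing `sys`.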
David Green 90131e3ecb [CostModel] Make target intrinsics cheap by default
This patch changes the intrinsics cost model to assume that by default
target intrinsics are cheap. This didn't seem to be the case for all
intrinsics, and is potentially an MVE problem due to our scalarization
overheads. Cheap seems to be a good default in general though.

Differential Revision: https://reviews.llvm.org/D90597
2020-11-03 09:58:28 +00:00
Sander de Smalen 1667d23e58 [NFCI] Add StackOffset class and base classes for ElementCount, TypeSize.
This patch adds a linear polynomial base class, called LinearPolyBase, which
serves as a base class for StackOffset. It tries to represent a linear
polynomial like:

  c0 * scale0 + c1 * scale1 + ... + cK * scaleK

where the scale is implicit, meaning that only the coefficients are
encoded.

This patch also adds a univariate linear polynomial, which serves as
a base class for ElementCount and TypeSize. This tries to represent a
linear polynomial where only one dimension can be set at any one time,
i.e. a TypeSize is either fixed-sized, or scalable-sized, but cannot be
a combination of the two.

  class LinearPolyBase
     ^
     |
     +---- class StackOffset  (dimensions = 2 (fixed/scalable), type = int64_t)

  class UnivariateLinearPolyBase
     |
     |
     +---- class LinearPolySize (dimensions = 2 (fixed/scalable))
                  ^
                  |
                  +-------- class ElementCount  (type = unsigned)
                  |
                  |
                  +-------- class TypeSize      (type = uint64_t)

Reviewed By: ctetreau, david-arm

Differential Revision: https://reviews.llvm.org/D88982
2020-11-03 09:41:39 +00:00
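
The distinction between the two kinds of polynomial can be sketched with two small value types. This is an illustrative model only: the class and field names are hypothetical, and `vscale` is used as the runtime multiplier for the scalable dimension:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOffsetModel:
    """Two coefficients may be non-zero at once: fixed bytes plus scalable bytes."""
    fixed: int = 0
    scalable: int = 0

    def __add__(self, other):
        return StackOffsetModel(self.fixed + other.fixed,
                                self.scalable + other.scalable)

    def evaluate(self, vscale):
        """fixed * 1 + scalable * vscale; the scales themselves are not stored."""
        return self.fixed + self.scalable * vscale


@dataclass(frozen=True)
class TypeSizeModel:
    """Univariate: one coefficient plus a flag saying which dimension it belongs to."""
    min_value: int
    is_scalable: bool = False

    def evaluate(self, vscale):
        return self.min_value * (vscale if self.is_scalable else 1)
```

For example, `StackOffsetModel(16, 32)` describes an offset of 16 fixed bytes plus 32 bytes per unit of `vscale`, a combination that `TypeSizeModel` cannot express by construction.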
Pedro Tammela 4b9fa3b705 [LLDB][NFC] treat Lua error codes in a more explicit manner
This patch is a minor suggestion to not rely on the fact
that the `LUA_OK` macro is 0.

This assumption could change in future versions of the C API.

Differential Revision: https://reviews.llvm.org/D90556
2020-11-03 09:39:47 +00:00
Tres Popp ca1bcdff4b [mlir] Add to shape.is_broadcastable description 2020-11-03 10:23:55 +01:00
Tres Popp d05d42199f [mlir] Add partial lowering of shape.cstr_broadcastable.
Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts can happen specifically with the user's desire.

Differential Revision: https://reviews.llvm.org/D89325
2020-11-03 09:57:23 +01:00
Michał Górny 952ddc9866 [lldb] [Plugins/FreeBSDRemote] Disable GetMemoryRegionInfo()
Disable GetMemoryRegionInfo() in order to unbreak expression parsing.
For some reason, the presence of non-stub function causes LLDB to fail
to detect system libraries correctly.  Through being unable to find
mmap() and allocate memory, this leads to expression parser being
broken.

The issue is non-trivial and it is going to require more time debugging.
On the other hand, the downsides of missing the function are minimal
(2 failing tests), and the benefit of working expression parser
justifies disabling it temporarily.  Furthermore, the old FreeBSD plugin
did not implement it anyway, so it allows us to switch to the new plugin
without major regressions.

The really curious part is that the respective code in the NetBSD plugin
yields very similar results, yet does not seem to break the expression
parser.

Differential Revision: https://reviews.llvm.org/D90650
2020-11-03 09:45:51 +01:00
Michał Górny 40d26bc4b1 [lldb] [Process/FreeBSDRemote] Remove GetSharedLibraryInfoAddress override
Remove the NetBSD-specific override of GetSharedLibraryInfoAddress(),
restoring the generic implementation from NativeProcessELF.

Differential Revision: https://reviews.llvm.org/D90620
2020-11-03 09:45:50 +01:00
Michał Górny 8e6bcbb417 [lldb] [Process/FreeBSDRemote] Fix attaching via lldb-server
Fix two bugs that caused attaching to a process in a pre-connected
lldb-server to fail.  These are:

1. Prematurely reporting status in NativeProcessFreeBSD::Attach().
   The SetState() call defaulted to notify the process, and LLGS tried
   to send the stopped packet before the process instance was assigned
   to it.  While at it, add an assert for that in LLGS.

2. Duplicate call to ReinitializeThreads() (via SetupTrace()) that
   overwrote the stopped status in threads.  Now SetupTrace() is called
   directly by NativeProcessFreeBSD::Attach() (not the Factory) in place
   of ReinitializeThreads().

This fixes at least commands/process/attach/TestProcessAttach.py
and python_api/hello_world/TestHelloWorld.py.

Differential Revision: https://reviews.llvm.org/D90525
2020-11-03 09:45:50 +01:00
Michał Górny f893b29397 [lldb] [Host/{free,net}bsd] Fix process matching by name
Fix process matching by name to make 'process attach -n ...' work.

The process finding code has an optimization that defers getting
the process name and executable format after the numeric (PID, UID...)
parameters are tested.  However, the ProcessInstanceInfoMatch.Matches()
method has been matching process name against the incomplete process
information as well, and effectively no process ever matched.

In order to fix this, create a copy of ProcessInstanceInfoMatch, set
it to ignore process name and use this copy for the initial match.
The same fix applies to FreeBSD and NetBSD host code.

Differential Revision: https://reviews.llvm.org/D90454
2020-11-03 09:45:50 +01:00
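
The bug and the fix can be illustrated with a two-phase match. This is a simplified sketch: the types and the `lookup_name` callback are hypothetical and do not reflect LLDB's actual classes:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    name: Optional[str] = None  # expensive to obtain, so filled in lazily


@dataclass(frozen=True)
class Matcher:
    pid: Optional[int] = None
    name: Optional[str] = None

    def matches(self, info):
        if self.pid is not None and info.pid != self.pid:
            return False
        if self.name is not None and info.name != self.name:
            return False
        return True


def find_processes(matcher, pids, lookup_name):
    """Match cheap numeric fields first, then fetch the name and match in full."""
    numeric_only = replace(matcher, name=None)  # copy that ignores the process name
    found = []
    for pid in pids:
        partial = ProcessInfo(pid)
        if not numeric_only.matches(partial):
            continue
        full = replace(partial, name=lookup_name(pid))
        if matcher.matches(full):
            found.append(full)
    return found
```

Matching the original matcher against `partial` would always fail when a name is requested, because the name has not been fetched yet; that is the behavior the commit describes.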
Michał Górny 326d235300 [lldb] [Process/FreeBSDRemote] Implement thread GetName()
Implement NativeThreadFreeBSD::GetName().  This is based
on the equivalent code in the legacy FreeBSD plugin, except it is
modernized a bit to use llvm::Optional and std::vector for data storage.

Differential Revision: https://reviews.llvm.org/D90298
2020-11-03 09:45:49 +01:00
Georgii Rymar 1af3cb5424 [llvm-readobj/libObject] - Allow dumping objects that has a broken SHT_SYMTAB_SHNDX section.
Currently it is impossible to create an instance of ELFObjectFile when the
SHT_SYMTAB_SHNDX can't be read. We error out when we fail to parse the
SHT_SYMTAB_SHNDX section in the factory method.

This change delays reading of the SHT_SYMTAB_SHNDX section entries;
with it, llvm-readobj is now able to work with such inputs.

Differential revision: https://reviews.llvm.org/D89379
2020-11-03 11:30:28 +03:00
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of new target specific combiner
helper class. Use reference to MachineIRBuilder from helper instead of
constructing new MachineIRBuilder each time new instruction needs to be made.
This allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Martin Storsjö d3bd06f5c7 [clang] Fix the fsanitize.c testcase after eaae6fdf67. NFC.
After that commit, the vptr sanitizer is enabled for mingw targets.
2020-11-03 10:21:29 +02:00
Max Kazantsev 46b2e85f0f [NFC] Refactor code in IndVars, preparing for further improvement 2020-11-03 15:08:12 +07:00
Martin Storsjö eaae6fdf67 [clang] [MinGW] Allow using the vptr sanitizer
Differential Revision: https://reviews.llvm.org/D90572
2020-11-03 09:59:09 +02:00
Martin Storsjö 076d351e8b [compiler-rt] [ubsan] Use the itanium type info lookup for mingw targets
Differential Revision: https://reviews.llvm.org/D90571
2020-11-03 09:59:08 +02:00
Esme-Yi 119ab2181e [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINMs will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA is followed by another RLWINM, we expect to perform the optimization after RA, too.

Reviewed By: shchenz, steven.zhang

Differential Revision: https://reviews.llvm.org/D89855
2020-11-03 07:44:11 +00:00
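
The arithmetic behind folding two RLWINMs is that rotation distributes over AND, so the two rotates add up and the first mask is rotated along with the data. A minimal sketch using IBM bit numbering (bit 0 is the most significant); the function names are hypothetical, and the sketch does not model the extra condition that the combined mask must still be encodable in a single instruction:

```python
M32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & M32


def ppc_mask(mb, me):
    """Ones from bit mb through bit me (bit 0 = MSB), wrapping around when mb > me."""
    head = M32 >> mb
    tail = (M32 << (31 - me)) & M32
    return head & tail if mb <= me else head | tail


def rlwinm(x, sh, mb, me):
    """Rotate left word immediate, then AND with mask."""
    return rotl32(x, sh) & ppc_mask(mb, me)


def folded_pair(x, sh1, mb1, me1, sh2, mb2, me2):
    """Single rotate-and-mask equivalent to rlwinm(rlwinm(x, ...), ...)."""
    combined_mask = rotl32(ppc_mask(mb1, me1), sh2) & ppc_mask(mb2, me2)
    return rotl32(x, sh1 + sh2) & combined_mask
```

Whether `combined_mask` is again a contiguous run of ones decides if the pair can be replaced by one RLWINM; that check is left out here.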