Commit Graph

391730 Commits

Author SHA1 Message Date
Sanjay Patel b1f6ef92ec [InstCombine] reduce code duplication for FP min/max with casts fold; NFC 2021-06-22 14:15:04 -04:00
Sanjay Patel bfd172999b [InstCombine][test] add tests for FP min/max with negated op; NFC 2021-06-22 14:15:04 -04:00
Sanjay Patel 4e78bd3836 [InstCombine][test] add tests for FP min/max with negated op; NFC 2021-06-22 14:15:04 -04:00
Joseph Huber 30e36c9b3c [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1 thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
David Green 015c27caa2 [ARM] Change some Gather/Scatter interface types to Instructions. NFC
These returned Values are cast to an Instruction already, this just
cleans up the interface a little to match the expected types.
2021-06-22 19:11:39 +01:00
Raphael Isemann 709f8186a4 [lldb] Add missing string include to lldb-server's main 2021-06-22 19:49:10 +02:00
Louis Dionne 87dbe6c4ef [libc++] NFC: Add missing all.h to the modulemap 2021-06-22 13:47:41 -04:00
Matt Arsenault 39f8a792f0 AMDGPU: Try to eliminate clearing of high bits of 16-bit instructions
These used to consistently be zeroed pre-gfx9, but gfx9 made the
situation complicated since now some still do and some don't. This
also manages to pick up a few cases that the pattern fails to optimize
away.

We handle some cases with instruction patterns, but some get
through. In particular this improves the integer cases.
2021-06-22 13:42:49 -04:00
Arthur O'Dwyer 317e92a3e8 [libc++] Enable `explicit` conversion operators, even in C++03 mode.
C++03 didn't support `explicit` conversion operators;
but Clang's C++03 mode does, as an extension, so we can use it.
This lets us make the conversion explicit in `std::function` (even in '03),
and remove some silly metaprogramming in `std::basic_ios`.

Drive-by improvements to the tests for these operators, in addition
to making sure all these tests also run in `c++03` mode.

Differential Revision: https://reviews.llvm.org/D104682
2021-06-22 13:35:59 -04:00
Matt Arsenault 2e120920ac AMDGPU: Add baseline test for instructions zeroing high bits 2021-06-22 13:27:39 -04:00
Joseph Huber 7d69da71dd [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00
Joseph Huber 2662351e3b [OpenMP] Add new OpenMP globalization functions to library info
Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareably memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102087
2021-06-22 13:23:05 -04:00
Patrick Holland d03736455c [MCA] [In-order pipeline] Fix for 0 latency instruction causing assertion to fail.
0 latency instructions now get processed and retired properly within the in-order pipeline. Had to fix a bug within TimelineView.cpp as well that would show up when a 0 latency instruction was the first instruction in the source.

Differential Revision: https://reviews.llvm.org/D104675
2021-06-22 10:18:39 -07:00
Matt Arsenault 9ad8a1f6fb AMDGPU: Fix high 16-bit optimization on gfx9
We can do this optimization in the majority of cases, but we currently
don't have a way to do it. We do not track/model which instructions
have which behavior, the control bit to change the high bit behavior,
or making use of preserved bits at all. This is a bit fuzzy since we
don't know precisely how the source instruction will be lowered, but
that only really matters in one case (for fma_mixlo).

We do need to fixup some of these cases after selection, but the
pattern helps eliminate many of these zexts.
2021-06-22 13:16:45 -04:00
LLVM GN Syncbot 805e1a5896 [gn build] Port 40d6d2c49d 2021-06-22 17:03:46 +00:00
Sami Tolvanen 4474958d3a ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: pcc

Differential Revision: https://reviews.llvm.org/D104058
2021-06-22 10:01:55 -07:00
zhijian bd240b3d77 [AIX][XCOFF] generate eh_info when vector registers are saved according to the traceback table.
Summary:

generate eh_info when vector registers are saved according to the traceback table.

struct eh_info_t {

unsigned version;       /* EH info version 0 */
#if defined(64BIT)

char _pad[4];           /* padding */
#endif

unsigned long lsda;     /* Pointer to Language Specific Data Area */
unsigned long personality; /* Pointer to the personality routine */
};

the value of lsda and personality is zero when the number of vector registers saved is large zero and there is not personality of the function

Reviewers: Jason Liu
Differential Revision: https://reviews.llvm.org/D103651
2021-06-22 13:01:31 -04:00
Stanislav Mekhanoshin d797a7f8da [AMDGPU] Use performOptimizedStructLayout for LDS sort
This gives better packing.

Differential Revision: https://reviews.llvm.org/D104331
2021-06-22 09:58:10 -07:00
Fangrui Song f53d791520 Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular)
Before: `warning: stack size limit exceeded (888) in main`
After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned)

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104667
2021-06-22 09:55:20 -07:00
zoecarver 40d6d2c49d [libcxx][ranges] Add `ranges::iter_swap`.
Differential Revision: https://reviews.llvm.org/D102809
2021-06-22 09:52:40 -07:00
Nico Weber 47553356ef [gn build] manually port c747b7d1d9 (config.osx_sysroot) 2021-06-22 12:50:58 -04:00
Matt Arsenault a7786badb7 AMDGPU: Move zeroed FP high bits optimization to patterns 2021-06-22 12:47:56 -04:00
Hyundeok Park 7adf713a5e [libc++] Change forward_list::swap to use propagate_on_container_swap for noexcept specification
The current implementation of `std::forward_list::swap` uses
`propagate_on_container_move_assignment` for `noexcept` specification.
This patch changes it to use `propagate_on_container_swap`, as it should.

Fixes https://llvm.org/PR50224.

Differential Revision: https://reviews.llvm.org/D101899
2021-06-22 12:42:02 -04:00
Alexandru Octavian Butiu 78d404a11d [clang][c++20] Fix false warning for unused private fields when a class has only defaulted comparison operators.
Fixes bug 50263

When "unused-private-field" flag is on if you have a struct with private
members and only defaulted comparison operators clang will warn about
unused private fields.

If you where to write the comparison operators by hand no warning is
produced.

This is a bug since defaulting a comparison operator uses all private
members .

The fix is simple, in CheckExplicitlyDefaultedFunction just clear the
list of unused private fields if the defaulted function is a comparison
function.

Differential revision: https://reviews.llvm.org/D102186
2021-06-22 18:40:16 +02:00
Shilei Tian 0029059074 [NFC][OpenMP][Offloading] Unified the construction of mapping table entry
This patch unifies construction of mapping table entry to use `emplace`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104580
2021-06-22 12:38:47 -04:00
Joseph Huber 03d7e61c87 [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Steven Wu c747b7d1d9 [llvm] Fix lto tests that requires ld64
Since Xcode 13, ld64 requires linking libSystem for all the executable.
Fix the tests that needs to run ld64 by linking libSystem from sysroot.

rdar://77332728

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D104332
2021-06-22 09:21:29 -07:00
Fangrui Song 3accff2553 [llvm-objcopy] Fix some namespace style issues
https://llvm.org/docs/CodingStandards.html#use-namespace-qualifiers-to-implement-previously-declared-functions

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104693
2021-06-22 09:19:48 -07:00
Bill Wendling dd1b121c99 [llvm-diff] Constify APIs so that there aren't conflicts
Some APIs work with const variables while others don't. This can cause
conflicts when calling one from the other.

This is NFC.

Differential Revision: https://reviews.llvm.org/D104719
2021-06-22 09:17:04 -07:00
Joseph Huber 6fc51c9f7d [OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97818
2021-06-22 11:55:44 -04:00
Joseph Huber 244e98ff48 [Libomptarget] Improve device runtime implementation for globalized variables.
Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocated memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will not only be allocated per-warp rather than per-thread.

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666
2021-06-22 11:52:49 -04:00
Nikita Popov e790d3667e [OpaquePtr] Handle addrspacecasts in InstCombine
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668
2021-06-22 17:45:30 +02:00
Meera Nakrani ea011ec5ed [AArch64LoadStoreOptimizer] Recommit: Generate more STPs by renaming registers earlier
This is a recommit that fixes unwanted STP generation by checking that
the base register has not been modified or used elsewhere.

Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.

Differential Revision: https://reviews.llvm.org/D103597
2021-06-22 15:29:13 +00:00
David Spickett a8dd7094d3 [lldb] Remove more redundant SetStatus(eReturnStatusFailed)
Mostly by converting uses of GetErrorStream to AppendError,
so that the call to SetStatus is implicit.

Some remain where it isn't certain that you'll have a message
to set, or you want the output to be on stdout.

One place in CommandObjectWatchpoint previously didn't set
the status to failed at all. However it's pretty obvious
that it should do so.

Reviewed By: teemperor

Differential Revision: https://reviews.llvm.org/D104697
2021-06-22 15:28:28 +00:00
Jingu Kang 873ff5a728 [SimpleLoopUnswich] Fixa a bug on ComputeUnswitchedCost with partial unswitch
There was a bug from cost calculation for partially invariant unswitch.

The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.

Differential Revision: https://reviews.llvm.org/D103816
2021-06-22 16:18:00 +01:00
Florian Hahn 6c782e6eb0
[SCEV] Reduce code to handle predicates in applyLoopGuards (NFC).
Hoist out common recurrence check and sink updating the map, to reduce
the code required to support additional predicates.
2021-06-22 15:56:45 +01:00
Joseph Huber 68d133a3e8 [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Raphael Isemann 48e2d3a5c2 [lldb][NFC] Remove an outdated comment in HostInfoBase
We should *never* use static local variables in this file as this makes
unittesting the plugin code impossible (and this whole 'testing' thing has
turned out to be rather useful so far).
2021-06-22 16:48:17 +02:00
Rosie Sumpter b2f48cc914 [SLP][AArch64] Add SLP vectorizer tests for XOR and AND reductions. NFC
These regression tests show missed SLP vectorization opportunities,
which will be fixed in a future commit (see:
https://reviews.llvm.org/D104538).

Differential Revision: https://reviews.llvm.org/D104708
2021-06-22 15:16:02 +01:00
Graham Hunter a83ce95b09 [clang] Remove unused capture in closure
c6a91ee6aa removed uses of IsMonotonic from OpenMP SIMD codegen,
but that left a capture of the variable unused which upset buildbots
using -Werror.
2021-06-22 15:09:39 +01:00
Joseph Huber 952a0f2385 [Libomptarget] Introduce new globalization runtime calls
Summary:
This patch introduces the new globalization runtime to be used by D97680. These
runtime calls will replace the __kmpc_data_sharing_push_stack and
__kmpc_data_sharing_pop_stack functions.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102532
2021-06-22 10:05:42 -04:00
Florian Hahn 34cccdaed7
[BitcodeReader] Validate Strtab before accessing.
This fixes a crash with invalid bitcode files that have records
referencing names in Strtab, but Strtab is not present or the index is
out-of-bounds.

This fixes the following clusterfuzz issue:
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=29895

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D95554
2021-06-22 14:52:16 +01:00
Nikita Popov e638a290f7 [ConstantFold] Delay fetching pointer element type
Don't do this while stipping pointer casts, instead fetch it at
the end. This improves compatibility with opaque pointers for the
case where the base object is not opaque.
2021-06-22 15:51:00 +02:00
Nikita Popov 87bdde4962 [ConstantFold] Skip bitcast -> GEP transform for opaque pointers
Same as with the InstCombine transform, this is not possible for
bitcasts involving opaque pointers, as GEP preserves opaqueness.
2021-06-22 15:50:55 +02:00
AndreyChurbanov 5dd4d0d46f [OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
  thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
  execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648
2021-06-22 16:29:01 +03:00
Butygin 82c1fb5750 [mlir] Fix invalid handling of AllocOp symbolOperands by SimplifyAllocConst.
symbolOperands were completely ignored by SimplifyAllocConst. Also, slightly improved diagnostic message for verifyAllocLikeOp.

Differential Revision: https://reviews.llvm.org/D104260
2021-06-22 15:39:53 +03:00
Pushpinder Singh 9d110f9159 [AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp
Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691
2021-06-22 11:44:52 +00:00
Raphael Isemann c462048cc4 [lldb][NFC] Use SubsystemRAII in XcodeSDKModuleTests 2021-06-22 13:41:01 +02:00
Thomas Johnson 2ef1fbfe0e Add norm sub-target feature to table gen for ARC
This adds the `norm` sub-target feature (without backing implementation for now) to table gen.

Differential Revision: https://reviews.llvm.org/D104558
2021-06-22 14:39:29 +03:00
Muhammad Omair Javaid 28058d4cd1 [LLDB] Skip TestExitDuringExpression on aarch64/linux buildbot
TestExitDuringExpression test_exit_before_one_thread_no_unwind fails
sporadically on both Arm and AArch64 linux buildbots. This seems like
manifesting itself on a fully loaded machine. I have not found a reliable
timeout value so marking it skip for now.
2021-06-22 16:22:48 +05:00