Commit Graph

35966 Commits

Author SHA1 Message Date
Stella Stamenova 432e4e56d3 Revert "[WebAssembly] Emulate v128.const efficiently"
This reverts commit 542523a61a.
2020-10-02 09:26:21 -07:00
Vinay Madhusudan f192594956 [AArch64] Generate dot for v16i8 sum reduction to i32
Convert VECREDUCE_ADD( EXTEND(v16i8_type) ) to VECREDUCE_ADD( DOTv16i8(v16i8_type) ) whenever the result type is i32. This gains in one of the SPECCPU 2017 benchmark.

This partially solves the bug: https://bugs.llvm.org/show_bug.cgi?id=46888
Meta ticket: https://bugs.llvm.org/show_bug.cgi?id=46929

Differential Revision: https://reviews.llvm.org/D88577
2020-10-02 17:11:02 +01:00
serge-sans-paille f2c6bfa350 Fix interaction between stack alignment and inline-asm stack clash protection
As reported in https://github.com/rust-lang/rust/issues/70143 alignment is not
taken into account when doing the probing. Fix that by adjusting the first probe
if the stack align is small, or by extending the dynamic probing if the
alignment is large.

Differential Revision: https://reviews.llvm.org/D84419
2020-10-02 16:51:49 +02:00
Denis Antrushin 7b19cd06d7 [Statepoints][ISEL] visitGCRelocate: chain to current DAG root.
This is similar to D87251, but for CopyFromRegs nodes.
Even for local statepoint uses we generate CopyToRegs/CopyFromRegs
nodes.  When generating CopyFromRegs in visitGCRelocate, we must chain
to current DAG root, not EntryNode, to ensure proper ordering of copy
w.r.t. statepoint node producing result for it.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D88639
2020-10-02 21:41:22 +07:00
Meera Nakrani f7c0e2b8f2 [ARM] Prevent constants from iCmp instruction from being hoisted if part of a min(max()) pattern
Marks constants of an ICmp instruction as free if it's only user is a select
instruction that is part of a min(max()) pattern. Ensures that in loops, in
particular when loop unrolling is turned on, SSAT will still be correctly generated.

Differential Revision: https://reviews.llvm.org/D88662
2020-10-02 09:28:35 +00:00
serge-sans-paille 9573c9f2a3 Fix limit behavior of dynamic alloca
When the allocation size is 0, we shouldn't probe. Within [1,  PAGE_SIZE], we
should probe once etc.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47657

Differential Revision: https://reviews.llvm.org/D88548
2020-10-02 11:10:02 +02:00
Thomas Lively 542523a61a [WebAssembly] Emulate v128.const efficiently
v128.const was recently implemented in V8, but until it rolls into Chrome
stable, we can't enable it in the WebAssembly backend without breaking origin
trial users. So far we have been lowering build_vectors that would otherwise
have been lowered to v128.const to splats followed by sequences of replace_lane
instructions to initialize each lane individually. That produces large and
inefficient code, so this patch introduces new logic to lower integer vector
constants to a single i64x2.splat where possible, with at most a single
i64x2.replace_lane following it if necessary.

Adapted from a patch authored by @omnisip.

Differential Revision: https://reviews.llvm.org/D88591
2020-10-02 00:28:06 -07:00
Martin Storsjö afb4e0f289 [AArch64] Omit SEH directives for the epilogue if none are needed
For these cases, we already omit the prologue directives, if
(!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes).

When writing the epilogue (after the prolog has been written), if
the function doesn't have the WinCFI flag set (i.e. if no prologue
was generated), assume that no epilogue will be needed either,
and don't emit any epilog start pseudo instruction. After completing
the epilogue, make sure that it actually matched the prologue.

Previously, when epilogue start/end was generated, but no prologue,
the unwind info for such functions actually was huge; 12 bytes xdata
(4 bytes header, 4 bytes for one non-folded epilogue header, 4 bytes
for padded opcodes) and 8 bytes pdata. Because the epilog consisted of
one opcode (end) but the prolog was empty (no .seh_endprologue), the
epilogue couldn't be folded into the prologue, and thus couldn't be
considered for packed form either.

On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of
38 KB pdata and 62 KB xdata.

Differential Revision: https://reviews.llvm.org/D88641
2020-10-02 09:12:56 +03:00
Carl Ritson 5136f4748a CodeGen: Fix livein calculation in MachineBasicBlock splitAt
Fix and simplify computation of liveins for new block.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D88535
2020-10-02 10:45:04 +09:00
Esme-Yi c4690b0077 [PowerPC] Put the CR field in low bits of GRC during copying CRRC to GRC.
Summary: How we copying the CRRC to GRC is using a single MFOCRF to copy the contents of CR field n (CR bits 4×n+32:4×n+35) into bits 4×n+32:4×n+35 of register GRC. That’s not correct because we expect the value of destination register equals to source so we have to put the the contents of CR field in the lowest 4 bits. This patch adds a RLWINM after MFOCRF to achieve that.
The problem came up when adding builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp, as posted in D88278. We need to move the outputs (in CR register) to GRC. However outputs of these instructions may not in a fixed CR# register, so we can’t directly add a rotation instruction in the .td patterns, but need to wait until the CR register is determined. Then we confirmed this should be a bug in POST-RA PSEUDO PASS.

Reviewed By: nemanjai, shchenz

Differential Revision: https://reviews.llvm.org/D88274
2020-10-02 01:26:18 +00:00
jasonliu 78a9e62aa6 [XCOFF] Enable -fdata-sections on AIX
Summary:
Some design decision worth noting about:

I've noticed a recent mailing discussing about why string literal is
not affected by -fdata-sections for ELF target:
http://lists.llvm.org/pipermail/llvm-dev/2020-September/145121.html

But on AIX, our linker could not split the mergeable string like other target.
So I think it would make more sense for us to emit separate csect for
every mergeable string in -fdata-sections mode,
as there might not be other ways for linker to do garbage collection
on unused mergeable string.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D88339
2020-10-02 00:16:24 +00:00
Muhammad Asif Manzoor aab6f7db47 [AArch64][SVE] Add lowering for llvm fabs
Add the functionality to lower fabs for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88679
2020-10-01 19:41:25 -04:00
Jessica Paquette 5402d11b1d [GlobalISel][AArch64] Don't emit cset for G_FCMPs feeding into G_BRCONDs
Similar to the FP case in `AArch64TargetLowering::LowerBR_CC`.

Instead of emitting the csets + a tbnz, just emit a compare + bcc
(or two bccs, depending on the condition code)

This improves cases like this: https://godbolt.org/z/v8hebx

This is a 0.1% geomean code size improvement for CTMark at -O3.

Differential Revision: https://reviews.llvm.org/D88624
2020-10-01 15:34:16 -07:00
Jessica Paquette 8e8664e55e [AArch64][GlobalISel] Use emitTestBit in selection for G_BRCOND
Partially refactoring, partially fixing a bug.

- We shouldn't use TB(N)ZX unless the bit number is >= 32
- We can fold more than xor using emitTestBit

Also remove a check which isn't relevant anymore + update tests.

Rename select-brcond-of-not.mir to select-brcond-of-binop.mir, since it now
tests more than just G_XOR.

Differential Revision: https://reviews.llvm.org/D88702
2020-10-01 15:33:34 -07:00
Amara Emerson 017b871502 [AArch64][GlobalISel] Alias rules for G_FCMP to G_ICMP.
No need to be different here for the vast majority of rules.
2020-10-01 15:20:09 -07:00
Amara Emerson e28c5899a2 [AArch64][GlobalISel] Make <8 x s8> integer arithmetic ops legal. 2020-10-01 14:35:21 -07:00
Amara Emerson a97e97faed [AArch64][GlobalISel] Make <8 x s8> shifts legal and add selection support. 2020-10-01 14:21:18 -07:00
Amara Emerson 9a2b3bbc59 Revert "[AArch64][GlobalISel] Make <8 x s8> shifts legal."
Accidentally pushed this.
2020-10-01 14:15:57 -07:00
Amara Emerson 8071c2f5c6 [AArch64][GlobalISel] Make <8 x s8> shifts legal. 2020-10-01 14:10:10 -07:00
Amara Emerson 9f6acb1358 [AArch64][GlobalISel] Merge G_SHL, G_ASHR and G_LSHR legalizer rules together.
There's no need for any difference between these.
2020-10-01 14:02:45 -07:00
Amara Emerson 73457536ff [AArch64][GlobalISel] Use custom legalization for G_TRUNC for v8i8 vectors.
Truncating to v8i8 is a case where we want to split the source but also generate
intermediate truncates to reduce the size of the source vector before truncating
down to v8i8. This implements the same strategy as what SelectionDAG does, but
I'm not certain where if anywhere in generic code it should live.

Use it for legalization of v8s8 = G_ICMP v8s32.

Differential Revision: https://reviews.llvm.org/D88191
2020-10-01 13:22:00 -07:00
Amara Emerson 4c265ce665 [AArch64][GlobalISel] Camp oversize v4s64 G_FPEXT operations. 2020-10-01 13:08:31 -07:00
Arthur Eubanks 499260c03b Revert "[CFGuard] Add address-taken IAT tables and delay-load support"
This reverts commit ef4e971e5e.
2020-10-01 11:29:54 -07:00
Martin Storsjö f4b9dfd9bc [AArch64] Don't merge sp decrement into later stores when using WinCFI
This matches the corresponding existing case in
AArch64LoadStoreOpt::findMatchingUpdateInsnForward.

Both cases could also be modified to check
MBBI->getFlag(FrameSetup/FrameDestroy) instead of forbidding any
optimization involving SP, but the effect is probably pretty much
the same.

Differential Revision: https://reviews.llvm.org/D88541
2020-10-01 19:03:27 +03:00
Andrew Paverd ef4e971e5e [CFGuard] Add address-taken IAT tables and delay-load support
This patch adds support for creating Guard Address-Taken IAT Entry Tables (.giats$y sections) in object files, matching the behavior of MSVC. These contain lists of address-taken imported functions, which are used by the linker to create the final GIATS table.
Additionally, if any DLLs are delay-loaded, the linker must look through the .giats tables and add the respective load thunks of address-taken imports to the GFIDS table, as these are also valid call targets.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D87544
2020-10-01 12:45:07 +01:00
Kerry McLaughlin fcf70e1e3b [SVE][CodeGen] Lower scalable fp_extend & fp_round operations
This patch adds FP_EXTEND_MERGE_PASSTHRU & FP_ROUND_MERGE_PASSTHRU
ISD nodes, used to lower scalable vector fp_extend/fp_round operations.
fp_round has an additional argument, the 'trunc' flag, which is an integer of zero or one.

This also fixes a warning introduced by the new tests added to sve-split-fcvt.ll,
resulting from an implicit TypeSize -> uint64_t cast in SplitVecOp_FP_ROUND.

Reviewed By: sdesmalen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88321
2020-10-01 12:17:37 +01:00
Sam Parker 38f625d0d1 [ARM][LowOverheadLoops] Adjust Start insertion.
Try to move the insertion point to become the terminator of the
block, usually the preheader.

Differential Revision: https://reviews.llvm.org/D88638
2020-10-01 10:49:19 +01:00
Kerry McLaughlin 75db7cf78a [SVE][CodeGen] Legalisation of integer -> floating point conversions
Splitting the operand of a scalable [S|U]INT_TO_FP results in a
concat_vectors operation where the operands are unpacked FP
scalable vectors (e.g. nxv2f32).
This patch adds custom lowering of concat_vectors which
checks that the number of operands is 2, and isel patterns
to match concat_vectors of scalable FP types with uzp1.

Reviewed By: efriedma, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D88033
2020-10-01 10:43:20 +01:00
Sam Parker 6ec5f32497 [ARM][LowOverheadLoops] Iteration count liveness
Before deciding to insert a [W|D]LSTP, check that defining LR with
the element count won't affect any other instructions that should be
taking the iteration count.

Differential Revision: https://reviews.llvm.org/D88549
2020-10-01 10:11:10 +01:00
Sam Parker 7b90516d47 [ARM][LowOverheadLoops] Start insertion point
If possible, try not to move the start position earlier than it
already is.

Differential Revision: https://reviews.llvm.org/D88542
2020-10-01 10:05:25 +01:00
Sam Parker dfa2c14b8f [ARM][LowOverheadLoops] Use iterator for InsertPt.
Use a MachineBasicBlock::iterator instead of a MachineInstr* for the
position of our LoopStart instruction. NFCish, as it change debug
info.
2020-10-01 08:32:35 +01:00
Amara Emerson da11479fd1 [AArch64][GlobalISel] Select all-zero G_BUILD_VECTOR into a zero mov.
Unfortunately the leaf SDAG patterns aren't supported yet so we need to do
this manually, but it's not a significant amount of code anyway.

Differential Revision: https://reviews.llvm.org/D87924
2020-09-30 23:53:38 -07:00
Andrew Dona-Couch 1fedd90cc7 [AVR] fix interrupt stack pointer restoration
This patch fixes a corruption of the stack pointer and several registers in any AVR interrupt with non-empty stack frame.  Previously, the callee-saved registers were popped before restoring the stack pointer, causing the pointer math to use the wrong base value while also corrupting the caller's register.  This change fixes the code to restore the stack pointer last before exiting the interrupt service routine.

https://bugs.llvm.org/show_bug.cgi?id=47253

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D87735

Patch by Andrew Dona-Couch.
2020-10-01 18:52:13 +13:00
Amara Emerson 196c097bba [AArch64][GlobalISel] Clamp oversize FP arithmetic vectors. 2020-09-30 18:03:37 -07:00
Amara Emerson 4ab45cc226 [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.
Also use this opportunity start to clean up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately since the legalizer rule builders require
std::initializer_list objects as parameters we can't programmatically generate the
type lists.
2020-09-30 17:25:33 -07:00
Jessica Paquette bc43ddf42f [AArch64][GlobalISel] NFC: Refactor G_FCMP selection code
Refactor this so it's similar to the existing integer comparison code.

Also add some missing 64-bit testcases to select-fcmp.mir.

Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.

Differential Revision: https://reviews.llvm.org/D88614
2020-09-30 16:50:39 -07:00
Craig Topper d1d7fc9832 [X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare.
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed for cmovne for unsigned.

Fixes more case for PR47049
2020-09-30 13:50:52 -07:00
Arthur Eubanks ce5379f0f0 [NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
2020-09-30 13:29:43 -07:00
Rahman Lavaee 8955950c12 Exception support for basic block sections
This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.

The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a **nop** operation as the first non-CFI instruction in the exception section.

**Background on Exception Handling in C++ ABI**
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf

Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.

The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:

| CallSite                       |  -->  Position of the call site (relative to the function entry)
| CallSite length           |  -->  Length of the call site.
| Landing Pad               |  -->  Position of the landing pad (relative to the landing pad fragment’s begin label)
| Action record offset  |  -->  Position of the first action record

The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).

The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.

The action record offset is an index into the action table which includes information about which exception types are caught.

**C++ Exceptions with Basic Block Sections**
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new **CallSiteRange** structure which specifies the range of call-sites which correspond to every section:

  `struct CallSiteRange {
    // Symbol marking the beginning of the precedure fragment.
    MCSymbol *FragmentBeginLabel = nullptr;
    // Symbol marking the end of the procedure fragment.
    MCSymbol *FragmentEndLabel = nullptr;
    // LSDA symbol for this call-site range.
    MCSymbol *ExceptionLabel = nullptr;
    // Index of the first call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteBeginIdx = 0;
    // Index just after the last call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteEndIdx = 0;
    // Whether this is the call-site range containing all the landing pads.
    bool IsLPRange = false;
  };`

With N basic-block-sections, the call-site table is partitioned into N call-site ranges.

Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.

Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange).

|  @LPStart                   |  --->  Landing pad fragment     ( LSDA1 points here)
| CallSite Table Length | ---> Used to find the action table.
| CallSites                     |
| …                                 |
| …                                 |
| @LPStart                    |  --->  Landing pad fragment ( LSDA2 points here)
| CallSite Table Length |
| CallSites                     |
| …                                 |
| …                                 |
…
…
|      Action Table          |
|      Types Table           |

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D73739
2020-09-30 11:05:55 -07:00
Congzhe Cao 8d8cb1ad80 [AArch64] Avoid pairing loads when the base reg is modified
When pairing loads, we should check if in between the two loads the
base register has been modified. If that is the case then avoid pairing
them because the second load actually loads from a different address.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86956
2020-09-30 13:06:51 -04:00
Kazushi (Jam) Marukawa 1034262e0a [VE] Support TargetBlockAddress
Change to handle TargetBlockAddress and add a regression test for it.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D88576
2020-10-01 00:48:21 +09:00
Zarko Todorovski 052c5bf40a [PPC] Do not emit extswsli in 32BIT mode when using -mcpu=pwr9
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.

This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node.  Adding a check for whether we are in PPC64 before that fixes the issue.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D87046
2020-09-30 11:06:20 -04:00
Matt Arsenault a66fca44ac RegAllocFast: Add extra DBG_VALUE for live out spills
This allows LiveDebugValues to insert the proper DBG_VALUEs in live
out blocks if a spill is inserted before the use of a
register. Previously, this would see the register use as the last
DBG_VALUE, even though the stack slot should be treated as the live
out value.

This avoids an lldb test regression when D52010 is re-applied.
2020-09-30 10:35:25 -04:00
Matt Arsenault 89baeaef2f Reapply "RegAllocFast: Rewrite and improve"
This reverts commit 73a6a164b8.
2020-09-30 10:35:25 -04:00
Sean Fertile dfb717da1f [PowerPC] Remove support for VRSAVE save/restore/update.
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
2020-09-30 10:05:53 -04:00
Sam Parker 3f88c10a6b [RDA] isSafeToDefRegAt: Look at global uses
We weren't looking at global uses of a value, so we could happily
overwrite the register incorrectly.

Differential Revision: https://reviews.llvm.org/D88554
2020-09-30 14:06:45 +01:00
Sam Parker 3cbd01ddb9 [NFC][ARM] Add more LowOverheadLoop tests. 2020-09-30 12:20:07 +01:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Jay Foad cdac4492b4 [SplitKit] Cope with no live subranges in defFromParent
Following on from D87757 "[SplitKit] Only copy live lanes", it is
possible to split a live range at a point when none of its subranges
are live. This patch handles that case by inserting an implicit def
of the superreg.

Patch by Quentin Colombet!

Differential Revision: https://reviews.llvm.org/D88397
2020-09-30 10:16:25 +01:00
Mirko Brkusanin 0249df33fe [AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.

Differential Revision: https://reviews.llvm.org/D88315
2020-09-30 11:09:18 +02:00