Commit Graph

59508 Commits

Author SHA1 Message Date
Sam Parker dfa2c14b8f [ARM][LowOverheadLoops] Use iterator for InsertPt.
Use a MachineBasicBlock::iterator instead of a MachineInstr* for the
position of our LoopStart instruction. NFCish, as it change debug
info.
2020-10-01 08:32:35 +01:00
Amara Emerson da11479fd1 [AArch64][GlobalISel] Select all-zero G_BUILD_VECTOR into a zero mov.
Unfortunately the leaf SDAG patterns aren't supported yet so we need to do
this manually, but it's not a significant amount of code anyway.

Differential Revision: https://reviews.llvm.org/D87924
2020-09-30 23:53:38 -07:00
Andrew Dona-Couch 1fedd90cc7 [AVR] fix interrupt stack pointer restoration
This patch fixes a corruption of the stack pointer and several registers in any AVR interrupt with non-empty stack frame.  Previously, the callee-saved registers were popped before restoring the stack pointer, causing the pointer math to use the wrong base value while also corrupting the caller's register.  This change fixes the code to restore the stack pointer last before exiting the interrupt service routine.

https://bugs.llvm.org/show_bug.cgi?id=47253

Reviewed By: dylanmckay

Differential Revision: https://reviews.llvm.org/D87735

Patch by Andrew Dona-Couch.
2020-10-01 18:52:13 +13:00
Amara Emerson 196c097bba [AArch64][GlobalISel] Clamp oversize FP arithmetic vectors. 2020-09-30 18:03:37 -07:00
Amara Emerson 93a1fc2e18 Try to fix build. May have used a C++ feature too new/not supported on all platforms. 2020-09-30 17:36:38 -07:00
Amara Emerson 4ab45cc226 [AArch64][GlobalISel] Add some more legal types for G_PHI, G_IMPLICIT_DEF, G_FREEZE.
Also use this opportunity start to clean up the mess of vector type lists we
have in the LegalizerInfo. Unfortunately since the legalizer rule builders require
std::initializer_list objects as parameters we can't programmatically generate the
type lists.
2020-09-30 17:25:33 -07:00
Jessica Paquette bc43ddf42f [AArch64][GlobalISel] NFC: Refactor G_FCMP selection code
Refactor this so it's similar to the existing integer comparison code.

Also add some missing 64-bit testcases to select-fcmp.mir.

Refactoring to prep for improving selection for G_FCMP-related conditional
branches etc.

Differential Revision: https://reviews.llvm.org/D88614
2020-09-30 16:50:39 -07:00
Ahsan Saghir 66d2e3f495 [PowerPC] Add outer product instructions for MMA
This patch adds outer product instructions for MMA, including related infrastructure, and their tests.

Depends on D84968.

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D88043
2020-09-30 18:06:49 -05:00
Stanislav Mekhanoshin 722d792499 [AMDGPU] Reorganize VOP3P encoding
This changes width of encoding and opcode fields to match the
documentation.

Differential Revision: https://reviews.llvm.org/D88619
2020-09-30 15:27:06 -07:00
Craig Topper d1d7fc9832 [X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare.
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed for cmovne for unsigned.

Fixes more case for PR47049
2020-09-30 13:50:52 -07:00
Arthur Eubanks ce5379f0f0 [NPM] Add target specific hook to add passes for New Pass Manager
The patch adds a new TargetMachine member "registerPassBuilderCallbacks" for targets to add passes to the pass pipeline using the New Pass Manager (similar to adjustPassManager for the Legacy Pass Manager).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D88138
2020-09-30 13:29:43 -07:00
Rahman Lavaee 8955950c12 Exception support for basic block sections
This is part of the Propeller framework to do post link code layout optimizations. Please see the RFC here: https://groups.google.com/forum/#!msg/llvm-dev/ef3mKzAdJ7U/1shV64BYBAAJ and the detailed RFC doc here: https://github.com/google/llvm-propeller/blob/plo-dev/Propeller_RFC.pdf

This patch provides exception support for basic block sections by splitting the call-site table into call-site ranges corresponding to different basic block sections. Still all landing pads must reside in the same basic block section (which is guaranteed by the the core basic block section patch D73674 (ExceptionSection) ). Each call-site table will refer to the landing pad fragment by explicitly specifying @LPstart (which is omitted in the normal non-basic-block section case). All these call-site tables will share their action and type tables.

The C++ ABI somehow assumes that no landing pads point directly to LPStart (which works in the normal case since the function begin is never a landing pad), and uses LP.offset = 0 to specify no landing pad. In the case of basic block section where one section contains all the landing pads, the landing pad offset relative to LPStart could actually be zero. Thus, we avoid zero-offset landing pads by inserting a **nop** operation as the first non-CFI instruction in the exception section.

**Background on Exception Handling in C++ ABI**
https://github.com/itanium-cxx-abi/cxx-abi/blob/master/exceptions.pdf

Compiler emits an exception table for every function. When an exception is thrown, the stack unwinding library queries the unwind table (which includes the start and end of each function) to locate the exception table for that function.

The exception table includes a call site table for the function, which is used to guide the exception handling runtime to take the appropriate action upon an exception. Each call site record in this table is structured as follows:

| CallSite                       |  -->  Position of the call site (relative to the function entry)
| CallSite length           |  -->  Length of the call site.
| Landing Pad               |  -->  Position of the landing pad (relative to the landing pad fragment’s begin label)
| Action record offset  |  -->  Position of the first action record

The call site records partition a function into different pieces and describe what action must be taken for each callsite. The callsite fields are relative to the start of the function (as captured in the unwind table).

The landing pad entry is a reference into the function and corresponds roughly to the catch block of a try/catch statement. When execution resumes at a landing pad, it receives an exception structure and a selector value corresponding to the type of the exception thrown, and executes similar to a switch-case statement. The landing pad field is relative to the beginning of the procedure fragment which includes all the landing pads (@LPStart). The C++ ABI requires all landing pads to be in the same fragment. Nonetheless, without basic block sections, @LPStart is the same as the function @Start (found in the unwind table) and can be omitted.

The action record offset is an index into the action table which includes information about which exception types are caught.

**C++ Exceptions with Basic Block Sections**
Basic block sections break the contiguity of a function fragment. Therefore, call sites must be specified relative to the beginning of the basic block section. Furthermore, the unwinding library should be able to find the corresponding callsites for each section. To do so, the .cfi_lsda directive for a section must point to the range of call-sites for that section.
This patch introduces a new **CallSiteRange** structure which specifies the range of call-sites which correspond to every section:

  `struct CallSiteRange {
    // Symbol marking the beginning of the precedure fragment.
    MCSymbol *FragmentBeginLabel = nullptr;
    // Symbol marking the end of the procedure fragment.
    MCSymbol *FragmentEndLabel = nullptr;
    // LSDA symbol for this call-site range.
    MCSymbol *ExceptionLabel = nullptr;
    // Index of the first call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteBeginIdx = 0;
    // Index just after the last call-site entry in the call-site table which
    // belongs to this range.
    size_t CallSiteEndIdx = 0;
    // Whether this is the call-site range containing all the landing pads.
    bool IsLPRange = false;
  };`

With N basic-block-sections, the call-site table is partitioned into N call-site ranges.

Conceptually, we emit the call-site ranges for sections sequentially in the exception table as if each section has its own exception table. In the example below, two sections result in the two call site ranges (denoted by LSDA1 and LSDA2) placed next to each other. However, their call-sites will refer to records in the shared Action Table. We also emit the header fields (@LPStart and CallSite Table Length) for each call site range in order to place the call site ranges in separate LSDAs. We note that with -basic-block-sections, The CallSiteTableLength will not actually represent the length of the call site table, but rather the reference to the action table. Since the only purpose of this field is to locate the action table, correctness is guaranteed.

Finally, every call site range has one @LPStart pointer so the landing pads of each section must all reside in one section (not necessarily the same section). To make this easier, we decide to place all landing pads of the function in one section (hence the `IsLPRange` field in CallSiteRange).

|  @LPStart                   |  --->  Landing pad fragment     ( LSDA1 points here)
| CallSite Table Length | ---> Used to find the action table.
| CallSites                     |
| …                                 |
| …                                 |
| @LPStart                    |  --->  Landing pad fragment ( LSDA2 points here)
| CallSite Table Length |
| CallSites                     |
| …                                 |
| …                                 |
…
…
|      Action Table          |
|      Types Table           |

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D73739
2020-09-30 11:05:55 -07:00
Congzhe Cao 8d8cb1ad80 [AArch64] Avoid pairing loads when the base reg is modified
When pairing loads, we should check if in between the two loads the
base register has been modified. If that is the case then avoid pairing
them because the second load actually loads from a different address.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D86956
2020-09-30 13:06:51 -04:00
Kazushi (Jam) Marukawa 1034262e0a [VE] Support TargetBlockAddress
Change to handle TargetBlockAddress and add a regression test for it.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D88576
2020-10-01 00:48:21 +09:00
Zarko Todorovski 052c5bf40a [PPC] Do not emit extswsli in 32BIT mode when using -mcpu=pwr9
It looks like in some circumstances when compiling with `-mcpu=pwr9` we create an EXTSWSLI node when which causes llc to fail. No such error occurs in pwr8 or lower.

This occurs in 32BIT AIX and BE Linux. the cause seems to be that the default return in combineSHL is to create an EXTSWSLI node.  Adding a check for whether we are in PPC64 before that fixes the issue.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D87046
2020-09-30 11:06:20 -04:00
Benjamin Kramer 2c394bd407 [PowerPC] Avoid unused variable warning in Release builds
PPCFrameLowering.cpp:632:8: warning: unused variable 'isAIXABI' [-Wunused-variable]
2020-09-30 17:02:55 +02:00
Sean Fertile dfb717da1f [PowerPC] Remove support for VRSAVE save/restore/update.
After removal of Darwin as a PowerPC subtarget, the VRSAVE
save/restore/spill/update code is no longer needed by any supported
subtarget, so remove it while keeping support for vrsave and related instruction
aliases for inline asm. I've pre-commited tests to document the existing vrsave
handling in relation to @llvm.eh.unwind.init and inline asm usage, as
well as a test which shows a beahviour change on AIX related to
returning vector type as we were wrongly emiting VRSAVE_UPDATE on AIX.
2020-09-30 10:05:53 -04:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Jonas Paulsson 9f5da55f5d [SystemZ] Support bare nop instructions
Add support of "nop" and "nopr" (without operands) to assembler.

Review: Ulrich Weigand
2020-09-30 11:23:41 +02:00
Mirko Brkusanin 0249df33fe [AMDGPU] Do not generate mul with 1 in AMDGPU Atomic Optimizer
Check if operand of mul is constant value of one for certain atomic
instructions in order to avoid making unnecessary instructions when
-amdgpu-atomic-optimizer is present.

Differential Revision: https://reviews.llvm.org/D88315
2020-09-30 11:09:18 +02:00
Sam Parker 779a8a028f [ARM][LowOverheadLoops] TryRemove helper.
Make a helper function that wraps around RDA::isSafeToRemove and
utilises the existing DCE IT block checks.
2020-09-30 09:37:24 +01:00
Sam Parker 195c22f273 [ARM] Change VPT state assertion
Just because we haven't encountered an instruction setting the VPR,
it doesn't mean we can't create a VPT block - the VPR maybe a
live-in.

Differential Revision: https://reviews.llvm.org/D88224
2020-09-30 08:01:10 +01:00
Craig Topper 618a890b72 [X86] Increase the depth threshold required to form VPERMI2W/VPERMI2B in shuffle combining
These instructions are implemented with two port 5 uops and one port 015 uop so they are more complicated that most shuffles.

This patch increases the depth threshold for when we form them during shuffle combining to try to limit increasing the number of uops especially on port 5.

Differential Revision: https://reviews.llvm.org/D88503
2020-09-29 18:37:23 -07:00
Evandro Menezes c6b18cf967 [RISCV] Use the extensions in the canonical order (NFC)
Use the ISA extensions for specific processors in the conventional canonical order.
2020-09-29 20:03:02 -05:00
Stanislav Mekhanoshin 61b3106965 [AMDGPU] Remove SIEncodingFamily.GFX10_B
It turns to be not needed anymore.

Differential Revision: https://reviews.llvm.org/D88520
2020-09-29 15:59:49 -07:00
Cameron McInally 80381c4dc9 [SVE] Lower fixed length VECREDUCE_[FMAX|FMIN] to Scalable
Differential Revision: https://reviews.llvm.org/D88444
2020-09-29 16:22:29 -05:00
Eric Astor feb74530f8 [ms] [llvm-ml] Accept whitespace around the dot operator
MASM allows arbitrary whitespace around the Intel dot operator, especially when used for struct field lookup

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88450
2020-09-29 17:01:13 -04:00
Eric Astor 6b70a83d9c [ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).

Also, when lexing MASM integers, require radix specifier; MASM requires that all literals without a radix specifier be treated as in the default radix. (e.g., 0100 = 100)

Relanding D87400, now with fewer ms-inline-asm tests broken!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88337
2020-09-29 16:55:51 -04:00
Simon Pilgrim 346199152f LanaiTargetMachine.h - remove unnecessary includes. NFCI. 2020-09-29 18:15:31 +01:00
Simon Pilgrim ae7ab96284 LanaiSubtarget.h - remove unnecessary includes. NFCI.
TargetFrameLowering.h is guaranteed to be covered by LanaiFrameLowering.h
2020-09-29 18:15:31 +01:00
Simon Pilgrim a06581ef39 MSP430TargetMachine.h - remove unused includes. NFCI. 2020-09-29 16:41:59 +01:00
Simon Pilgrim 8f34216ece NVPTXTargetMachine.h - remove unused includes. NFCI. 2020-09-29 16:41:59 +01:00
Simon Pilgrim 30c0bea571 SparcSubtarget.h - cleanup include dependencies. NFCI.
TargetFrameLowering.h is guaranteed to be covered by SparcFrameLowering.h

Fix missing implicit Triple.h dependency.
2020-09-29 16:41:58 +01:00
Mirko Brkusanin 8b08fa0103 Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3.

Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.
2020-09-29 15:33:34 +02:00
Jonas Paulsson 75a5febe31 [SystemZ] Don't emit PC-relative memory accesses to unaligned symbols.
In the presence of packed structures (#pragma pack(1)) where elements are
referenced through pointers, there will be stores/loads with alignment values
matching the default alignments for the element types while the elements are
in fact unaligned. Strictly speaking this is incorrect source code, but is
unfortunately part of existing code and therefore now addressed.

This patch improves the pattern predicate for PC-relative loads and stores by
not only checking the alignment value of the instruction, but also making
sure that the symbol (and element) itself is aligned.

Fixes https://bugs.llvm.org/show_bug.cgi?id=44405

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D87510
2020-09-29 14:51:13 +02:00
Sam Parker 4c19b89b25 [NFC][ARM] Comments and lambdas
Add some comments in LowOverheadLoops and make some lambda variables
explicit arguments instead of capturing.
2020-09-29 08:41:53 +01:00
Craig Topper 82da0cabb9 [X86] Add computeKnownBits support for PEXT.
The number of zeros in the mask provides a lower bound on the number
of leading zeros in the result.
2020-09-28 22:54:07 -07:00
Amara Emerson b9f2b3bc43 [AArch64][GlobalISel] Scalarize <2 x s64> G_MUL since we don't have native support for it.
Differential Revision: https://reviews.llvm.org/D88437
2020-09-28 19:29:45 -07:00
Yonghong Song 54d9f743c8 BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, compiler may transform the above code
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; //calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(bpf, buf_size, p2);
  }
if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by do chaining
analysis and replace chains with special globals right
after clang code gen. This will remove the cse possibility
described in the above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.

But this transformation has another consequence, code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6

For such cases, we will not able to generate relocations since
multiple relocations are merged into one.

This patch introduced a passthrough builtin
to prevent such optimization. Looks like inline assembly has more
impact for optimizaiton, e.g., inlining. Using passthrough has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:
  - report fatal error if any reloc global in PHI nodes
  - remove all bpf passthrough builtin functions

Changes for existing CORE tests:
  - for clang tests, add "-Xclang -disable-llvm-passes" flags to
    avoid builtin->reloc_global transformation so the test is still
    able to check correctness for clang generated IR.
  - for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command
    before "llc" command since "opt" is needed to call newly-placed
    builtin->reloc_global transformation. Add target triple in the IR
    file since "opt" requires it.
  - Since target triple is added in IR file, if a test may produce
    different results for different endianness, two tests will be
    created, one for bpfeb and another for bpfel, e.g., some tests
    for relocation of lshift/rshift of bitfields.
  - field-reloc-bitfield-1.ll has different relocations compared to
    old codes. This is because for the structure in the test,
    new code returns struct layout alignment 4 while old code
    is 8. Align 8 is more precise and permits double load. With align 4,
    the new mechanism uses 4-byte load, so generating different
    relocations.
  - test intrinsic-transforms.ll is removed. This is used to test
    cse on intrinsics so we do not lose metadata. Now metadata is attached
    to global and not instruction, it won't get lost with cse.

Differential Revision: https://reviews.llvm.org/D87153
2020-09-28 16:56:22 -07:00
Craig Topper e53196b1e8 [X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.
We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

-If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
-The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
-The result will have zeros in any position that the mask is zero.
-Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.

Differential Revision: https://reviews.llvm.org/D87883
2020-09-28 14:21:30 -07:00
Amara Emerson 082321909e [GlobalISel] Add support for lowering of vector G_SELECT and use for AArch64.
The lowering is a port of the SDAG expansion.

Differential Revision: https://reviews.llvm.org/D88364
2020-09-28 14:00:46 -07:00
Amara Emerson 5aa56b2429 Revert "Revert "[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_VECTOR_ELT with GPR scalar.""
This isn't a real with the codegen, it's a previously known bug in clang which
causes non-deterministic failures due to garbage bits in undef registers being
used in saturating instructions.

I'm disabling the result checking for the test until this issue is resolved.

This reverts commit 6c8168324b.
2020-09-28 13:44:51 -07:00
Baptiste Saleil 0156914275 [PowerPC] Legalize v256i1 and v512i1 and implement load and store of these types
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.

It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.

This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulator in their unprimed form and
allow the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.

Differential Revision: https://reviews.llvm.org/D84968
2020-09-28 14:39:37 -05:00
Jon Roelofs 83dc53d30c [AArch64] reuse another map iterator. NFC 2020-09-28 11:30:21 -07:00
Amara Emerson 6c8168324b Revert "[AArch64][GlobalISel] Add selection support for <8 x s16> G_INSERT_VECTOR_ELT with GPR scalar."
This reverts commit b5e87c9ef2 as it seems to have
broken a bot.
2020-09-28 11:25:19 -07:00
Jessica Paquette 9d7ec46f57 [AArch64][GlobalISel] Infer whether G_PHI is going to be a FPR in regbankselect
Some instructions (G_LOAD, G_SELECT, G_UNMERGE_VALUES) check if their uses
will define/use FPRs (using `onlyUsesFP` and `onlyDefinesFP`).

The register bank of a use isn't necessarily known when an instruction asks for
this.

Teach `hasFPConstraints` to look at the instructions feeding into a G_PHI when
its destination bank is unknown. If any of them are FPR, assume the entire
G_PHI will also be assigned a FPR.

Since a phi can have many inputs, and those inputs can in turn be phis,
restrict the search depth to a very low number.

Also improve the docs for `hasFPConstraints` and friends a little.

This is a 0.3% code size improvement on CTMark/Bullet at -O3, and a 0.2% code
size improvement at CTMark/pairlocalalign at -O3.

Differential Revision: https://reviews.llvm.org/D88177
2020-09-28 10:37:09 -07:00
Jessica Paquette f55a5186c6 [AArch64][GlobalISel] Support shifted register form in emitTST
Support emitting ANDSXrs and ANDSWrs in `emitTST`. Update opt-fold-compare.mir
to show that it works.

Differential Revision: https://reviews.llvm.org/D87530
2020-09-28 10:13:47 -07:00
Jessica Paquette a52e78012a [GlobalISel] Combine (xor (and x, y), y) -> (and (not x), y)
When we see this:

```
%and = G_AND %x, %y
%xor = G_XOR %and, %y
```

Produce this:

```
%not = G_XOR %x, -1
%new_and = G_AND %not, %y
```

as long as we are guaranteed to eliminate the original G_AND.

Also matches all commuted forms. E.g.

```
%and = G_AND %y, %x
%xor = G_XOR %y, %and
```

will be matched as well.

Differential Revision: https://reviews.llvm.org/D88104
2020-09-28 10:08:14 -07:00
Jon Roelofs 37ef2255b6 [AArch64] Reuse map iterator instead of double lookup. NFC 2020-09-28 09:47:00 -07:00
Jay Foad 0e0a0c8d2c [AMDGPU] Reformat AMDGPUTargetLowering::isSDNodeAlwaysUniform. NFC. 2020-09-28 16:24:16 +01:00