Fix an off by one error in the bounds checking for 'dinsu' and update
the ranges in the test comments so that they are accurate.
Reviewers: atanasyan
https://reviews.llvm.org/D41183
llvm-svn: 320974
For Cylone, the instruction "movi.2d vD, #0" is executed incorrectly in some rare
circumstances. Work around the issue conservatively by avoiding the instruction entirely.
This patch changes CodeGen so that problematic instructions are never
generated, and the AsmParser so that an equivalent instruction is used (with a
warning).
llvm-svn: 320965
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320962
This is a follow up to the fmaddsub support added in r320950. Hopefully in the future we can fix lowering to handle this fmsubadd too.
llvm-svn: 320951
Summary:
We had no tests for this and we couldn't do the optimization because of a bad use count check. We need to know how many non-undef pieces of the build vector were filled in and ensure our use count is equal to that. But on the shuffle combine version we need the use count to be 2.
The missing coverage was noticed during the review of D40335.
Reviewers: RKSimon, zvi, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41133
llvm-svn: 320950
Assuming we can safely adjust the broadcast index for the new type to keep it suitably aligned, then peek through BITCASTs when looking for the broadcast source.
Fixes PR32007
llvm-svn: 320933
If the loop operand type is int8 then there will be no residual loop for the
unknown size expansion. Dont create the residual-size and bytes-copied values
when they are not needed.
llvm-svn: 320929
Previously we promoted to v8i64, but we don't need to go all the way to 512-bits. If we have VLX we can use the 256-bit instruction. And even if we don't have VLX we can widen v8i32 to v16i32 and drop the upper half.
llvm-svn: 320926
Summary:
Currently we don't handle v32i1/v64i1 insert_vector_elt correctly as we fail to look at the number of elements closely and assume it can only be v16i1 or v8i1.
We also can't type legalize v64i1 insert_vector_elt correctly on KNL due to the type not being byte addressable as required by the legalizing through memory accesses path requires.
For the first issue, the patch now tries to pick a 512-bit register with the correct number of elements and promotes to that.
For the second issue, we now extend the vector to a byte addressable type, do the stores to memory, load the two halves, and then truncate the halves back to the original type. Technically since we changed the type, we may not need two loads, but actually checking that is more work and for the v64i1 case we do need them.
Reviewers: RKSimon, delena, spatel, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D40942
llvm-svn: 320849
The original memcpy expansion inserted the loop basic block inbetween
the 2 new basic blocks created by splitting the original block the memcpy
call was in. This commit makes the new memcpy expansion do the same to keep the
layout of the IR matching between the old and new implementations.
Differential Review: https://reviews.llvm.org/D41197
llvm-svn: 320848
Work towards the unification of MIR and debug output by printing
`%stack.0` instead of `<fi#0>`, and `%fixed-stack.0` instead of
`<fi#-4>` (supposing there are 4 fixed stack objects).
Only debug syntax is affected.
Differential Revision: https://reviews.llvm.org/D41027
llvm-svn: 320827
The following CFI directives are suported by MC but not by MIR:
* .cfi_rel_offset
* .cfi_adjust_cfa_offset
* .cfi_escape
* .cfi_remember_state
* .cfi_restore_state
* .cfi_undefined
* .cfi_register
* .cfi_window_save
Add support for printing, parsing and update tests.
Differential Revision: https://reviews.llvm.org/D41230
llvm-svn: 320819
This patch switches the default for -riscv-no-aliases to false
and updates all affected MC and CodeGen tests. As recommended in
D41071, MC tests use the canonical instructions and the CodeGen
tests use the aliases.
Additionally, for the f and d instructions with rounding mode,
the tests for the aliased versions are moved and tightened such
that they can actually detect if alias emission is enabled.
(see D40902 for context)
Differential Revision: https://reviews.llvm.org/D41225
Patch by Mario Werner.
llvm-svn: 320797
This patch adds the necessary infrastructure to convert instructions that
take two register operands to those that take a register and immediate if
the necessary operand is produced by a load-immediate. Furthermore, it uses
this infrastructure to perform such conversions twice - first at MachineSSA
and then pre-emit.
There are a number of reasons we may end up with opportunities for this
transformation, including but not limited to:
- X-Form instructions chosen since the exact offset isn't available at ISEL time
- Atomic instructions with constant operands (we will add patterns for this
in the future)
- Tail duplication may duplicate code where one block contains this redundancy
- When emitting compare-free code in PPCDAGToDAGISel, we don't handle constant
comparands specially
Furthermore, this patch moves the initialization of PPCMIPeepholePass so that
it can be used for MIR tests.
llvm-svn: 320791
A couple places didn't use the same SDValue variables to connect everything all the way through.
I don't have a test case for a bug in insert into the lower bits of a non-zero, non-undef vector. Not sure the best way to create that. We don't create the case when lowering concat_vectors which is the main way to get insert_subvectors.
llvm-svn: 320790
The compare elimination peephole introduced in https://reviews.llvm.org/rL312514
causes a miscompile in AMDGPUInstrInfo.cpp which in turn causes some AMDGPU
test case failures in stage2 bootstrap testing. This miscompile didn't cause any
test case failures until https://reviews.llvm.org/rL320614, so it appeared as if
that patch caused these failures.
Disabling this transformation for now to bring the build bots back to green and
the author of the patch will investigate the miscompile.
llvm-svn: 320786
Add support for properly handling PIC code with no-PLT. This equates to
`-fpic -fno-plt -O0` with the clang frontend. External functions are
marked with nonlazybind, which must then be indirected through the GOT.
This allows code to be built without optimizations in PIC mode without
going through the PLT. Addresses PR35653!
llvm-svn: 320776
Summary:
- lowers @llvm.global_dtors by adding @llvm.global_ctors
functions which register the destructors with `__cxa_atexit`.
- impements @llvm.global_ctors with wasm start functions and linker metadata
See [here](https://github.com/WebAssembly/tool-conventions/issues/25) for more background.
Subscribers: jfb, dschuff, mgorny, jgravelle-google, aheejin, sunfish
Differential Revision: https://reviews.llvm.org/D41211
llvm-svn: 320774
This doesn't match the semantics of the extract_vector_elt operation. Nothing downstream knows the bits were zeroed so they still get masked or sign extended after the extrat anyway.
llvm-svn: 320723
MIPSR6 introduced several new jump instructions and deprecated
the use of the 'j' instruction. For microMIPS32R6, 'j' was removed
entirely and it only has non delay slot jumps.
This patch adds support for MIPSR6 by using some R6 instructions--
'bc' instead of 'j', 'jic $reg, 0' instead of 'jalr $zero, $reg'--
and modifies the sequences not to use delay slots for R6.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: dschuff, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D40786
llvm-svn: 320703
store operation on a truncated memory (load) of vXi1 is poorly supported by LLVM and most of the time end with an assertion.
This patch fixes this issue.
Differential Revision: https://reviews.llvm.org/D39547
Change-Id: Ida5523dd09c1ad384acc0a27e9e59273d28cbdc9
llvm-svn: 320691
Work towards the unification of MIR and debug output by printing
`@foo` instead of `<ga:@foo>`.
Also print target flags in the MIR format since most of them are used on
global address operands.
Only debug syntax is affected.
llvm-svn: 320682
Recommitting rL319773, which was reverted due to a recursive issue
causing timeouts. This happened because I failed to check whether
the discovered loads could be narrowed further. In the case of a tree
with one or more narrow loads, that could not be further narrowed, as
well as a node that would need masking, an AND could be introduced
which could then be visited and recombined again with the same load.
This could again create the masking load, with would be combined
again... We now check that the load can be narrowed so that this
process stops.
Original commit message:
Search from AND nodes to find whether they can be propagated back to
loads, so that the AND and load can be combined into a narrow load.
We search through OR, XOR and other AND nodes and all bar one of the
leaves are required to be loads or constants. The exception node then
needs to be masked off meaning that the 'and' isn't removed, but the
loads(s) are narrowed still.
Differential Revision: https://reviews.llvm.org/D41177
llvm-svn: 320679