This adds a new variant of the DC system instruction for persistent
memory.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52480
llvm-svn: 343216
This adds new system instructions which act as barriers to speculative
execution based on earlier execution within a particular execution
context.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52479
llvm-svn: 343214
This is a new barrier which limits speculative execution of the
instructions following it.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52477
llvm-svn: 343213
This is a new barrier which limits speculative execution of the
instructions following it.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52476
llvm-svn: 343211
Summary: It is currently broken and for Sparc there is not much benefit
in using a builtin version compared to a library version. Both versions
needs to store the same four values in setjmp and flush the register
windows in longjmp. If the need for a builtin setjmp/longjmp arises there
is an improved implementation available at https://reviews.llvm.org/D50969.
Reviewers: jyknight, joerg, venkatra
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51487
llvm-svn: 343210
These are some new variants of the "Floating-point Round to Integral"
family of instructions, which round to the nearest floating-point value
which fits in a 32- or 64-bit integer.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52475
llvm-svn: 343209
Summary: Use 0 as the default immediate for the UNIMP instruction.
This matches the behavior in gas.
Reviewers: jyknight, venkatra
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51526
llvm-svn: 343203
Summary:
Partial write %PSR (WRPSR) is a SPARC V8e option that allows WRPSR
instructions to only affect the %PSR.ET field. It is supported by
the GR740 and GR716.
Reviewers: jyknight, venkatra
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48644
llvm-svn: 343202
We have an unfortunate situation in our back end where we have to keep pairs of
functions synchronized. Needless to say that this is not an ideal situation as
it is very difficult to enforce. Even without bugs, it's annoying to have to do
the same thing in two places.
This patch just refactors the code so that the two pairs of those functions that
pertain to printing register operands are unified:
- stripRegisterPrefix() - this just removes the letter prefixes from registers
for the InstrPrinter and AsmPrinter. This patch provides this as a static
member of PPCRegisterInfo
- Handling of PPCII::UseVSXReg - there are 3 places where we do something
special for instructions with that flag set. Each of those places does its
own checking of this flag and implements code customization. Any changes to
how we print/encode VSX/VMX registers require modifying all 3 places. This
patch unifies this into a static function in PPCInstrInfo that returns the
register number adjusted as needed.
Differential revision: https://reviews.llvm.org/D52467
llvm-svn: 343195
These new instructions manipluate the NZCV bits, to convert between the
regular Arm floating-point comare format and an alternative format.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52473
llvm-svn: 343187
Debian uses different triples for MIPS r6 and paths. Here we use SubArch
to determine whether it is r6, if we found `r6' in CPU section of triple.
These new triples include:
mipsisa32r6-linux-gnu
mipsisa32r6el-linux-gnu
mipsisa64r6-linux-gnuabi64
mipsisa64r6el-linux-gnuabi64
mipsisa64r6-linux-gnuabin32
mipsisa64r6el-linux-gnuabin32
Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D50857
llvm-svn: 343185
Summary:
The OneUseDominatesOtherUses in the WebAssemblyRegStackify not properly validates register use using hasOneUse. Since we added/modified DBG_VALUE the assert started catching valid cases.
See also https://reviews.llvm.org/D49034#1247200
Fix verified by running the wasm waterfall.
Reviewed By: dschuff
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D49034
llvm-svn: 343154
Summary:
This is essentially NFC, because the complex pattern used for these patterns
will fail on non-CI, but this makes the pattern consistent with other CI
smrd patterns. It is also a performance improvement, because the pattern
will now fail earlier on non-CI.
Reviewers: arsenm, nhaehnle
Reviewed By: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52469
llvm-svn: 343125
The Armv8.3-A reference manual defines floating-point data-processing
instructions with one source operand to have an opcode of 6 bits
[20:15]. The current class in tablegen, BaseSingleOperandFPData, only
allows [18:15]. This was ok because [20:19] could only be '00', with
other encodings unallocated. Armv8.5-A brings in the FRINT group of
instructions which use other values for these bits.
This patch refactors the existing class a bit to allow using the full 6
bits of the opcode, as defined in the Arm ARM.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52474
llvm-svn: 343120
Reuse some code in preparation for the v8.5A XAFlag/AXFlag instructions,
which shares part of the encoding of the MSR-immediate.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52472
llvm-svn: 343113
Parsing of the system instructions (IC, DC, AT and TLBI) uses this
function to show the required architecture when the operand is valid,
but the architecture is not enabled. Armv8.5A adds a few different
system instructions as part of optional features, so we need to extend
it to show individual features, not just base architectures.
This is NFC for now, but will be used by three different features added
in v8.5A, and will be tested by them.
Patch by David Spickett!
Differential revision: https://reviews.llvm.org/D52478
llvm-svn: 343109
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.
> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
> state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
> i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
> (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
> https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343103
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52470
llvm-svn: 343102
This patch adds a check to optimize conditional branch (BC and BCn) based on a constant set by CRSET or CRUNSET.
Other optimizers, such as block placement, may generate such code and hence
I do this at the very end of the optimization in pre-emit peephole pass.
A conditional branch based on a constant is eliminated or converted into unconditional branch.
Also CRSET/CRUNSET is eliminated if the condition code register is not used
by instruction other than the branch to be optimized.
Differential Revision: https://reviews.llvm.org/D52345
llvm-svn: 343100
Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS.
As we're dealing with signed values, we have to handle shift by zero and shift by one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases are used but for now I've just left a TODO and filtered by isKnownNeverZero.
Differential Revision: https://reviews.llvm.org/D52171
llvm-svn: 343093
When calculating whether a value can safely overflow for use by an
icmp, we weren't checking that the value couldn't wrap around. To do
this we need the icmp to be using a constant, as well as the incoming
add or sub.
bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060
Differential Revision: https://reviews.llvm.org/D52463
llvm-svn: 343092
Functions that have signed return addresses need additional dwarf support:
- After signing the LR, and before authenticating it, the LR register is in a
state the is unusable by a debugger or unwinder
- To account for this a new directive, .cfi_negate_ra_state, is added
- This directive says the signed state of the LR register has now changed,
i.e. unsigned -> signed or signed -> unsigned
- This directive has the same CFA code as the SPARC directive GNU_window_save
(0x2d), adding a macro to account for multiply defined codes
- This patch matches the gcc implementation of this support:
https://patchwork.ozlabs.org/patch/800271/
Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343089
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).
> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 343082
Summary:
Lowers (s|u)itofp and fpto(s|u)i instructions for vectors. The fp to
int conversions produce poison values if their arguments are out of
the convertible range, so a future CL will have to add an LLVM
intrinsic to make the saturating behavior of this conversion usable.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52372
llvm-svn: 343052
This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there.
But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisted, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way.
llvm-svn: 343046
Summary:
We generate s_xor to lower add of i1s in general cases, and s_not to
lower add with a one-bit imm of -1 (true).
Reviewers:
rampitec
Differential Revision:
https://reviews.llvm.org/D52518
llvm-svn: 343030
This is the final (I hope!) problem pattern mentioned in PR37749:
https://bugs.llvm.org/show_bug.cgi?id=37749
We are trying to avoid an AVX1 sinkhole caused by having 256-bit bitwise logic ops but no other 256-bit integer ops.
We've already solved the simple logic ops, but 'andn' is an x86 special. I looked at alternative solutions like
extending the generic DAG combine or trying to wait until the ANDNP node is created, but those are bigger patches
that can over-reach. Ie, splitting to 128-bit does not look like a win in most cases with >1 256-bit op.
The pattern matching is cluttered with bitcasts because of our i64 element canonicalization. For the affected test,
we have this vector-type-legalized sequence:
t29: v8i32 = concat_vectors t27, t28
t30: v4i64 = bitcast t29
t18: v8i32 = BUILD_VECTOR Constant:i32<-1>, Constant:i32<-1>, ...
t31: v4i64 = bitcast t18
t32: v4i64 = xor t30, t31
t9: v8i32 = BUILD_VECTOR Constant:i32<255>, Constant:i32<255>, ...
t34: v4i64 = bitcast t9
t35: v4i64 = and t32, t34
t36: v8i32 = bitcast t35
t37: v4i32 = extract_subvector t36, Constant:i64<0>
t38: v4i32 = extract_subvector t36, Constant:i64<4>
Differential Revision: https://reviews.llvm.org/D52318
llvm-svn: 343008
Share predecessor search bookkeeping in both perform PostLD1Combine
and performNEONPostLDSTCombine. This should be approximately a 4x and
2x performance improvement.
llvm-svn: 342986
As suggested by Craig Topper - I'm going to look at cleaning up the RMW sequences instead.
The uops are slightly different to the register variant, so requires a +1uop tweak
llvm-svn: 342969
[AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.
llvm-svn: 342956
Added
__builtin_vsx_scalar_extract_expq
__builtin_vsx_scalar_insert_exp_qp
Builtins should behave the same way as in GCC.
Differential Revision: https://reviews.llvm.org/D48185
llvm-svn: 342910
We're missing quite a bit of data for these instruction, removing the overrides makes this obvious - inconsistent reg/mem variants is a concern as well.
Also, we have Divider resources (HWDivider etc.) but they aren't actually used consistently.
llvm-svn: 342904
A simple MOVS rd, imm8 can materialize [-128, 127] in signed i8 type or
[0, 255] in unsigned i8 type on Thumb1.
Differential Revision: https://reviews.llvm.org/D52257
llvm-svn: 342898
Split WriteIMul by size and also by IMUL multiply-by-imm and multiply-by-reg cases.
This removes all the scheduler overrides for gpr multiplies and stops WriteMULH being ignored for BMI2 MULX instructions.
llvm-svn: 342892
- The assembler accepts VSTM/VLDM with register lists (specifically double registers lists) with more than 16 registers specified
- The Arm architecture reference manual says this instruction must not contain more than 16 registers when the registers are doubleword registers
- This addresses one of the concerns in https://bugs.llvm.org/show_bug.cgi?id=38389
Differential Revision: https://reviews.llvm.org/D52082
llvm-svn: 342891
The r337288 tried to fix result of icmp i1 when its input is not sanitized
by falling back to DagISel. While it now produces the correct result for
bit 0, the other bits can still hold arbitrary value which is not supported
by MipsFastISel branch lowering. This patch fixes the issue by falling back
to DagISel in this case.
Patch by Dragan Mladjenovic.
Differential Revision: https://reviews.llvm.org/D52045
llvm-svn: 342884
gcc uses operand modifier 'x' in inline asm for VSX registers.
Without this modifier, instructions which use VSX numbering for their
operands are printed as VMX registers. This patch adds support for the
operand modifier 'x'.
Differential Revision: https://reviews.llvm.org/D52244
llvm-svn: 342882
If the alignment is at least 4, this should report true.
Something still seems off with how < 4-byte types are
handled here though.
Fixing this seems to change how some combines get
to where they get, but somehow isn't changing the net
result.
llvm-svn: 342879
A sequence of VMUL and VADD instructions always give the same or better
performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33.
Executing the VMUL and VADD back-to-back requires the same cycles, but
having separate instructions allows scheduling to avoid the hazard between
these 2 instructions.
Differential Revision: https://reviews.llvm.org/D52289
llvm-svn: 342874
This caused miscompilation of WebRTC for Android: PR39060.
> We've had the pass enabled downstream for a couple of weeks and it
> seems to be okay, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D51920
llvm-svn: 342873
- The load store optimizer is currently merging multiple loads/stores into VLDM/VSTM with more than 16 doubleword registers
- This is an UNPREDICTABLE instruction and shouldn't be done
- It looks like the Limit for how many registers included in a merge got dropped at some point so I am reintroducing it in this patch
- This fixes https://bugs.llvm.org/show_bug.cgi?id=38389
Differential Revision: https://reviews.llvm.org/D52085
llvm-svn: 342872
Originally committed in rL342210 but was reverted in rL342260 because
it was causing issues in vectorized code, because I had forgotten to
ensure that we're operating on scalar values.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342870
Variable Shifts/Rotates using the CL register have different behaviours to the immediate instructions - split accordingly to help remove yet more repeated overrides from the schedule models.
llvm-svn: 342852
Confirmed with Craig Topper - fix a typo that was missing a Port4 uop for ROR*mCL instructions on some Intel models.
Yet another step on the scheduler model cleanup marathon......
llvm-svn: 342846
This is an alternative to https://reviews.llvm.org/D37896. We can't decompose
multiplies generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some existing code that overlaps with
this transform.
This extends D52195 and may resolve PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
(still an open question about transforming legal vector multiplies, but we
could open another bug report for those)
llvm-svn: 342844
The SandyBridge model was missing schedule values for the RCL/RCR values - instead using the (incredibly optimistic) WriteShift (now WriteRotate) defaults.
I've added overrides with more realistic (slow) values, based on a mixture of Agner/instlatx64 numbers and what later Intel models do as well.
This is necessary to allow WriteRotate to be updated to remove other rotate overrides.
It'd probably be a good idea to investigate a WriteRotateCarry class at some point but its not high priority given the unusualness of these instructions.
llvm-svn: 342842
Despite being rotates, these more modern instructions avoid many of the quirks of the regular x86 rotate instructions and consistently have a schedule closer to shifts.
llvm-svn: 342839
NFCI for now, but it should make it easier to remove a lot of unnecessary overrides in a future commit.
Now that funnel shift intrinsics are coming online we need to get this cleaned up to make vectorization costs from scalar rotate patterns more straightforward.
llvm-svn: 342837
Our lowering that tries to avoid this sign extend can be defeated by the DAG combine folding it with a truncate.
The pattern needs to extend to an v8i32 then truncate back down to v8i16.
llvm-svn: 342830
Summary:
Specifying X[8-15,18] registers as callee-saved is used to support
CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we:
- use custom CSR list/mask when user specifies custom CSRs
- update Machine Register Info's list of CSRs with additional custom CSRs in
LowerCall and LowerFormalArguments.
Reviewers: srhines, nickdesaulniers, efriedma, javed.absar
Reviewed By: nickdesaulniers
Subscribers: kristof.beyls, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52216
llvm-svn: 342824
Summary: Similar to D51893 which was for memcpy
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52063
llvm-svn: 342796
We don't have a vXi8 shift left so we need to bitcast to a vXi16 vector to perform the shift. If we let lowering legalize the vXi8 shift we get an extra and that we don't need and fail to remove.
llvm-svn: 342795
Previously we used SUBREG_TO_REG+MOV32ri. But regular isel was changed recently to use the MOV32ri64 pseudo. Fast isel now does the same.
llvm-svn: 342788
Summary:
By using the existing isCodeGenOnly bit in the tablegen defs, as
suggested by tlively in https://reviews.llvm.org/D51662
Tested: llvm-lit -v `find test -name WebAssembly`
Reviewers: tlively
Subscribers: dschuff, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52373
llvm-svn: 342772
This patch introduces a SchedWriteVariant to describe zero-idiom VXORP(S|D)Yrr
and VANDNP(S|D)Yrr.
This is a follow-up of r342555.
On Jaguar, a VXORPSYrr is 2 macro opcodes. Only one opcode is eliminated at
register-renaming stage. The other opcode has to be executed to set the upper
half of the destination YMM.
Same for VANDNP(S|D)Yrr.
Differential Revision: https://reviews.llvm.org/D52347
llvm-svn: 342728
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.
As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll
Differential Revision: https://reviews.llvm.org/D52221
llvm-svn: 342722
Summary: This change is the first part of the AMDGPU target description
change. The aim of it is the effective splitting the vector and scalar
flows at the selection stage. Selection uses predicate functions based
on the framework implemented earlier - https://reviews.llvm.org/D35267
Differential revision: https://reviews.llvm.org/D52019
Reviewers: rampitec
llvm-svn: 342719
Currently, BPF has XADD (locked add) insn support and the
asm looks like:
lock *(u32 *)(r1 + 0) += r2
lock *(u64 *)(r1 + 0) += r2
The instruction itself does not have a return value.
At the source code level, users often use
__sync_fetch_and_add()
which eventually translates to XADD. The return value of
__sync_fetch_and_add() is supposed to be the old value
in the xadd memory location. Since BPF::XADD insn does not
support such a return value, this patch added a PreEmit
phase to check such a usage. If such an illegal usage
pattern is detected, a fatal error will be reported like
line 4: Invalid usage of the XADD return value
if compiled with -g, or
Invalid usage of the XADD return value
if compiled without -g.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 342692
Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD
calls to allow the tests to work. Also removes some dead code that
would otherwise have to have been updated.
Reviewers: aheejin, dschuff, sbc100
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52105
llvm-svn: 342689
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code.
No functional change intended.
I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.
Differential Revision: https://reviews.llvm.org/D52285
llvm-svn: 342669
This is a trivial refactoring that I'm committing now as it makes a patch I'm
about to post for review easier to follow. There is some overlap between
evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows
evaluateConstantImm to be reused from addExpr to remove this overlap. The
benefit will be greater when a future patch adds extra code to allows
immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`).
No functional change intended.
llvm-svn: 342641
Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas
but rejected by LLVM MC. This patch rectifies this. I introduce
RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with
register names can safely be parsed. This is made a somewhat fiddly due to the
single-operand alias form (see the comment in parseJALOffset for more info).
Differential Revision: https://reviews.llvm.org/D52029
llvm-svn: 342629
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072
llvm-svn: 342611
Summary:
This change leaves holes in the opcode space where missing
instructions could logically be added later if they were found to be
useful.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52282
llvm-svn: 342610
Enable enableMultipleCopyHints() on X86.
Original Patch by @jonpa:
While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that is resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 less register moves on SPEC, as well as marginally less spilling.
Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint(). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.
Differential Revision: https://reviews.llvm.org/D38128
llvm-svn: 342578
As the code comments suggest, these are about splitting, and they
are not necessarily limited to lowering, so that misled me.
There's nothing that's actually x86-specific in these either, so
they might be better placed in a common header so any target can
use them.
llvm-svn: 342575
The patch extends size reduction pass for MicroMIPS. Two MOVE
instructions are transformed into one MOVEP instrucition.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52037
llvm-svn: 342572
The patch fixes definition of MOVEP instruction. Two registers are used
instead of register pairs. This is necessary as machine verifier cannot
handle register pairs.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52035
llvm-svn: 342571
This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.
Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.
Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.
NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.
Differential Revision: https://reviews.llvm.org/D52140
llvm-svn: 342564
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52018
Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
Clang-compiled object files currently don't include the symbol sizes and
types. Some tools however need that information. For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
llvm-svn: 342556
This patch adds the ability for processor models to describe dependency breaking
instructions.
Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.
The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).
```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
return false;
}
virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
return isZeroIdiom(MI);
}
```
An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.
A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.
STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.
This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.
This patch supersedes the one committed at r338372 (phabricator review: D49310).
The main advantages are:
- We can describe subtarget predicates via tablegen using STIPredicates.
- We can describe zero-idioms / dep-breaking instructions directly via
tablegen in the scheduling models.
In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
- Teach how to identify optimizable register-register moves
- Teach how to identify slow LEA instructions (each subtarget defining its own
concept of "slow" LEA).
- Teach how to identify instructions that have undocumented false dependencies
on the output registers on some processors only.
It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.
This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.
Differential Revision: https://reviews.llvm.org/D52174
llvm-svn: 342555
This is an alternative to D37896. I don't see a way to decompose multiplies
generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some duplicate code that overlaps with
this transform.
As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.
Differential Revision: https://reviews.llvm.org/D52195
llvm-svn: 342554
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they
will see no functional change. Previously this hook returned bool, but it now
returns AtomicExpansionKind.
This hook allows targets to select how a given cmpxchg is to be expanded.
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.
See my associated RFC for more info on the motivation for this change
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.
Differential Revision: https://reviews.llvm.org/D48130
llvm-svn: 342550
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.
Patch by Jackson Woodruff!
Differential revision: https://reviews.llvm.org/D51494
llvm-svn: 342545
Introduce a new RISCVExpandPseudoInsts pass to expand atomic
pseudo-instructions after register allocation. This is necessary in order to
ensure that register spills aren't introduced between LL and SC, thus breaking
the forward progress guarantee for the operation. AArch64 does something
similar for CmpXchg (though only at O0), and Mips is moving towards this
approach (see D31287). See also [this mailing list
post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from
James Knight, which summarises the issues with lowering to ll/sc in IR or
pre-RA.
See the [accompanying RFC
thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an
overview of the lowering strategy.
Differential Revision: https://reviews.llvm.org/D47882
llvm-svn: 342534
The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up
the .gfid$y table. I believe this is fine to set even if we don't emit
the instrumentation.
We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit
it, but I'm not entirely sure what the default value should be. I went
with zero since that seems as safe as not emitting the symbol in the
first place.
Differential Revision: https://reviews.llvm.org/D52235
llvm-svn: 342532
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
`ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
functions. They were only ever used for debug dumping and putting the
function into ScheduleDAG is consistent with the `dumpNode()` change.
llvm-svn: 342520
This allows the hard-coded shouldForceImmediate logic to be removed because
the generated MatchOperandParserImpl makes use of the current context (i.e.
the current mnemonic) to determine parsing behaviour, and so won't first try
to parse a register before parsing a symbol name.
No functional change is intended. gas accepts immediate arguments for call,
tail and lla. This patch doesn't address this discrepancy.
Differential Revision: https://reviews.llvm.org/D51733
llvm-svn: 342488
addi a0, a0, foo and lw a0, foo(a0) and similar are now rejected. An explicit
%lo and %pcrel_lo modifier is required. This matches gas behaviour.
llvm-svn: 342487
Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for
lui. Also test valid operand modifiers in rv32i-valid.s.
Note this is slightly stricter than gas, which will accept either %pcrel_hi or
%hi for both lui and auipc.
Differential Revision: https://reviews.llvm.org/D51731
llvm-svn: 342486
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.
This patch increases the number of record-form rotates we eliminate by
more than 70%.
Differential revision: https://reviews.llvm.org/D44897
llvm-svn: 342478
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.
Differential revision: https://reviews.llvm.org/D51353
llvm-svn: 342472
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.
This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.
Differential Revision: https://reviews.llvm.org/D51792
llvm-svn: 342455
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040
llvm-svn: 342441
This reverts r342395 as it caused error
> Argument value type does not match pointer operand type!
> %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
> i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!
on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/
More details are available at https://reviews.llvm.org/D52080
llvm-svn: 342431
Add support mips64(el)-linux-gnuabin32 triples, and set them to N32.
Debian architecture name mipsn32/mipsn32el are also added. Set
UseIntegratedAssembler for N32 if we can detect it.
Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D51408
llvm-svn: 342416
Summary:
The IR reference for the `byval` attribute states:
```
This indicates that the pointer parameter should really be passed by value
to the function. The attribute implies that a hidden copy of the pointee is
made between the caller and the callee, so the callee is unable to modify
the value in the caller. This attribute is only valid on LLVM pointer arguments.
```
However, on Win64, this attribute is unimplemented and the raw pointer is
passed to the callee instead. This is problematic, because frontend authors
relying on the implicit hidden copy (as happens for every other calling
convention) will see the passed value silently (if mutable memory) or
loudly (by means of a crash) modified because the callee treats the
location as scratch memory space it is allowed to mutate.
At this point, it's worth taking a step back to understand the context.
In most calling conventions, aggregates that are too large to be passed
in registers, instead get *copied* to the stack at a fixed (computable
from the signature) offset of the stack pointer. At the LLVM, we hide
this hidden copy behind the byval attribute. The caller passes a pointer
to the desired data and the callee receives a pointer, but these pointers
are not the same. In particular, the pointer that the callee receives
points to temporary stack memory allocated as part of the call lowering.
In most calling conventions, this pointer is never realized in registers
or memory. The temporary memory is simply defined by an implicit
offset from the stack pointer at function entry.
Win64, uniquely, works differently. The structure is still passed in
memory, but instead of being stored at an implicit memory offset, the
caller computes a pointer to the temporary memory and passes it to
the callee as a regular pointer (taking up a register, or if all
registers are taken up, an additional stack slot). Presumably, this
was done to allow eliding the copy when passing aggregates through
several functions on the stack.
This explains why ignoring the `byval` attribute mostly works on Win64.
The argument simply gets passed as a pointer and as long as we're ok
with the callee trampling all over that memory, there are no ill effects.
However, it does contradict the documentation of the `byval` attribute
which specifies that there is to be an implicit copy.
Frontends can of course work around this by never emitting the `byval`
attribute for Win64 and creating `alloca`s for the requisite temporary
stack slots (and that does appear to be what frontends are doing).
However, the presence of the `byval` attribute is not a trap for
frontend authors, since it seems to work, but silently modifies the
passed memory contrary to documentation.
I see two solutions:
- Disallow the `byval` attribute in the verifier if using the Win64
calling convention.
- Make it work by simply emitting a temporary stack copy as we would
with any other calling convention (frontends can of course always
not use the attribute if they want to elide the copy).
This patch implements the second option (make it work), though I would
be fine with the first also.
Ref: https://github.com/JuliaLang/julia/issues/28338
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51842
llvm-svn: 342402
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.
Differential Revision: https://reviews.llvm.org/D52080
llvm-svn: 342395
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.
Differential Revision: https://reviews.llvm.org/D52102
llvm-svn: 342392
For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering.
llvm-svn: 342352
https://bugs.llvm.org/show_bug.cgi?id=38949
It's not clear to me that we even need a one-use check in this fold.
Ie, 2 independent loads might be better than a load+dependent shuffle.
Note that the existing re-use tests are not affected. We actually do form a
broadcast node in those tests now because there's no extra use of the
insert_subvector node in those cases. But something later in isel pattern
matching decides that it is not worth using a broadcast for the full load in
those tests:
Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:'
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0>
t20: v4f64 = insert_subvector t18, t7, Constant:i64<2>
Becomes:
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t21: v4f64 = X86ISD::SUBV_BROADCAST t7
ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7
...
Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7>
Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1>
llvm-svn: 342347
Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back?
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52134
llvm-svn: 342327
Summary:
MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.
Inspired by PR38840.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D52121
llvm-svn: 342326
Summary:
Implement shifts of vectors by i32. Since LLVM defines shifts as
binary operations between two vectors, this involves pattern matching
on splatted shift operands. For v2i64 shifts any i32 shift operands
have to be zero extended in the input and any i64 shift operands have
to be wrapped in the output. Depends on D52007.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51906
llvm-svn: 342302
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.
"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.
Fixes PR38661
Reviewers: hfinkel, cuviper
Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D51108
llvm-svn: 342288
Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation.
As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work.
Differential Revision: https://reviews.llvm.org/D52043
llvm-svn: 342257
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+
Also updated the tests to check correct behaviour
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342210
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52022
Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
llvm-svn: 342189
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.
Reviewers: eli.friedman, craig.topper
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51502
llvm-svn: 342175
Summary:
Fixed assertions due to invalid fixup when encoding compressed instructions
(c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers.
This matches GAS behavior as well.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D52005
llvm-svn: 342160
Summary:
The illegal instruction 0x00 0x00 is being wrongly decoded as
c.addi4spn with 0 immediate.
The invalid instruction 0x01 0x61 is being wrongly decoded as
c.addi16sp with 0 immediate.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D51815
llvm-svn: 342159
Also, add a check to ensure that when main has the expected signature
we do not create a wrapper.
Differential Revision: https://reviews.llvm.org/D51562
llvm-svn: 342157
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.
Differential Revision: https://reviews.llvm.org/D51978
llvm-svn: 342141
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.
Differential Revision: https://reviews.llvm.org/D52032
llvm-svn: 342140
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
llvm-svn: 342133
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.
This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.
llvm-svn: 342127
This implements suggesting alternative mnemonics when an invalid one is
specified. For example `addru $9, $6, 17767` leads to the following
error message:
error: unknown instruction, did you mean: add, addiu, addu, maddu?
Differential revision: https://reviews.llvm.org/D40646
llvm-svn: 342119
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.
This patch switches type legalization to do the scalarizing itself using i32.
It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.
Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51325
llvm-svn: 342114
The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A
thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
Correct the relocation that we emit in such a case.
Resolves PR38620! Based on the patch by Jordan Rhee!
llvm-svn: 342109
Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.
llvm-svn: 342091
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51991
llvm-svn: 342084
Scalarization of a shuffle will break up the source vectors into individual
elements, and use them to assemble the resulting vector. An element type of
a legal vector type may not necessarily be a legal scalar type, so make
sure that the extracted values are extended to a legal scalar type.
llvm-svn: 342079
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51765
llvm-svn: 342064
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.
I don't think gcc will generate this instruction either.
Fixes PR38852.
Differential Revision: https://reviews.llvm.org/D51754
llvm-svn: 342059
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51479
llvm-svn: 342049
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51478
llvm-svn: 342038
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
llvm-svn: 342033
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.
More information describing the vector ABI and procedure call standard
can be found here:
https://developer.arm.com/products/software-development-tools/\
hpc/arm-compiler-for-hpc/vector-function-abi
Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D51477
llvm-svn: 342030
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.
This patch removes the hack.
This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.
Differential Revision: https://reviews.llvm.org/D49499
llvm-svn: 341973
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.
Fixes ones of the failures from PR38865.
Differential Revision: https://reviews.llvm.org/D51940
llvm-svn: 341972
An fp_to_sint node would be incorrectly lowered to a TruncIntFP node in
single-float mode. This would trigger an "Unexpected illegal type!"
assert.
Patch by Dan Ravensloft.
Differential revision: https://reviews.llvm.org/D51810
llvm-svn: 341952
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.
Differential Revision: https://reviews.llvm.org/D51101
llvm-svn: 341941
We've had the pass enabled downstream for a couple of weeks and it
seems to be okay, so enable it by default.
Differential Revision: https://reviews.llvm.org/D51920
llvm-svn: 341932