We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.
If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.
llvm-svn: 341765
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.
The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed
to be running there?
Differential Revision: https://reviews.llvm.org/D51696
llvm-svn: 341762
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.
Another step towards fixing PR38691
Reviewers: RKSimon, spatel
Reviewed By: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51818
llvm-svn: 341753
This is a MIR test so we can test ADOX which we have no isel patterns for. I also plan to remove ADCX isel patterns in the near future so this will help maintain coverage.
llvm-svn: 341751
We are currently only able to fold a load in operand 1 to ADCX. A load in operand 0 will use the legacy ADC instruction.
Ultimately I want to remove isel support for ADCX, but first I'm going to fix the shortcomings I know of so I can write proper MIR tests to maintain coverage later.
llvm-svn: 341744
We currently have no way to isel ADOX and I plan to remove isel patterns for ADCX. This test will ensure we still have stack folding support for these instructions if we need them in the future.
llvm-svn: 341743
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).
In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.
Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.
Implements PR38552.
Reviewers: majnemer, mstorsjo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51820
llvm-svn: 341727
Summary:
Since the shuffle mask is not exposed as an operand in the native ISel
DAG, create a new WebAssembly ISD node exposing the mask. The mask is
lowered as sixteen immediate byte indices no matter what type the
original vector shuffle was operating on.
This CL depends on D51656
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51659
llvm-svn: 341718
Summary:
Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7.
Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma
Reviewed By: nickdesaulniers, efriedma
Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D48580
llvm-svn: 341706
The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.
llvm-svn: 341705
The type legalizer will try to scalarize this and fail.
It looks like there's some other v1iX oddities out there too since we still generated some vector instructions.
llvm-svn: 341704
Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if its never loaded.
Differential Revision: https://reviews.llvm.org/D51803
llvm-svn: 341698
We should represent the store directly in IR instead. This gives the middle end a chance to remove it if it can see a load from the same address.
Differential Revision: https://reviews.llvm.org/D51769
llvm-svn: 341677
Previously we only handled loads in operand 0, but nothing guarantees the load will be operand 0 for commutable operations.
Differential Revision: https://reviews.llvm.org/D51768
llvm-svn: 341675
Add support for bitcasts from float type to an integer type of the same element bitwidth.
There maybe cases where we need to support different widths (e.g. as SSE __m128i is treated as v2i64) - but I haven't seen cases of this in the wild yet.
llvm-svn: 341652
Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a
FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a
proper addressing-mode and tells the rewriter about it so that encodable
offsets are exploited and others are rejected.
Should fix PR38828.
llvm-svn: 341642
ADC is commutable and the load could be in either operand, but we were only checking operand 0.
Ideally we'd mark X86adc_flag as commutable and tablegen would automatically do this, but the EFLAGS register mention is preventing it.
llvm-svn: 341606
The ADC instruction is commutable, but we only have RMW isel patterns with a load on the left hand side. Nothing will canonicalize loads to the LHS on these ops. So we need two patterns.
llvm-svn: 341605
SHF_ARM_PURECODE flag when being built with the -mexecute-only flag.
All code sections of an ELF must have the flag set for the final .text
section to be execute-only, otherwise the flag gets removed.
A HasData flag is added to MCSection to aid in the determination that
the section is empty. A virtual setTargetSectionFlags is added to
MCELFObjectTargetWriter to allow subclasses to set target specific
section flags to be added to sections which we then use in the ARM
backend to set SHF_ARM_PURECODE.
Patch by Ivan Lozano!
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D48792
llvm-svn: 341593
Summary:
I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated
where the generated code was bad. This patch fixes the majority of the issues by
requesting that a 2xi64 vector be used for memset of 32 bytes and above.
The patch leaves the former request for f128 unchanged, despite f128
materialization being suboptimal: doing otherwise runs into other asserts in
isel and makes this patch too broad.
This patch hides the issue that was present in bzero_40_stack and bzero_72_stack
because the code now generates in a better order which doesn't have the store
offset issue. I'm not aware of that issue appearing elsewhere at the moment.
<rdar://problem/44157755>
Reviewers: t.p.northover, MatzeB, javed.absar
Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51706
llvm-svn: 341558
This basically reverts a change made in r336217, but improves the text of the error message for not allowing IP-relative addressing in 32-bit mode.
Fixes PR38826.
Patch by Iain Sandoe.
llvm-svn: 341512
I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes.
llvm-svn: 341493
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization.
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.
This is the basic case. Some potential enhancements are in the TODO comments:
1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).
Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should
also be covered by existing tests).
Differential Revision: https://reviews.llvm.org/D51630
llvm-svn: 341481
Reland r341269. Use std::stable_sort when sorting constant condidates.
Reverting commit, r341365:
Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
Original commit, r341269:
[Constant Hoisting] Hoisting Constant GEP Expressions
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
https://reviews.llvm.org/D51396
Differential Revision: https://reviews.llvm.org/D51654
llvm-svn: 341417
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.
Differential Revision: https://reviews.llvm.org/D50982
llvm-svn: 341413
I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch simply expands the two existing tests to capture more of the current code generation when it comes to heap-based and stack-based small memset on arm64. The tested code is already pretty good, notably when it comes to using STP, FP stores, FP immediate generation, and folding one of the stores into a stack spill when possible. The uses of STUR could be improved, and some more pairing could occur. Straying from bzero patterns currently yield suboptimal code, and I expect a variety of small changes could make things way better.
llvm-svn: 341406
The runtime pseudo relocations can't handle the AArch64 format PC
relative addressing in adrp+add/ldr pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.
Differential Revision: https://reviews.llvm.org/D51452
llvm-svn: 341401
Use MachineOperand::ChangeToImmediate rather than reassigning
MachineOperands to new values created from MachineOperand::CreateImm,
so that their parent pointers are preserved.
This fixes "Instruction has operand with wrong parent set" errors
reported by the MachineVerifier.
llvm-svn: 341389
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
llvm-svn: 341365
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
implementing the proposed mitigation technique described in the original
design document.
The idea is to check after calls that the return address used to arrive
at that location is in fact the correct address. In the event of
a mis-predicted return which reaches a *valid* return but not the
*correct* return, this will detect the mismatch much like it would
a mispredicted conditional branch.
This is the last published attack vector that I am aware of in the
Spectre v1 space which is not mitigated by SLH+retpolines. However,
don't read *too* much into that: this is an area of ongoing research
where we expect more issues to be discovered in the future, and it also
makes no attempt to mitigate Spectre v4. Still, this is an important
completeness bar for SLH.
The change here is of course delightfully simple. It was predicated on
cutting support for post-instruction symbols into LLVM which was not at
all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner
who helped me figure out how to do a bunch of the complex changes
involved there.
Differential Revision: https://reviews.llvm.org/D50837
llvm-svn: 341358
retpolines.
This implements the core design of tracing the intended target into the
target, checking it, and using that to update the predicate state. It
takes advantage of a few interesting aspects of SLH to make it a bit
easier to implement:
- We already split critical edges with conditional branches, so we can
assume those are gone.
- We already unfolded any memory access in the indirect branch
instruction itself.
I've left hard errors in place to catch if any of these somewhat subtle
invariants get violated.
There is some code that I can factor out and share with D50837 when it
lands, but I didn't want to couple landing the two patches, so I'll do
that in a follow-up cleanup commit if alright.
Factoring out the code to handle different scenarios of materializing an
address remains frustratingly hard. In a bunch of cases you want to fold
one of the cases into an immediate operand of some other instruction,
and you also have both symbols and basic blocks being used which require
different methods on the MI builder (and different operand kinds).
Still, I'll take a stab at sharing at least some of this code in
a follow-up if I can figure out how.
Differential Revision: https://reviews.llvm.org/D51083
llvm-svn: 341356
This patch modifies hasStandardEncoding() / inMicroMipsMode() /
inMips16Mode() methods of the MipsSubtarget class so only one can be
true at any one time. That prevents the selection of microMIPS and MIPS
instructions and patterns that are defined in TableGen files at the same
time. A few new patterns and instruction definitions hae been added to
keep test cases passed.
Differential revision: https://reviews.llvm.org/D51483
llvm-svn: 341338
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.
This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D51537
llvm-svn: 341301
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch by jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D51506
llvm-svn: 341293
Summary:
A follow-up for D49266 / rL337166 + D49497 / rL338044.
This is still the same pattern to check for the [lack of]
signed truncation, but in this case the constants and the predicate
are negated.
https://rise4fun.com/Alive/BDVhttps://rise4fun.com/Alive/n7Z
Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51532
llvm-svn: 341287
The 'rol Rd' instruction is equivalent to 'adc Rd'.
This caused compile warnings from tablegen because of conflicting bits
shared between each instruction.
llvm-svn: 341275
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
Differential Revision: https://reviews.llvm.org/D51396
llvm-svn: 341269
These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors.
New clang builtins will be accompanying them shortly.
llvm-svn: 341259
This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.
This will be used to enable kshift intrinsics in clang.
Differential Revision: https://reviews.llvm.org/D51401
llvm-svn: 341227
The `mtc1` and `mfc1` definitions in the MipsInstrFPU.td have MMRel,
but do not have StdMMR6Rel tags. When these instructions are emitted
for microMIPS R6 targets, `Mips::MipsR62MicroMipsR6` nor
`Mips::Std2MicroMipsR6` cannot find correct op-codes and as a result the
backend uses mips32 variant of the instructions encoding.
The patch fixes this problem by adding the StdMMR6Rel tag and check
instructions encoding in the test case.
Differential revision: https://reviews.llvm.org/D51482
llvm-svn: 341221
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.
Handle any type of load since in principle we should do the
same combine for the various load intrinsics.
llvm-svn: 341219
This patch fixes the number of micro opcodes, and processor resource cycles for
the following AVX instructions:
vinsertf128rr/rm
vperm2f128rr/rm
vbroadcastf128
Tests have been regenerated using the usual scripts in the llvm/utils directory.
Differential Revision: https://reviews.llvm.org/D51492
llvm-svn: 341185
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.
Differential Revision: https://reviews.llvm.org/D51450
llvm-svn: 341176
This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it.
llvm-svn: 341171
This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded.
Fixes PR38771
llvm-svn: 341102
With r295105, some 'noreturn' blocks (those that don't return and have no
successors) may be merged.
If such blocks' predecessors have different outgoing offset or register, don't
report an error in CFIInstrInserter verify().
Thanks to Vlad Tsyrklevich for reporting the issue.
Differential Revision: https://reviews.llvm.org/D51161
llvm-svn: 341087
Summary:
In the case of (and reg, constant) or (or reg, constant), it can be
beneficial to use a ANDNrr/ORNrr instruction instead of ANDrr/ORrr,
if the complement of the constant can be encoded using a single SETHI
instruction instead of a SETHI/ORri pair.
If the constant has more than one use, it is probably better to keep it
in its original form.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D50964
llvm-svn: 341069
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
extern int bar(int[]);
int foo(int i) {
int a[i]; // VLA
asm volatile(
"mov r7, #1"
:
:
: "r7"
);
return 1 + bar(a);
}
Compiled for thumb, this gives:
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r5, r6, r7, lr}
push {r4, r5, r6, r7, lr}
.setfp r7, sp, #12
add r7, sp, #12
.pad #4
sub sp, #4
movs r1, #7
add.w r0, r1, r0, lsl #2
bic r0, r0, #7
sub.w r0, sp, r0
mov sp, r0
@APP
mov.w r7, #1
@NO_APP
bl bar
adds r0, #1
sub.w r4, r7, #12
mov sp, r4
pop {r4, r5, r6, r7, pc}
...
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
"mov r7, #1"
^
Reviewers: efriedma, olista01, javed.absar
Reviewed By: efriedma
Subscribers: eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51165
llvm-svn: 341062
Providing that the load is known to be 4 byte aligned, we can optimise a
ldr(adr address) to just ldr address.
Differential Revision: https://reviews.llvm.org/D51030
llvm-svn: 341058
Bots are unhappy:
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/test/CodeGen/Hexagon/swp-const-tc2.ll:10:14: error: CHECK-NOT: excluded string found in input
; CHECK-NOT: = mpy
^
<stdin>:22:6: note: found here
r5 += mpyi(r2,r3)
^~~~~
This reverts commit r341046.
llvm-svn: 341049
Hacker's Delight 10-17: when C is constant,
the result of X % C == 0 can be computed more cheaply
without actually calculating the remainder.
The motivation is discussed here:
https://bugs.llvm.org/show_bug.cgi?id=35479.
Patch by: hermord (Dmytro Shynkevych)!
For https://reviews.llvm.org/D50222
llvm-svn: 341047
Summary:
As suggested in D50222, this has been refactored into a separate patch.
The undef and the infinite loop at the end cause this test to be translated
unpredictably. In particular, the checked-for `mpy` disappears under
certain legal optimizations (e.g. the one in D50222).
Since the use of these constructs is not relevant to the behavior tested,
according to the header comment, this change, suggested by @kparzysz,
eliminates them.
Patch by: hermord (Dmytro Shynkevych)!
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits, kparzysz
Differential Revision: https://reviews.llvm.org/D50944
llvm-svn: 341046
In computeRegisterLiveness, the max instructions to search
was counting dbg_value instructions, which could potentially
cause an observable codegen change from the presence of debug
info.
llvm-svn: 341028
If there is an unused def, this would previously
report that the register was live. Check for uses
first so that it is reported as dead if never used.
llvm-svn: 341027
If the end of the block is reached during the scan, check
the live ins of the successors. This was already done in the
other direction if the block entry was reached.
llvm-svn: 341026
Check that Machine CSE correctly handles during the transformation, the
debug location information for local variables.
Differential Revision: https://reviews.llvm.org/D50887
llvm-svn: 341025
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.
I've changed the assert to a report_fatal_error so it's not lost in Release builds.
The test updates are to fix things that tripped the new error.
Differential Revision: https://reviews.llvm.org/D51231
llvm-svn: 341022
If an ABI-like value is used in a different block,
the type split used is not necessarily the same as
the call's ABI. The value is used through an intermediate
copy virtual registers from the other block. This
resulted in copies with inconsistent sizes later.
Fixes regressions since r338197 when AMDGPU started
splitting vector types for calls.
llvm-svn: 341018
We don't have enough information to know if struct types being
bitcast will cause validation failures or not, so be conservative
and allow such cases to persist (fot now).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38711
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51460
llvm-svn: 341010
Summary:
Global variables that are external and zero initialized are
supposed to be merged with global variables in the bss section
rather than the data section.
Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D51379
llvm-svn: 341008
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.
For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.
Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.
Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.
Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.
GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.
Differential Revision: https://reviews.llvm.org/D51288
llvm-svn: 340942
MipsSEInstrInfo class defines for internal purpose unconditional
branches as Mips::B nad Mips:J even in case of microMIPS code
generation. Under some conditions that leads to the bug - for rather long
branch which fits to Mips jump instruction offset size, but does not fit
to microMIPS jump offset size, we generate 'short' branch and later show
an error 'out of range PC16 fixup' after check in the isBranchOffsetInRange
routine.
Differential revision: https://reviews.llvm.org/D50615
llvm-svn: 340932
Involves microMIPS's jump in the analyzable branch set to reduce some
code patterns.
Differential revision: https://reviews.llvm.org/D50613
llvm-svn: 340931
For a certain combination of options, BuildPairF64_{64}, ExtractElementF64{_64}
may be expanded into instructions using stack.
Add implicit operand $sp for such cases so that ShrinkWrapping doesn't move
prologue setup below them.
Fixes MultiSource/Benchmarks/MallocBench/cfrac for
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfpxx -mnan=2008'
and
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfp64 -mnan=2008 -mno-odd-spreg'.
Differential Revision: https://reviews.llvm.org/D50986
llvm-svn: 340927
Adjust missed test to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340917
Adjust tests to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340916
Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead.
TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD.
Differential Revision: https://reviews.llvm.org/D50074
llvm-svn: 340913
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:
- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
fixes an issue where a 32-bit value of 0xffffffff was recorded
as unsigned. This is equal to -1 when restricting to a 32-bit
comparison, and an inline constant can be used.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50629
Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
llvm-svn: 340903
Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp
Reviewers: RKSimon, spatel, efriedma
Reviewed By: efriedma
Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D51337
llvm-svn: 340891
These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc. Though they are missing from the Intel Intrinsics Guide.
This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recoqnize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins in to clang with good codegen.
Differential Revision: https://reviews.llvm.org/D51370
llvm-svn: 340869
This can leave behind the uses with the defs removed.
Since this should only really happen in tests, it's not worth the
effort of trying to handle this.
llvm-svn: 340866
https://reviews.llvm.org/D51197
Currently, IRTranslator (and GISel) seems to be arbitrarily picking
which overflow intrinsics get mapped into opcodes which either have a
carry as an input or not.
For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an
opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM
IR).
This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO,
G_SSUBE and G_SADDE.
llvm-svn: 340865
The original motivating example uses a 64-bit add, so the carry
is used. Insert a copy from VCC. This may allow shrinking of
the used carry instruction. At worst, we are replacing a
mov to materialize the constant with a copy of vcc.
llvm-svn: 340862
This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.
llvm-svn: 340859
Summary:
The updated tests were previously infallible because the SIMD bitwise
operations do not contain vector types in their names.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51369
llvm-svn: 340858
This patch creates the shift mask and actual shift using the vXi16 vector shift ops.
Differential Revision: https://reviews.llvm.org/D51263
llvm-svn: 340813
Summary:
I'm not sure if this patch is correct or if it needs more qualifying somehow. Bitcast shouldn't change the size of the load so it should be ok? We already do something similar for stores. We'll change the type of a volatile store if the resulting store is Legal or Custom. I'm not sure we should be allowing Custom there...
I was playing around with converting X86 atomic loads/stores(except seq_cst) into regular volatile loads and stores during lowering. This would allow some special RMW isel patterns in X86InstrCompiler.td to be removed. But there's some floating point patterns in there that didn't work because we don't fold (f64 (bitconvert (i64 volatile load))) or (f32 (bitconvert (i32 volatile load))).
Reviewers: efriedma, atanasyan, arsenm
Reviewed By: efriedma
Subscribers: jvesely, arsenm, sdardis, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, arichardson, jrtc27, atanasyan, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50491
llvm-svn: 340797
This patch issues an error message if Darwin ABI is attempted with the PPC
backend. It also cleans up existing test cases, either converting the test to
use an alternative triple or removing the test if the coverage is no longer
needed.
Updated Tests
-------------
The majority of test cases were updated to use a different triple that does not
include the Darwin ABI. Many tests were also updated to use FileCheck, in place
of grep.
Deleted Tests
-------------
llvm/test/tools/dsymutil/PowerPC/sibling.test was originally added to test
specific functionality of dsymutil using an object file created with an old
version of llvm-gcc for a Powerbook G4. After a discussion with @JDevlieghere he
suggested removing the test.
llvm/test/CodeGen/PowerPC/combine_loads_from_build_pair.ll was converted from a
PPC test to a SystemZ test, as the behavior is also reproducible there.
All other tests that were deleted were specific to the darwin/ppc ABI and no
longer necessary.
Phabricator Review: https://reviews.llvm.org/D50988
llvm-svn: 340795
The code that generates the loop definition operand for phis
in the epilog and kernel is incorrect in some cases.
In the kernel, when a phi refers to another phi, the code that
updates PhiOp2 needs to include the stage difference between
the two phis.
In the epilog, the check for using the loop definition instead
of the phi definition uses the StageDiffAdj value (the difference
between the phi stage and the loop definition stage), but the
adjustment is not needed to determine if the current stage
contains an iteration with the loop definition.
Differential Revision: https://reviews.llvm.org/D51167
llvm-svn: 340782
If the liveness of a physical register was invalid, this
was attempting to iterate the subregisters of all register
uses of the instruction, which would assert when it
encountered an implicit virtual register operand.
llvm-svn: 340763
We're using a 256-bit PACKUS to do the truncation, but that instruction operates on 128-bit lanes. So previously we shuffled first to rearrange the lanes. But that requires 2 shuffles. Instead we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS.
Differential Revision: https://reviews.llvm.org/D51284
llvm-svn: 340757
InstCombine mucks these up a bit. So we need to do some additional pattern matching to fix it. There are a still a few special cases not handled, but this covers the general case.
Differential Revision: https://reviews.llvm.org/D50952
llvm-svn: 340756
Summary:
Made it convert from register to stack based instructions, and removed the registers.
Fixes to related code that was expecting register based instructions.
Added the correct testing flag to all tests, depending on what the
format they were expecting so far.
Translated one test to stack format as example: reg-stackify-stack.ll
tested:
llvm-lit -v `find test -name WebAssembly`
unittests/MC/*
Reviewers: dschuff, sunfish
Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D51241
llvm-svn: 340750
This commit has caused failures in some internal benchmarks. Temporarily
reverting this patch until the issue can be diagnosed and fixed.
llvm-svn: 340740
Summary:
This commit adds the case of tail calling a sret function from a non-sret
function when both functions have the C calling convention.
llvm-svn: 340737
Summary:
Remove unnecessary lines from `sibcall.ll` and rename labels according
to @RKSimon's recommendations in the D45653 conversation.
llvm-svn: 340735
The internal benchmark failure reported by Google was due to a missing
check for the result type for the sign-extend and shift DAG. This commit
adds the check and re-commits the patch.
llvm-svn: 340734
Summary: The GR740 provides an up cycle counter in the registers ASR22
and ASR23. As these registers can not be read together atomically we only
use the value of ASR23 for llvm.readcyclecounter(). The ASR23 register
holds the 32 LSBs of the up-counter.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: jfb, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48638
llvm-svn: 340733
Summary:
Currently bitcasting constants from f64 to v2i32 is done by storing the
value to the stack and then loading it again. This is not necessary, but
seems to happen because v2i32 is a valid type for Sparc V8. If it had not
been legal, we would have gotten help from the type legalizer.
This patch tries to do the same work as the legalizer would have done by
bitcasting the floating point constant and splitting the value up into a
vector of two i32 values.
Reviewers: venkatra, jyknight
Reviewed By: jyknight
Subscribers: glaubitz, fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D49219
llvm-svn: 340723
We cannot directy reuse the patterns of StPat because for some reason the store
DAG node and the atomic_store_nn DAG nodes put the ptr and the value in
different positions. Currently we attempt to store the address to an address
formed by the value.
Differential Revision: https://reviews.llvm.org/D51217
llvm-svn: 340722
Summary:
Previously most CPUs inherited cmov support through Feature64Bit(or FeatureCMPXCHG16HB implying Feature64Bit) or FeatureSSE1.
This has the surprising side effect that -mattr=-cmov causes an assert to fire in 64-bit mode because it clears the Feature64Bit. Or in 32-bit mode, -mattr=-cmov disables any sse/avx features which seems surprising.
This patch removes the implication and instead updates hasCMOV in X86Subtarget to check SSE1 or is64Bit in addition to the regular cmov flag. This should keep most things working the way they did before. I don't believe there is a way to specific "-cmov" directly from clang so this should only effect our lower level tools.
This does stop -mattr=cx16(cmpxchg16b) from implying cmov is enabled via the 64bit flag as you can see from one of the changed tests. But that was a 32-bit test so I don't know why it enabled cx16 anyway.
For the other test I had to add -sse to override the new sse check in hasCMOV.
Reviewers: RKSimon, DavidKreitzer, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D51228
llvm-svn: 340707
Summary: This matches gcc and one cpuid dump I found online. Given that these are considered 7th generation x86 CPU it seems likely they support cmov since cmov was added by Intel in their 6th generation.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51264
llvm-svn: 340706
I noticed this along with the patterns in D51125, but when the index is variable,
we don't convert insertelement into a build_vector.
For x86, that means these get expanded at legalization time into the loading/spilling
code that we see in the tests. I think it's always better to avoid going to memory on
these, and we get the optimal 'broadcast' if it's available.
I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have
regression tests that would be affected (although I did not check what would happen in
those cases). In the most basic cases shown here, AArch64 would probably do much
better with a splat.
Differential Revision: https://reviews.llvm.org/D51186
llvm-svn: 340705
Legalize G_ADD for types smaller than i32.
LegalizationArtifactCombiner replaces extend instructions with appropriate
bitwise instructions.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D51213
llvm-svn: 340697
Summary:
The only time vector SMUL_LOHI/UMUL_LOHI nodes are created is during division/remainder lowering. If its created before op legalization, generic DAGCombine immediately turns that SMUL_LOHI/UMUL_LOHI into a MULHS/MULHU since only the upper half is used. That node will stick around through vector op legalization and will be turned back into UMUL_LOHI/SMUL_LOHI during op legalization. It will then be custom lowered by the X86 backend. Due to this two step lowering the vector shuffles created by the custom lowering get legalized after their inputs rather than before. This prevents the shuffles from being combined with any build_vector of constants.
This patch uses changes vXi32 to use MULHS/MULHU instead. This is what the later DAG combine did anyway. But by skipping the change back to UMUL_LOHI/SMUL_LOHI we lower it before any constant BUILD_VECTORS. This allows the vector_shuffle creation to constant fold with the build_vectors. This accounts for the test changes here.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51254
llvm-svn: 340690
This is a preliminary step for a preliminary step for D50992.
I noticed that x86 often misses chances to load a scalar directly
into a vector register.
So this patch is just allowing more of those cases to match a
broadcast op in lowerBuildVectorAsBroadcast(). The old code comment
said it doesn't make sense to use a broadcast when we're loading a
single element and everything else is undef, but I think that's the
best case in the improved tests in insert-loaded-scalar.ll. We avoid
scalar-to-vector-register move and/or less efficient shuffling.
Note that there are some existing types that were already producing
a broadcast, but that happens semi-accidentally. Ie, it's not
happening as part of lowerBuildVectorAsBroadcast(). The build vector
gets expanded into load + shuffle, and then shuffle lowering produces
the broadcast.
Description of the other test diffs:
1. avx-basic.ll - replacing load+shufle is a win.
2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral
3. sse41.ll - don't care - we convert that intrinsic to generic IR now, so this test is deprecated
4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - pshufb alternatives with an extra instruction are not obviously bad
Differential Revision: https://reviews.llvm.org/D51125
llvm-svn: 340685
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.
This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.
Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.
This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.
V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51098
Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b
llvm-svn: 340684
Summary:
If any of the bundled instructions are marked as FrameSetup
or FrameDestroy, then that property is set on the BUNDLE
instruction as well.
As long as the scheduler/packetizer aren't mixing
prologue/epilogue instructions (i.e. all the bundled
instructions have the same property) then this simply gives
the bundle the correct property (so when using a bundle
iterator in late passes a bundle will be correctly identified
as FrameSetup/FrameDestroy).
When for example bundling a mix of FrameSetup instructions
with non-FrameSetup instructions it could be discussed if
the bundle should have the property or not. The choice here
has been to set these properties on the BUNDLE instruction if
any of the bundled instructions have the property set.
Reviewers: #debug-info, kparzysz
Reviewed By: kparzysz
Subscribers: vsk, thegameg, llvm-commits
Differential Revision: https://reviews.llvm.org/D50637
llvm-svn: 340680
Summary:
When computeIntervals is looking through COPY instruction to
extend the location mapping for a debug variable it did not
handle subregisters correctly.
For example
DBG_VALUE debug-use %0.sub_8bit_hi, ...
%1:gr16 = COPY %0
was transformed into
DBG_VALUE debug-use %0.sub_8bit_hi, ...
%1:gr16 = COPY %0
DBG_VALUE debug-use %1, ...
So the subregister index was missing in the added DBG_VALUE.
As long as the subreg refered to the least significant bits
of the superreg, then I guess we could get the correct
result in a debugger even when referring to the superreg.
But as in the example above when the subreg refers to other
parts of the superreg, then debuginfo would be incorrect.
I'm not sure exactly how to fix this properly, so this patch
just avoids looking through the COPY when there is a subreg
involved (for more info, see the FIXME added in the code).
Reviewers: rnk, aprantl
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D50788
llvm-svn: 340679
Firstly, require the symbol to be used within the module. If a
symbol is unused within a module, then by definition it cannot be
address-significant within that module. This condition is useful on all
platforms because it could make symbol tables smaller -- without this
change, emitting an address-significance table could cause otherwise
unused undefined symbols to be added to the object file.
But this change is necessary with COFF specifically in order to
preserve the property that an unreferenced undefined symbol in an IR
module does not result in a link failure. This is already the case for
ELF because ELF linkers only reject links with unresolved symbols if
there is a relocation to that symbol, but COFF linkers require all
undefined symbols to be resolved regardless of relocations. So if
a module contains an unreferenced undefined symbol, we need to make
sure not to add it to the address-significance table (and thus the
symbol table) in case it doesn't end up resolved at link time.
Secondly, do not add dllimport symbols to the table. These symbols
won't be able to be resolved because their definitions live in another
module and are accessed via the IAT, and the address-significance
table has no effect on other modules anyway. It wouldn't make sense
to add the IAT entry symbol to the address-significance table either
because the IAT entry isn't address-significant -- the generated code
never takes its address.
Differential Revision: https://reviews.llvm.org/D51199
llvm-svn: 340648
This patch will address using the xscpsgndp instruction to copy floating point
scalar registers instead of the xxlor (specifically XXLORf) instruction that is
currently used. Additionally, this patch of utilizing xscpsgndp will apply to
P9, while pre-P9 will still use xxlor.
Patch by amyk
Differential Revision: https://reviews.llvm.org/D50004
llvm-svn: 340643
Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store the requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling was to insert a bitcast. Though I guess isStoreBitCastBeneficial could be used to block such a loop.
The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, So we are no longer converting them
This is related to D50491 where I was trying to allow some bitcasting of volatile loads
Differential Revision: https://reviews.llvm.org/D50578
llvm-svn: 340626
Lower integer arguments smaller than i32.
Support both register and stack arguments.
Define setLocInfo function for setting LocInfo field in ArgLocs vector.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D51031
llvm-svn: 340572
Summary:
Splats are fewer bytes than v128.consts, so use them when either could
apply.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51179
llvm-svn: 340569
The variable index pattern is different than the constant index
cases as shown in D51125. We might want to splat regardless of
whether the scalar is loaded from memory or transferred from GPR.
llvm-svn: 340565
Summary:
I got "Use not jointly dominated by defs" when removePartialRedundancy
attempted to prune then re-extend a subrange whose only liveness was a
dead def at the copy being removed.
V2: Removed junk from test. Improved comment.
V3: Addressed minor review comments.
Subscribers: MatzeB, qcolombet, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D50914
Change-Id: I6f894e9f517f71e921e0c6d81d28c5f344db8dad
llvm-svn: 340549
subtarget features for indirect calls and indirect branches.
This is in preparation for enabling *only* the call retpolines when
using speculative load hardening.
I've continued to use subtarget features for now as they continue to
seem the best fit given the lack of other retpoline like constructs so
far.
The LLVM side is pretty simple. I'd like to eventually get rid of the
old feature, but not sure what backwards compatibility issues that will
cause.
This does remove the "implies" from requesting an external thunk. This
always seemed somewhat questionable and is now clearly not desirable --
you specify a thunk the same way no matter which set of things are
getting retpolines.
I really want to keep this nicely isolated from end users and just an
LLVM implementation detail, so I've moved the `-mretpoline` flag in
Clang to no longer rely on a specific subtarget feature by that name and
instead to be directly handled. In some ways this is simpler, but in
order to preserve existing behavior I've had to add some fallback code
so that users who relied on merely passing -mretpoline-external-thunk
continue to get the same behavior. We should eventually remove this
I suspect (we have never tested that it works!) but I've not done that
in this patch.
Differential Revision: https://reviews.llvm.org/D51150
llvm-svn: 340515
Previously we asumed a vector reduction add is part of a loop and one of the input is a phi. But the code in SelectionDAGBuilder that sets vector reduction flag handles more cases than that. It just requires that the use chain ends in a horizontal reduction. And there are no other uses. This means it can handle unrolled reduction loops.
If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two sad inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides.
I've created a lambda to reuse some of the code for both sides. And fixed the variables names to remove reference to "phi".
Differential Revision: https://reviews.llvm.org/D50817
llvm-svn: 340478
Summary:
This CL adds support for arbitrary BUILD_VECTORS, i.e. not splats and
not consts. This is the last feature needed to properly lower v2i64
multiplies without a i64x2.mul instruction (which is not in the spec),
so i64x2.mul is removed as well.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51082
Remove unnecessary condition and fix whitespace
llvm-svn: 340472
This solves the motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38527
If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls,
but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll
the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge
that some of the scalar elements are undef.
Differential Revision: https://reviews.llvm.org/D50791
llvm-svn: 340469
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.
Differential Revision: https://reviews.llvm.org/D47917
llvm-svn: 340458
Summary:
When we don't actually have stack-allocated variables but need SP only
to support EH, we don't need to write SP back in the epilog, because we
don't bump down the stack pointer.
Reviewers: dschuff
Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51114
llvm-svn: 340454
On Windows, movw+movt pairs with relocations are handled with a single
relocation that covers them both. Therefore we can't inject anything
between these instructions, otherwise the relocation (which in LLVM
only is treated as the movw instruction's relocation, while the movt
instruction's relocation is dropped) will end up bogus.
These instructions are bundled up until right before the constant
islands pass, making this effectively the only place that can split
them apart.
Differential Revision: https://reviews.llvm.org/D51032
llvm-svn: 340451
This avoids a potential infinite loop setting and unsetting bits in the
mask.
Reduced from a failure on the polly-aosp bot.
Differential Revision: https://reviews.llvm.org/D51066
llvm-svn: 340446
Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can.
This is done directly as part of isel so its as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct.
To avoid manual load folding and custom instruction selection for the negate. I've inserted new nodes in the DAG above the shift node in topological order.
Differential Revision: https://reviews.llvm.org/D48789
llvm-svn: 340441
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4: - fix compilation issues
- fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
llvm-svn: 340417
Constant and global may alias, also one rules table wasn't
ordered correctly.
Pinpointed by Matt.
v2: add a test with swapped parameters
llvm-svn: 340416
Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16
so that they can perform a ror.
Differential Revision: https://reviews.llvm.org/D51034
llvm-svn: 340405
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.
This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.
Differential Revision: https://reviews.llvm.org/D49673
llvm-svn: 340397
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.
llvm-svn: 340396
Add patterns for unhandled CondCode enumerables:
SETEQ, SETGE, SETGT, SETLE, SETLT, SETNE.
Stated at the ISD::CondCode enum declaration:
`All of these (except for the 'always folded ops')
should be handled for floating point.`
Add patterns which use these nodes, same as corresponding
'ordered' CondCode nodes.
Referring to 'Ordered means that neither operand is a QNAN'
we assume it is safe to match ex. SETLT node to the same
instruction as SETOLT.
Differential Revision: https://reviews.llvm.org/D50757
llvm-svn: 340392
Summary:
After the stack is unwound due to a thrown exception, the
`__stack_pointer` global can point to an invalid address. This inserts
instructions that restore `__stack_pointer` global.
Reviewers: jgravelle-google, dschuff
Subscribers: mgorny, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50980
llvm-svn: 340339
Summary:
This CL implements v128.const for each vector type. New operand types
are added to ensure the vector contents can be serialized without LEB
encoding. Tests are added for instruction selection, encoding,
assembly and disassembly.
Reviewers: aheejin, dschuff, aardappel
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50873
llvm-svn: 340336
Summary:
Catchpads and cleanuppads are not funclet entries; they are only EH
scope entries. We already dont't set `isEHFuncletEntry` for catchpads.
This patch does the same thing for cleanuppads.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50654
llvm-svn: 340330
Summary:
When RegisterCoalescer::reMaterializeTrivialDef is substituting
a register use in a DBG_VALUE instruction, and the old register
is a subreg, and the new register is a physical register,
then we need to use substPhysReg in order to extract the correct
subreg.
Reviewers: wmi, aprantl
Reviewed By: wmi
Subscribers: hiraditya, MatzeB, qcolombet, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D50844
llvm-svn: 340326
Summary:
So far, `isReturn` property is used to mean both a return instruction
from a functon and the end of an EH scope, a scope that starts with a EH
scope entry BB and ends with a catchret or a cleanupret instruction.
Because WinEH uses funclets, all EH-scope-ending instructions are also
real return instruction from a function. But for wasm, they only serve
as the end marker of an EH scope but not a return instruction that
exits a function. This mismatch caused incorrect prolog and epilog
generation in wasm EH scopes. This patch fixes this.
This patch is in the same vein with rL333045, which splits
`MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and
`isEHScopeEntry`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50653
llvm-svn: 340325
In removeCopyByCommutingDef, segments from the source live range are
copied into (and merged with) the segments of the target live range.
This is performed for all subranges of the source interval. It can
happen that there will be subranges of the target interval that had
no corresponding subranges in the source interval, and in such cases
these subrages will not be updated. Since the copy being coalesced
is about to be removed, these ranges need to be updated by removing
the segments that are started by the copy.
llvm-svn: 340318
The test demonstrates over-complicated codegen for a udiv that only has one divisor that doesn't equal 1. This should have allowed the codegen to be a lot simpler (uniform shifts etc.) but only the SSE2 manages to make use of this......
llvm-svn: 340313
DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block. it will be hoisted, but not leave the block.
Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all.
This fixes the case from PR38649.
Differential Revision: https://reviews.llvm.org/D51000
llvm-svn: 340303
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this, the
new intrinsics copied the old buffer intrinsics.
This commit extends the new intrinsics to allow int types as well.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50315
Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.struct.buffer.load
llvm.amdgcn.struct.buffer.load.format
llvm.amdgcn.struct.buffer.load.format.d16
llvm.amdgcn.raw.buffer.store
llvm.amdgcn.raw.buffer.store.format
llvm.amdgcn.raw.buffer.store.format.d16
llvm.amdgcn.struct.buffer.store
llvm.amdgcn.struct.buffer.store.format
llvm.amdgcn.struct.buffer.store.format.d16
llvm.amdgcn.raw.buffer.atomic.*
llvm.amdgcn.struct.buffer.atomic.*
with the following changes from the llvm.amdgcn.buffer.*
intrinsics:
* there are separate raw and struct versions: raw does not have an
index arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.
The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50306
Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.tbuffer.load
llvm.amdgcn.struct.tbuffer.load
llvm.amdgcn.raw.tbuffer.store
llvm.amdgcn.struct.tbuffer.store
with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:
* there are separate raw and struct versions: raw does not have an index
arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined format arg (dfmt+nfmt)
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.
The tbuffer pseudo- and real instructions now also have a combined
format operand.
The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.
V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49026
Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
Summary:
Previously a BUNDLE instruction inherited the DebugLoc from the
first instruction in the bundle, even if that DebugLoc had no
DILocation. With this commit this is changed into selecting the
first DebugLoc that has a DILocation, by searching among the
bundled instructions.
The idea is to reduce amount of bundles that are lacking
debug locations.
Reviewers: #debug-info, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: JDevlieghere, mattd, llvm-commits
Differential Revision: https://reviews.llvm.org/D50639
llvm-svn: 340267
During combining, ReduceLoadWdith is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 340261
This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and use the numerator scale factor to multiply by +1/-1.
llvm-svn: 340260
Summary:
RegisterCoalescer::reMaterializeTrivialDef used to assert that
the input register was live in. But as shown by the new
coalesce-dead-lanes.mir test case that seems to be a valid
scenario. We now return false instead of the assert, simply
avoiding to remat the dead def.
Normally a COPY of an undef value is eliminated by
eliminateUndefCopy(). Although we only do that when the
destination isn't a physical register. So the situation
above should be limited to the case when we copy an undef
value to a physical register.
Reviewers: kparzysz, wmi, tpr
Reviewed By: kparzysz
Subscribers: MatzeB, qcolombet, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D50842
llvm-svn: 340255
Summary:
We decided to revert this from i64 to i32 in Nov 28 CG meeting. Fixes
PR38632.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D51010
llvm-svn: 340234
Due to some splat handling code in getVectorShuffle, its possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG.
This patch examines the returned mask and makes sure its different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle.
Fixes PR38639
Differential Revision: https://reviews.llvm.org/D50981
llvm-svn: 340214
We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector.
Differential Revision: https://reviews.llvm.org/D50878
llvm-svn: 340212
1. Change the software pipeliner to use unknown size instead of dropping
memory operands. It used to do it before, but MachineInstr::mayAlias
did not handle it correctly.
2. Recognize UnknownSize in MachineInstr::mayAlias.
3. Print and parse UnknownSize in MIR.
Differential Revision: https://reviews.llvm.org/D50339
llvm-svn: 340208
This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787.
I missed updating legalizer-info-validation.mir as I had assertions
turned off in my build and that specific test requires asserts. Fixed it
now.
llvm-svn: 340197
32-bit constant address space is declared as 6, so the
maximum number of address spaces is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 340171
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
Handle the case where the sign bit extends to only part of the small elements.
llvm-svn: 340169
If the arch is P8, we will select XFLOAD to load the floating point, and then, expand it to vsx and non-vsx X-form instruction post RA. This patch is trying to convert the X-form to D-form if it meets the requirement that one operand of the x-form inst is the special Zero register, and another operand fed by add inst. i.e.
y = add imm, reg
LFDX. 0, y
-->
LFD imm(reg)
Reviewers: Nemanjai
Differential Revision: https://reviews.llvm.org/D49007
llvm-svn: 340149
Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts.
The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements.
llvm-svn: 340143
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add.
This rewrites the canonicalization to just be based on the condition code.
llvm-svn: 340134
The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements.
While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular.
llvm-svn: 340128
Summary:
I believe this restores the behavior we had before r339147.
Fixes PR38622.
Reviewers: RKSimon, chandlerc, spatel
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50936
llvm-svn: 340120
Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result.
Differential Revision: https://reviews.llvm.org/D50781
llvm-svn: 340062
This patch addresses:
- Implementation within PPCISelLowering.cpp to check if we should use direct
load into vector instructions (such as lxsd/lfd ) when the scalar_to_vector
function is used; which will allow us to catch as many cases of the
scalar_to_vector uses as possible to translate the ld->mtvsrd sequence into
lxsd.
- Test cases to exhibit the behaviour of emitting lxsd/lfd.
Patch by amyk
Differential revision: https://reviews.llvm.org/D49698
llvm-svn: 340037
* -march=x86-64 -> -mtriple=x86_64-unknown-linux to avoid _ prefixes to
symbols
* add -start-before to avoid running the whole codegen on the IR. I
assumed it is meant to be running after X86SpeculativeLoadHardening.
llvm-svn: 340034
test/CodeGen/X86/shadow-stack.ll has the following machine verifier
errors:
```
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```
The fix here is to only copy the machine operand's register without the
kill flags for all the instructions except the very last one of the
sequence.
I had to insert dummy PHIs in the test case to force the NoPHI function
property to be set to false. More on this here: https://llvm.org/PR38439
Differential Revision: https://reviews.llvm.org/D50260
llvm-svn: 340033