A recent change introduced this assertion failure in some corner cases.
Repro:
mkdir /s/wasm/torture-out ; time /s/wasm/waterfall/src/compile_torture_tests.py --c /s/llvm/out/bin/clang --cxx /s/llvm/out/bin/clang++ --testsuite /s/gcc/gcc/testsuite --fails /s/llvm/llvm/lib/Target/WebAssembly/known_gcc_test_failures.txt --out /s/wasm/torture-out
Or look on the wasm integration bot:
https://build.chromium.org/p/client.wasm.llvm/console
llvm-svn: 257733
We rely on HasOpaqueSPAdjustment not changing after we've calculated
things based on it, such as whether or not we can use 'rep;movs' to
copy bytes around. If it changes, invariants in the
backend will quietly break. This situation arose when we had a call to
memcpy *and* a COPY of the FLAGS register, where we would attempt to
reference local variables using %esi, a register that was clobbered by
the 'rep;movs'.
This fixes PR26124.
llvm-svn: 257730
Summary:
v2: Make ReturnsVoid private, so that I can add another 8 lines of code
and look more productive.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16034
llvm-svn: 257622
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).
This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.
v2: don't do this for compute shaders
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16033
llvm-svn: 257621
Summary:
It is off by default, but can be used
with --misched=si
Patch by: Axel Davy
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D11885
llvm-svn: 257609
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point. This is always true when using the medium or small
code model, but may not be the case when using the large code model.
This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:
ld r2,-8(r12)
add r2,r2,r12
Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.
These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.
Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D15500
llvm-svn: 257597
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16031
llvm-svn: 257592
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.
Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.
v2: Make PSInputAddr private, because there are never enough silly getters
and setters for people to read.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16030
llvm-svn: 257591
Summary: ret.ll will contain a test for this
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16029
llvm-svn: 257590
Make the x86 OptimizeLEAs pass remove an LEA instruction if there is another LEA
(in the same basic block) which calculates an address differing only by a
displacement. Works only for -Oz.
Differential Revision: http://reviews.llvm.org/D13295
llvm-svn: 257589
AnalyzeBranch on X86 (and, previously, SPARC, whose implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.
The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.
This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.
Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.
E.g., it used to emit code like this:
bne .LBB1_2
nop
ba .LBB1_1
nop
.LBB1_2:
And now emits:
cmp %i0, 42
be .LBB1_1
nop
llvm-svn: 257572
Summary:
BFC instructions are available in ARMv6T2 and above.
Reviewers: t.p.northover
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D16076
llvm-svn: 257546
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
rdar://problem/23754176
llvm-svn: 257545
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.
This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.
While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.
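As a minimal illustration of the totality point (hypothetical predicate names, not the actual PPC code), switching over an enum with no default lets the compiler's -Wswitch warning flag any enumerator a future patch adds but forgets to handle:

  // Hypothetical sketch: with an enum and no default case, the compiler
  // (-Wswitch) warns when a new enumerator is not handled, enforcing totality.
  enum class Pred { EQ, NE, LT, LT_WEIGHTED };

  unsigned selectOpcode(Pred P) {
    switch (P) { // deliberately no default:
    case Pred::EQ:          return 1;
    case Pred::NE:          return 2;
    case Pred::LT:          return 3;
    case Pred::LT_WEIGHTED: return 4;
    }
    return 0; // unreachable when all cases are covered
  }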
llvm-svn: 257518
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is received, the value of EM_WEBASSEMBLY
will be updated.
llvm-svn: 257517
Refactor .param, .result, .local, and .endfunc as directives, using the
proper MCTargetStreamer mechanism, rather than as fake instructions.
llvm-svn: 257511
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.
Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.
This also fixes problems where two constructs appeared to have the same label:
since we no longer explicitly use labels, consumers that need labels will
presumably create their own, and presumably won't reuse them when they do.
This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.
llvm-svn: 257505
This is a very limited implementation of DFG-based copy propagation.
It only handles actual COPY instructions (does not handle other equivalents
such as add-immediate with a 0 operand).
The major limitation is that it does not update the DFG: that will be the
change required to make it more robust (hopefully coming up soon).
llvm-svn: 257490
Summary: The result register is the second operand as per the other mt* instructions.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D15993
llvm-svn: 257478
Target independent, SSA-based data flow framework for representing
data flow between physical registers.
This commit implements the creation of the actual data flow graph.
llvm-svn: 257477
Summary:
This fixes three bugs, in all of which state is not reset, or is incorrectly
reset, between objects (i.e. when reusing the same pass manager to create
multiple object files):
files):
1) AttributeSection needs to be reset to nullptr, because otherwise the backend
will try to emit into the old object file's attribute section causing a
segmentation fault.
2) MappingSymbolCounter needs to be reset, otherwise the second object file
will start where the first one left off.
3) The MCStreamer base class resets the Streamer's e_flags settings. Since
EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again
after the MCStreamer was reset.
Also rename Reset (upper case) to EHReset to avoid confusion with
reset (lower case).
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D15950
llvm-svn: 257473
The memory-register form of vcvtph2ps (64 to 128-bit) matches against the
pattern fragment 'vzmovl_v2i64' (a zero-extended 64-bit load).
However, a change in r248784 teaches the instruction combiner that only
the lower 64 bits of the input to a 128-bit vcvtph2ps are used. This means
the instruction combiner will ordinarily optimize away the upper 64-bit
insertelement instruction in the zero-extension and so we no longer select
the memory-register form. To fix this a new pattern has been added.
Differential Revision: http://reviews.llvm.org/D16067
llvm-svn: 257470
Currently WebAssembly has two kinds of relocations; data addresses and
function addresses. This adds ELF relocations for them, as well as an
MC symbol kind to indicate which type of relocation is needed.
llvm-svn: 257416
Always expect tglobaladdr and texternalsym to be wrapped in
WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so
that it can special-case global addresses, as they can be folded in more
cases.
Unfortunately this doesn't enable any new optimizations yet due to
SelectionDAG limitations. I'll be submitting changes to the SelectionDAG
infrastructure, along with tests, in a separate patch.
llvm-svn: 257394
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatibility function with HSAIL's firstbit instruction.
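A scalar model of the semantics, as a hedged sketch (hypothetical helper name; __builtin_clz is the GCC/Clang builtin): the hardware already yields -1 on zero input, so the guarding compare-and-select can be dropped:

  #include <cstdint>

  // Hypothetical scalar model of HSAIL's firstbit: -1 on zero input.
  // Since the hardware instruction itself returns -1 for 0, the backend
  // can now elide the x == 0 test and select entirely.
  int firstbit_u32(uint32_t x) {
    return x == 0 ? -1 : __builtin_clz(x); // guard needed only in this C model
  }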
llvm-svn: 257352
Summary:
It actually takes an offset into the current PC-region.
This fixes the 'expr' command in lldb.
Reviewers: vkalintiris, jaydeep, bhushan
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16054
llvm-svn: 257339
In the OptimizeLEA pass, keep instructions' positions in the basic block cached and use them to calculate the distance between two instructions instead of std::distance. This reduces the complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
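A minimal sketch of the idea with generic types (the in-tree pass works on machine IR, so the real names differ): number the instructions once, then answer distance queries from the cached indices in O(1):

  #include <unordered_map>
  #include <vector>

  struct Instr { /* stand-in for MachineInstr */ };

  using PosMap = std::unordered_map<const Instr *, unsigned>;

  // O(n) once per basic block.
  PosMap numberInstructions(const std::vector<Instr *> &Block) {
    PosMap Pos;
    unsigned N = 0;
    for (const Instr *I : Block)
      Pos[I] = N++;
    return Pos;
  }

  // O(1) per query, replacing an O(n) std::distance walk over the block.
  int distance(const PosMap &Pos, const Instr *A, const Instr *B) {
    return (int)Pos.at(B) - (int)Pos.at(A);
  }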
Differential Revision: http://reviews.llvm.org/D15692
llvm-svn: 257328
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.
This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.
llvm-svn: 257266
Summary:
r255334 matches the bit-reverse pattern in InstCombine and generates calls to Intrinsic::bitreverse.
RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.
Patch by Z. Zheng <zhaoshiz@codeaurora.org>
Reviewers: apazos, jmolloy, weimingz
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15932
llvm-svn: 257188
Summary: In ARMConstantIslandPass, which runs after the Shrink Wrap pass, long jumps will be fixed up as BL (tBfar), which depends on LR being spilled in the epilogue. However, shrink wrapping may remove the LR spill, which causes issues when the function returns.
Reviewers: qcolombet, rengolin
Subscribers: aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D15984
llvm-svn: 257187
This patch corresponds to review:
http://reviews.llvm.org/D15930
Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.
llvm-svn: 257168
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.
For a PIC TLS variable access in a function, the prologue (mflr followed by std and
stdu) gets scheduled after a __tls_get_addr call. __tls_get_addr clobbers LR but
no one saves/restores it.
Also added a test for saving/restoring clobbered registers when calling __tls_get_addr.
Patch by Tim Shen
llvm-svn: 257137
.zero is confusing when used with two arguments. Documentation:
This directive emits SIZE 0-valued bytes. SIZE must be an absolute
expression. This directive is actually an alias for the '.skip'
directive so it can take an optional second argument of the value to
store in the bytes instead of zero. Using '.zero' in this way would be
confusing however.
Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353
Hexagon and Sparc do the same, and it's all the same to WebAssembly so
let's pick the less confusing of the two.
llvm-svn: 257111
Previously we only supported putting the FI into memory operand offset
fields if there was nothing there already. Now combine them.
Differential Revision: http://reviews.llvm.org/D15941
llvm-svn: 257084
The MC assembler doesn't like using the empty string as a private label
prefix because then it treats all labels as private. This commit reverts
back to the default prefix, which is .L, which is common in ELF targets
and consistent with the LLVM name mangler.
llvm-svn: 257083
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.
Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.
Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.
With the IR generated by current Mesa, the changes are obviously relatively
minor:
7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)
Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)
Reviewers: tstellarAMD, arsenm, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15875
llvm-svn: 257074
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.
I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).
This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.
It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)
Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. if the highest regular SGPR used is s29 and VCC is used, the number
of SGPRs would be 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15898
llvm-svn: 257073
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.
Follow up to D15310
llvm-svn: 257055
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.
llvm-svn: 257046
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.
llvm-svn: 257039
Serialize will perform a hardware serialization operation, and is
acting as a memory barrier. Therefore it must have the hasSideEffects
flag set so it will be treated as a global memory object.
Reviewed by Ulrich Weigand
llvm-svn: 257036
getTargetShuffleMask may return shuffle masks with SM_SentinelZero (-2) values (currently just for PSHUFB but VPERM2X128 as well with this patch). Although some calling functions can make use of this (mainly for shuffle combining), others can not and their inclusion makes shuffle mask comparisons more difficult.
This patch adds a flag to getTargetShuffleMask to indicate if the calling function can't handle SM_SentinelZero; getTargetShuffleMask will then return false if it occurs to make handling much easier.
I've tidied up some uses of getTargetShuffleMask to better indicate what is going on - more could be done but at present I don't have test cases to demonstrate it.
Some upcoming patches will make use of this to both support more uses where SM_SentinelZero is not permitted (e.g. combineShuffleToAddSub), and also will allow us to add INSERTPS support to getTargetShuffleMask as part of better zero handling discussed in D14261.
Differential Revision: http://reviews.llvm.org/D15378
llvm-svn: 256992
Summary:
This is admittedly something that you could only run into by manually
playing around with shader assembly, because the SITypeRewriter pass is
skipped for compute.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15902
llvm-svn: 256980
The first instruction in a block is what the rend() iterator points to, so
if it moves, we need to re-evaluate rend() so that we continue to iterate
through the rest of the instructions.
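A sketch of the pitfall using a plain std::list as a stand-in for the instruction list: rend() is reverse_iterator(begin()), so if the first element is spliced elsewhere, a cached rend() silently points at the element's new position; re-evaluating rend() in the loop condition avoids that:

  #include <list>

  void processReversed(std::list<int> &Block) {
    // Note: Block.rend() is re-evaluated on every loop test; caching it in
    // a local would go stale if a transformation moves the first element.
    for (auto I = Block.rbegin(); I != Block.rend();) {
      int &Inst = *I++;
      // ... a transformation here may splice Inst (possibly the first
      // element of Block) to another position ...
      (void)Inst;
    }
  }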
llvm-svn: 256953
As discussed on D15378, move the mask.empty() tests to after the switch statement and consider any shuffle decode where the extracted target shuffle mask is empty as a failure.
llvm-svn: 256921
In the discussion on http://reviews.llvm.org/D15730, Andy pointed out we had a utility function for merging MMO lists. Since it turned out we actually had two copies and there's another review in progress (http://reviews.llvm.org/D15230) which needs the same, extract it into a utility function and clean up the interfaces to make it easier to use with a MachineInstBuilder.
I introduced a pair here to track size and allocation together. I think we should probably move in the direction of the MachineOperandsRef helper class, but I'm leaving that for further work. I want to get the poison state introduced before I make major changes to the interface.
Differential Revision: http://reviews.llvm.org/D15757
llvm-svn: 256909
Due to the SGPR init bug, every program claims to use the same number
of SGPRs anyway, so there's no point in trying to shift those registers
down from their initial spot of reservation.
Add a test that uses VGPR spilling and blocks most SGPRs from being used for
the scratch resource register. Previously, this would run into an assertion.
Differential Revision: http://reviews.llvm.org/D15724
llvm-svn: 256870
We queried hasFP before we hit ExpandISelPseudos. ExpandISelPseudos
manipulated state that hasFP relied on, potentially changing the result
after it has been queried elsewhere.
While I am not aware of any particular bug due to this state of affairs,
it seems best to avoid it entirely by changing the state during DAG
construction.
llvm-svn: 256849
PBLEND/BLENDPD/BLENDPS are no different to the other target shuffles and this will make future improvements to the target shuffle combines more straightforward.
llvm-svn: 256819
The red zone consists of 128 bytes beyond the stack pointer so that the
allocation of objects in leaf functions doesn't require decrementing
rsp. In r255656, we introduced an optimization that would cheaply
materialize certain constants via push/pop. Push decrements the stack
pointer and stores its result at what is now the top of the stack.
However, this means that using push/pop would encroach on the red zone.
PR26023 gives an example where this corrupts an object in the red zone.
llvm-svn: 256808
Summary:
We had two sets of identical FLAT patterns: one inside the
HasFlatAddressSpace predicate and one inside the useFlatForGlobal
predicate. This patch merges these sets into a single pattern
under the isCIVI predicate.
The reason we can remove the predicates is that when MUBUF instructions
are legal, the instruction selector will prefer selecting those over
FLAT instructions because MUBUF patterns have a higher complexity score.
So, in this case having patterns for FLAT instructions will have no effect.
This change also simplifies the process for forcing global address space
loads to use FLAT instructions, since we now only have to disable the
MUBUF patterns, instead of having to disable the MUBUF patterns and
enable the FLAT patterns.
Reviewers: arsenm, cfang
Subscribers: llvm-commits
llvm-svn: 256807
Unfortunately this fix had the effect of exposing the
-verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for
which I disabled it for now.
Two testcases also have additional pushq/popq where the corrected code
cannot prove that %rax is dead any longer. Looking at the examples, this
could potentially be fixed by improving computeRegisterLiveness() to check
the live-in lists of the successors blocks when reaching the end of a
block.
This fixes http://llvm.org/PR25951.
llvm-svn: 256799
Summary:
Enabling this feature will account for the two SGPRs used by the hardware
to store the XNACK_MASK physically.
The hardware only requires this reservation when the XNACK feature is
explicitly enabled. At some point, HSA will probably want to do that, but
it does increase SGPR register pressure, so leave it disabled by default
for now (but do add a small test).
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15869
llvm-svn: 256794
Summary: This was accidentally moved to CIInstructions.td in r256282
Reviewers: cfang, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15763
llvm-svn: 256775
Summary:
The comment explains it: emitError does not necessarily exit the compilation
process, and then using NoRegister leads to assertions later on.
This generates incorrect code, of course, but the user should know to not use
the result when an error has been emitted.
It would be nice to have a test-case for this inside the LLVM repository,
but llc exits on error. shader-db tests trigger the underlying issue at least
on Tonga.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15826
llvm-svn: 256757
We need a frame pointer if there is a push/pop sequence after the
prologue in order to unwind the stack. Scanning the instructions to
figure out if this happened made hasFP not constant-time which is a
violation of expectations. Let's compute this up-front and reuse that
computation when we need it.
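The shape of the fix, as a hedged sketch with hypothetical names (the real change lives in the X86 function info and frame lowering): do the scan once up front, cache the answer, and keep the query constant-time:

  // Hypothetical sketch: HasPushSequences is computed once, when the
  // function body is built, instead of rescanning inside hasFP().
  struct FuncInfoSketch {
    bool HasPushSequences = false;
  };

  bool hasFP(const FuncInfoSketch &FI, bool OtherReasonsNeedFP) {
    // O(1): no instruction scan here.
    return OtherReasonsNeedFP || FI.HasPushSequences;
  }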
llvm-svn: 256730
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings.
There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.
Reviewers: joerg, aaron.ballman
Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15848
llvm-svn: 256707
LLVM's targets need to know if stack pointer adjustments occur after the
prologue. This is needed to correctly determine if the red-zone is
appropriate to use or if a frame pointer is required.
Normally, LLVM can figure this out very precisely by reasoning about the
contents of the MachineFunction. There is an interesting corner case:
inline assembly.
The vast majority of inline assembly which will perform a push or pop is
done so to pair up with pushf or popf as appropriate. Unfortunately,
this inline assembly doesn't mark the stack pointer as clobbered
because, well, it isn't. The stack pointer is decremented and then
immediately incremented. Because of this, LLVM was changed in r256456
to conservatively assume that inline assembly contains a sequence of
stack operations. This is unfortunate because the vast majority of
inline assembly will not end up manipulating the stack pointer in any
way at all.
Instead, let's provide a more principled solution: an intrinsic.
FWIW, other compilers (MSVC and GCC among them) also provide this
functionality as an intrinsic.
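For illustration, the two source-level shapes involved, sketched for x86-64 (the use of __builtin_ia32_readeflags_u64, the clang/GCC builtin behind __readeflags, as the route to the intrinsic is an assumption of this sketch):

  #include <cstdint>

  #if defined(__x86_64__)
  // Opaque inline asm: the backend must conservatively assume the stack
  // pointer is adjusted, which can cost the red zone (see r256456).
  static inline uint64_t read_eflags_asm() {
    uint64_t Flags;
    asm volatile("pushfq\n\tpopq %0" : "=r"(Flags));
    return Flags;
  }

  // Builtin/intrinsic path: a well-defined operation the backend can
  // reason about, so no conservative assumption is needed.
  static inline uint64_t read_eflags_intrin() {
    return __builtin_ia32_readeflags_u64();
  }
  #endif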
llvm-svn: 256685
Not folding these cases tends to avoid partial register updates:
sqrtss (%eax), %xmm0
Has a partial update of %xmm0, while
movss (%eax), %xmm0
sqrtss %xmm0, %xmm0
Has a clobber of the high lanes immediately before the partial update,
avoiding a potential stall.
Given this, we only want to fold when optimizing for size.
This is consistent with the patterns we already have for some of
the fp/int converts, and in X86InstrInfo::foldMemoryOperandImpl()
Differential Revision: http://reviews.llvm.org/D15741
llvm-svn: 256671
This restores the previous behavior of not including the mnemonic in the classes table for every target that starts instruction lines with the mnemonic. Not only did the table size increase by 1 entry, but the class enum increased in size, which caused every class in the array to increase in size. It also grew the size of the function that parses tokens into classes by a substantial amount.
This adds a new HasMnemonicFirst flag to all AsmParsers. It's set to 1 by default and Hexagon target overrides it to 0.
For the X86 target alone this recovers 324KB of size on the llvm-mc executable.
I believe the current state is still a bad design choice for the Hexagon target, as it causes most of the parsing to do a linear search through the entire match table, comparing operands against every instruction until it finds one that works. At least for the other targets we do a binary search based on mnemonic over which to do the linear scan.
llvm-svn: 256669
Summary:
* avoid generating POP {LR} in Thumb1 epilogues
* combine MOV LR, Rx + BX LR -> BX Rx in a peephole optimization pass
* combine POP {LR} + B + BX LR -> POP {PC} on v5T+
Test cases by Ana Pazos
Differential Revision: http://reviews.llvm.org/D15707
llvm-svn: 256523
The cost is calculated for all X86 targets. When gather/scatter instructions
are not supported, we calculate the cost of the scalar sequence.
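For context, a hedged illustration (loop invented for this note, not from the patch) of the kind of access that becomes a gather, and what the scalar fallback must price per element (index extract, scalar load, insert):

  #include <cstdint>

  // An indexed load like this is a gather candidate when vectorized.
  // Without hardware gather/scatter, the cost model charges, per element:
  // extract the index, do a scalar load, insert into the result vector.
  void gather(int32_t *Out, const int32_t *Table, const int32_t *Idx, int N) {
    for (int I = 0; I < N; ++I)
      Out[I] = Table[Idx[I]];
  }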
Differential revision: http://reviews.llvm.org/D15677
llvm-svn: 256519
This adds support for the MCU psABI in a way different from r251223 and r251224,
basically reverting most of these two patches. The problem with the approach
taken in r251223/4 is that it only handled libcalls that originated from the backend.
However, the mid-end also inserts quite a few libcalls and assumes these use the
platform's default calling convention.
The previous patch tried to insert inregs when necessary both in the FE and,
somewhat hackily, in the CG. Instead, we now define a new default calling convention
for the MCU, which doesn't use inreg marking at all, similarly to what x86-64 does.
Differential Revision: http://reviews.llvm.org/D15054
llvm-svn: 256494
Lower broadcast<type>x<vector> to shuffles.
There are two cases:
1. src is 128 bits and dest is 512 bits: in this case we will lower it to a shuffle with imm = 0.
2. src is 256 bits and dest is 512 bits: in this case we will lower it to a shuffle with imm = 01000100b (0x44); that way we broadcast the 256-bit source: ymm[0,1,2,3] => zmm[0,1,2,3,0,1,2,3], then mask it with the passthru value (in case it's a masked op).
Differential Revision: http://reviews.llvm.org/D15790
llvm-svn: 256490
I believe this also fixes a case where a 64-bit memory form that is documented as being unsupported in 32-bit mode could be selected there.
llvm-svn: 256483
Fix TRUNCATE lowering from vector to vector of i1: use the LSB and not the MSB.
Implement VPMOVB/W/D/Q2M intrinsic.
Differential Revision: http://reviews.llvm.org/D15675
llvm-svn: 256470
A frame pointer must be used if stack pointer is modified after the
prologue. LLVM will emit pushf/popf if we need to save/restore the
FLAGS register, requiring us to have a frame pointer for the function.
There is a small twist: this sequence might exist in user code via
inline-assembly. For now, conservatively assume that such functions
require a frame pointer. For real world justification, please see
clang's implementation of __readeflags.
This fixes PR25945.
llvm-svn: 256456
Move RegStackify after coalescing and teach it to use LiveIntervals instead
of depending on SSA form. This avoids a problem where a register in a COPY
instruction is stackified and then subsequently coalesced with a register
that is not stackified.
This also puts it after the scheduler, which allows us to simplify the
EXPR_STACK constraint, as we no longer have instructions being reordered
after stackification and before coloring.
llvm-svn: 256402
The patterns that set a mask register to 0/1
KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn
are replaced with
KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - an optimization for AVX-512 targets.
KNL does not recognize dependency-breaking idioms for mask registers,
so kxnor %k1, %k1, %k2 has a RAW dependence on %k1.
Using %k0 as the undef input register is a performance heuristic based
on the assumption that %k0 is used less frequently than the other mask
registers, since it is not usable as a write mask.
Differential Revision: http://reviews.llvm.org/D15739
llvm-svn: 256365
First step towards making better use of AVX's implicit zeroing of the upper half of a 256-bit vector by instructions that only act on the lower 128-bit vector - discussed on D14151.
As well as the fact that 128-bit shuffle instructions are generally more capable, this can be performant for older CPUs with 128-bit ALUs (e.g. Jaguar, Sandy Bridge) that must treat 256-bit vectors as multiple micro-ops.
Moved the similar subvector extraction shuffle combines from PerformShuffleCombine256 to lowerVectorShuffle as well.
Note: I've avoided combining shuffles that reference elements from the upper halves of the input vectors - this may be reviewed in future work as well (AVX1 would probably always gain, but AVX2 does have some cross-lane shuffle instructions).
Differential Revision: http://reviews.llvm.org/D15477
llvm-svn: 256332
Summary:
For some reason, executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
NOTE: re-commit by fixing a failure in Codegen/AMDGPU/llvm.dbg.value.ll
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256282
Summary:
For some reason, executing an MUBUF instruction with the addr64
bit set and a zero base pointer in the resource descriptor causes
the memory operation to be dropped when the shader is executed using
the HSA runtime.
This kind of MUBUF instruction is commonly used when the pointer is
stored in VGPRs. The base pointer field in the resource descriptor
is set to zero and the pointer is stored in the vaddr field.
This patch resolves the issue by only using flat instructions for
global memory operations when targeting HSA. This is an overly
conservative fix as all other configurations of MUBUF instructions
appear to work.
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15543
llvm-svn: 256273
This patch removes all weight-related interfaces from BPI and replaces
them with probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probability is a better
representation in terms of CFG update and validation.
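Consumer-side, the query now looks like the following sketch (getEdgeProbability and BranchProbability are the real interfaces; the surrounding helper is hypothetical):

  #include "llvm/Analysis/BranchProbabilityInfo.h"
  #include "llvm/Support/BranchProbability.h"
  using namespace llvm;

  // Hypothetical helper: ask for an edge probability instead of raw weights.
  // BranchProbability is a fixed-point rational, so the comparison below
  // stays well-defined under CFG updates where raw weights would drift.
  bool isHotEdge(BranchProbabilityInfo &BPI, const BasicBlock *Src,
                 const BasicBlock *Dst) {
    return BPI.getEdgeProbability(Src, Dst) >= BranchProbability(4, 5);
  }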
Differential revision: http://reviews.llvm.org/D15519
llvm-svn: 256263
This is a recommit of r256004 which was reverted in r256160. The issue was the
incorrect promotion for half and byte loads transformed into mov instructions.
This fix will replace half and byte type loads only with bit field extracts.
Original commit message:
This change promotes load instructions which directly read from a store by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example:
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256249
Today, we always take into account the possibility that object files
produced by MC may be consumed by an incremental linker. This results
in us initializing fields which vary with time (TimeDateStamp), which harms
hermetic builds (e.g. verifying a self-host went well) and produces
sub-optimal code because we cannot assume anything about the relative
position of functions within a section (call sites can get redirected
through incremental linker thunks).
Let's provide an MCTargetOption which controls this behavior so that we
can disable this functionality if we know a-priori that the build will
not rely on /incremental.
llvm-svn: 256203
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in the
lowering phase because, after type legalization, the original truncation
will be turned into a BUILD_VECTOR with each element that is extracted
from a vector and then truncated, and from them it is difficult to do
this optimization. This greatly improves the performance of truncations
on some specific types.
Cost table is updated accordingly.
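An illustrative source pattern (invented for this note) that exercises the combine: a plain vector truncation, which previously scalarized through BUILD_VECTOR after type legalization:

  #include <cstdint>

  // Truncating v8i32 to v8i16; with this change the DAG combiner can form
  // PACKUS/PACKSS-based sequences for such truncations.
  void truncate_v8i32_to_v8i16(const int32_t *In, int16_t *Out) {
    for (int I = 0; I < 8; ++I)
      Out[I] = (int16_t)In[I];
  }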
Differential revision: http://reviews.llvm.org/D14588
llvm-svn: 256194
Summary:
These registers have different encodings on CI and VI, so we add pseudo
FLAT_SCRATCH registers to be used before MC, and subtarget-specific
registers to be used by the MC layer.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15661
llvm-svn: 256178
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.
This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.
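An illustrative source pattern for the v16i8-to-i64 case (GCC/Clang vector extensions; the function is invented for this note): a sign-extending lane extract that the new patterns can select to a single SMOV:

  #include <cstdint>

  typedef int8_t v16i8 __attribute__((vector_size(16)));

  // Extract lane 3 and sign-extend to i64. The extract is legalized to
  // i32 first, producing the any_extend node the new patterns now match.
  int64_t extract_sext(v16i8 V) {
    return (int64_t)V[3];
  }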
llvm-svn: 256176
This allows the AsmMatcherEmitter to properly tokenize the AsmStrings for
load and store instructions. This is a step towards asm parsing.
llvm-svn: 256166