Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.
Patch by Thomas Backman.
llvm-svn: 314891
In some cases, the code generator attempts to generate instructions such as:
lddw r24, Y+63
which expands to:
ldd r24, Y+63
ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary
This commit limits the first offset to 62, and thus the second to 63.
It also updates some asserts in AVRExpandPseudoInsts.cpp, including for
INW and OUTW, which appear to be unused.
Patch by Thomas Backman.
llvm-svn: 314890
This adds diagnostics for invalid immediate operands to the MOVW and MOVT
instructions (ARM and Thumb).
Differential revision: https://reviews.llvm.org/D31879
llvm-svn: 314888
Currently, our diagnostics for assembly operands are not consistent.
Some start with (for example) "immediate operand must be ...",
and some with "operand must be an immediate ...". I think the latter
form is preferable for a few reasons:
* It's unambiguous that it is referring to the expected type of operand, not
the type the user provided. For example, the user could provide an register
operand, and get a message taking about an operand is if it is already an
immediate, just not in the accepted range.
* It allows us to have a consistent style once we add diagnostics for operands
that could take two forms, for example a label or pc-relative memory operand.
Differential revision: https://reviews.llvm.org/D36689
llvm-svn: 314887
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
I found that llvm-mc does not like non-english characters even in comments,
which it tries to tokenize.
Problem happens because of functions like isdigit(), isalnum() which takes
int argument and expects it is not negative.
But at the same time MCParser uses char* to store input buffer poiner, char has signed value,
so it is possible to pass negative value to one of functions from above and
that triggers an assert.
Testcase for demonstration is provided.
To fix the issue helper functions were introduced in StringExtras.h
Differential revision: https://reviews.llvm.org/D38461
llvm-svn: 314883
This time invoking llc with "-march=x86-64" in the testcase, so we don't assume
the default target is x86.
Summary:
If we have
%vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
%vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0
then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.
Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38235
llvm-svn: 314882
We have found some corner cases connected to range intersection where IRCE makes
a bad thing when the latch condition is unsigned. The fix for that will go as a follow up.
This patch temporarily disables IRCE for unsigned latch conditions until the issue is fixed.
The unsigned latch conditions were introduced to IRCE by rL310027.
Differential Revision: https://reviews.llvm.org/D38529
llvm-svn: 314881
Summary:
If we have
%vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
%vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0
then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.
Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38235
llvm-svn: 314879
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.
This solves the first half of the issues mentioned in PR34720.
Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.
Differential Revision: https://reviews.llvm.org/D38333
llvm-svn: 314876
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark. If entry_count
is not set getBlockProfileCount returns None.
llvm-svn: 314874
This is because lib/Fuzzer doesn't really depend on llvm infrastucture.
It's not easy to access the llvm hardware_concurrency here.
Differential Reivision: https://reviews.llvm.org/D38481
llvm-svn: 314870
Summary:
After r308422 we defer optimizations that can destroy loop canonical forms to
LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations
can exploit more control-flow opportunities.
Reviewers: mcrosier, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38262
llvm-svn: 314857
This was dead when it landed in r252578. We have this functionality, if
not for stack probe calls, but for regular calls in
X86CallFrameOptimization.cpp.
llvm-svn: 314845
An archive looks like
<header>
<symbol table>
<tail>
The symbol table refers to offsets in the tail. A complication is that
we would like to support symbol tables that use 64 bit offsets if it
turns out that any of the offsets is too big.
This patch changes the archive writer to first compute the tail. We
cannot just compute one big StringRef since that would require reading
every member upfront, but we can represent it as a series of
StringRefs.
Having done that it is much easier to compute the symbol table and all
offsets are computed before it is written. With this if there is an
accounting problem it will show up with a regular symbol table, not
just when a 64 bit one is needed.
llvm-svn: 314844
This commit does two things. Firstly, it cleans up some of the benefit
calculation wrt outlined functions and candidates. Secondly, it fixes an
off-by-one bug in the cost model which was caused by the benefit value of
an OutlinedFunction and Candidate differing by 1. It updates the remarks test
to reflect this change.
llvm-svn: 314836
Summary:
For the amdpal OS type:
We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.
This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.
Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37753
llvm-svn: 314829
The test fails on Linux; see follow-up email on the llvm-commits list.
> Add the option to lookup an address in the debug information and print
> out the file, function, block and line table details.
>
> Differential revision: https://reviews.llvm.org/D38409
This also reverts the follow-up r314818:
> [test] Fix llvm-dwarfdump/cmdline.test
>
> Fixes test/tools/llvm-dwarfdump/cmdline.test
llvm-svn: 314825
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.
X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.
Differential revision: https://reviews.llvm.org/D38480
llvm-svn: 314821
Summary:
This should fix a regression introduced by r313786, which switched from
MachineInstr::isIndirectDebugValue() to checking if operand 1 is an
immediate. I didn't have a test case for it until now.
A single UserValue, which approximates a user variable, may have many
DBG_VALUE instructions that disagree about whether the variable is in
memory or in a virtual register. This will become much more common once
we have llvm.dbg.addr, but you can construct such a test case manually
today with llvm.dbg.value.
Before this change, we would get two UserValues: one for direct and one
for indirect DBG_VALUE instructions describing the same variable. If we
build separate interval maps for direct and indirect locations, we will
end up accidentally coalescing identical DBG_VALUE intervals that need
to remain separate because they are broken up by intervals of the
opposite direct-ness.
Reviewers: aprantl
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37932
llvm-svn: 314819
Add the option to lookup an address in the debug information and print
out the file, function, block and line table details.
Differential revision: https://reviews.llvm.org/D38409
llvm-svn: 314817
The issue with std:🧵:hardware_concurrency is that it forwards
to libc and some implementations (like glibc) don't take thread
affinity into consideration.
With this change a llvm program that can execute in only 2 cores will
use 2 threads, even if the machine has 32 cores.
This makes benchmarking a lot easier, but should also help if someone
doesn't want to use all cores for compilation for example.
llvm-svn: 314809
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
This switches the ARM AsmParser to use assembly operand diagnostics from
tablegen, rather than a switch statement on the ARMMatchResultTy. It
moves the existing diagnostic strings to tablegen, but adds no new ones,
so this is NFC except for one diagnostic string that had an off-by-1 error
in the hand-written switch statement.
Differential revision: https://reviews.llvm.org/D31607
llvm-svn: 314804
tryParseRegister advances the lexer, so we need to take copies of the start and
end locations of the register operand before calling it.
Previously, the caret in the diagnostic pointer to the comma after the r0
operand in the test, rather than the start of the operand.
Differential revision: https://reviews.llvm.org/D31537
llvm-svn: 314799
The dsp register class is an alias of the gpr register class, so
we have to define instructions for spilling and reloading.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D38038
llvm-svn: 314798
Currently optimizeMemoryInst requires that all of the AddrModes it sees are
identical. This patch makes it capable of tracking multiple AddrModes, so long
as they differ in at most one field.
This patch does nothing by itself, but later patches will make use of it to
insert or reuse phi or select instructions for the differing fields.
Differential Revision: https://reviews.llvm.org/D38278
llvm-svn: 314795
This lets us optimize away selects that perform the same address computation in
two different ways and is also the first step towards being able to handle
selects between two different, but compatible, address computations.
Differential Revision: https://reviews.llvm.org/D38242
llvm-svn: 314794
In this code, we use ~0U as a sentinel value for any operand class that doesn't
have a user-friendly error message, but this value isn't in range of the
MatchClassKind enum, so we need to ensure it does not get passed to isSubclass.
llvm-svn: 314793
If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles.
Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773
Differential Revision: https://reviews.llvm.org/D38472
llvm-svn: 314788
The code responsible for analysis of inbounds GEPs is extracted into a separate
function: CallAnalyzer::canFoldInboundsGEP. With the patch SROA
enabling/disabling code is localized at one place instead of spreading across
the code of CallAnalyzer::visitGetElementPtr.
Differential Revision: https://reviews.llvm.org/D38233
llvm-svn: 314787
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
Original patch was reverted due to using -stop-after=isel in
the test case (but that is only working when AMDGPU target
is included in the llc build). The test case has now been
updated to use -stop-before=expand-isel-pseudos instead.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314781
This converts the ARM AsmParser to use the new assembly matcher error
reporting mechanism, which allows errors to be reported for multiple
instruction encodings when it is ambiguous which one the user intended
to use.
By itself this doesn't improve many error messages, because we don't have
diagnostic text for most operand types, but as we add that then this will allow
more of those diagnostic strings to be used when they are relevant.
Differential revision: https://reviews.llvm.org/D31530
llvm-svn: 314779
This adds some more debug messages to the type legalizer and functions
like PromoteNode, ExpandNode, ExpandLibCall in an attempt to make
the debug messages a little bit more informative and useful.
Differential Revision: https://reviews.llvm.org/D38450
llvm-svn: 314773
This was previously being silently dropped by obj2yaml and caused
parsing errors with yaml2obj.
Differential Revision: https://reviews.llvm.org/D38490
llvm-svn: 314768
This makes sure the LSDA pointer isn't truncated to 32 bit.
Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.
This solves the second half of the issues mentioned in PR34720.
Differential Revision: https://reviews.llvm.org/D38343
llvm-svn: 314767
Summary:
When checking if a constant expression is a noop cast we fetched the
IntPtrType by doing DL->getIntPtrType(V->getType())). However, there can
be cases where V doesn't return a pointer, and then getIntPtrType()
triggers an assertion.
Now we pass DataLayout to isNoopCast so the method itself can determine
what the IntPtrType is.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D37894
llvm-svn: 314763
Apparently this works by virtue of the fact that the pointers are pointers to the APInts stored inside of the ConstantInt objects. But I really don't think we should be relying on that.
llvm-svn: 314761
Previous code was a bit puzzling because of its use of pointers.
In this patch, we pass a vector and its offsets, instead of pointers to
vector elements.
llvm-svn: 314756
Move error handling code next to the code that returns the error,
and change the error message in order to distinguish it from a similar
error message elsewhere in this file.
llvm-svn: 314745
These are problematic because they apply to everything,
and can easily clobber whatever more specific predicate
you are trying to add to a function.
Currently instructions use SubtargetPredicate/PredicateControl
to apply this to patterns applied to an instruction definition,
but not to free standing Pats. Add a wrapper around Pat
so the special PredicateControls requirements can be appended
to the final predicate list like how Mips does it.
llvm-svn: 314742
Call ConstantFoldSelectInstruction() to fold cases like below
select <2 x i1><i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3>
All operands are constants and the condition has mixed true and false conditions.
Differential Revision: https://reviews.llvm.org/D38369
llvm-svn: 314741
Summary:
This avoids using void * as the type of the lattice value and ugly casts needed to make that happen.
(If folks want to use references, etc, they can use a reference_wrapper).
Reviewers: davide, mssimpso
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38476
llvm-svn: 314734
Issues addressed since original review:
- Avoid bug in regalloc greedy/machine verifier when forwarding to use
in an instruction that re-defines the same virtual register.
- Fixed bug when forwarding to use in EarlyClobber instruction slot.
- Fixed incorrect forwarding to register definitions that showed up in
explicit_uses() iterator (e.g. in INLINEASM).
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 314729
Summary: Also add support for some older Myriad CPUs that were missing.
Reviewers: jyknight
Subscribers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D37552
llvm-svn: 314705
This came out of a recent discussion on llvm-dev
(https://reviews.llvm.org/D38042). Currently the Verifier will strip
the debug info metadata from a module if it finds the dbeug info to be
malformed. This feature is very valuable since it allows us to improve
the Verifier by making it stricter without breaking bcompatibility,
but arguable the Verifier pass should not be modifying the IR. This
patch moves the stripping of broken debug info into AutoUpgrade
(UpgradeDebugInfo to be precise), which is a much better location for
this since the stripping of malformed (i.e., produced by older, buggy
versions of Clang) is a (harsh) form of AutoUpgrade.
This change is mostly NFC in nature, the one big difference is the
behavior when LLVM module passes are introducing malformed debug
info. Prior to this patch, a NoAsserts build would have printed a
warning and stripped the debug info, after this patch the Verifier
will report a fatal error. I believe this behavior is actually more
desirable anyway.
Differential Revision: https://reviews.llvm.org/D38184
llvm-svn: 314699
Summary: If the merged instruction is call instruction, we need to set the scope to the closes common scope between 2 locations, otherwise it will cause trouble when the call is getting inlined.
Reviewers: dblaikie, aprantl
Reviewed By: dblaikie, aprantl
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D37877
llvm-svn: 314694
The refactoring in
"[X86][SSE] Add createPackShuffleMask helper function. NFCI."
resulted in warning when compiling the code (seen in build bots).
This patch restores some types from int to unsigned to avoid
those warnings.
llvm-svn: 314667
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314666
This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.
For example, we allow these nodes
t9: i32 = add t7, Constant:i32<1>
t11: i32 = and t9, Constant:i32<255>
t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>
to be folded into a rotate-and-mask instruction.
Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF];
Differential Revision: https://reviews.llvm.org/D37514
llvm-svn: 314655
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.
Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank
Differential Revision: https://reviews.llvm.org/D37687
Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode statisfy this constraint.
llvm-svn: 314643
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 314642
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38440
llvm-svn: 314640
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register®ister form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register®ister form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supercedes D38025 which was trying to switch the register®ister form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
llvm-svn: 314622
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
Differential Revision: https://reviews.llvm.org/D38307
llvm-svn: 314584
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.
This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.
Differential Revision: https://reviews.llvm.org/D38325
llvm-svn: 314568
Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand
in stack, which does not have correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for amdgcn target.
This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.
Differential Revision: https://reviews.llvm.org/D35361
llvm-svn: 314566
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.
Differential revision: http://reviews.llvm.org/D37832
llvm-svn: 314561
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Reference: https://bugs.llvm.org/show_bug.cgi?id=34687
llvm-svn: 314542
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
This implement the insertion operator for DWARF address ranges so they
are consistently printed as [LowPC, HighPC).
While a dump method might have felt more consistent, it is used
exclusively for printing error messages in the verifier and never used
for actual dumping. Hence this approach is more intuitive and creates
less clutter at the call sites.
Differential revision: https://reviews.llvm.org/D38395
llvm-svn: 314523
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_u32_e64 v0, vcc, v0, v1, -1
which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.
The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine
s_mov_b64 s[0:1], exec
v_cndmask_b32_e64 v0, v1, v2, s[0:1]
into
v_mov_b32 v0, v3
but it's not particularly high priority.
Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
llvm-svn: 314522
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899https://bugs.llvm.org/show_bug.cgi?id=34610
llvm-svn: 314516
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
This patch adds assembler for the ARM target.
Differential Revision: https://reviews.llvm.org/D36789
llvm-svn: 314511
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain test which fails
when compiler is not build in debug mode.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314507
Add missing license information to MicroMipsInstrFPU.td and
fix most of the formatting errors present. Others will be
addressed in a follow up commits.
llvm-svn: 314505
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37752
llvm-svn: 314502
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.
With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.
Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.
The documentation for the AMDPAL ABI will be added in a later commit.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye
Differential Revision: https://reviews.llvm.org/D37483
llvm-svn: 314501
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.
Currently it generates the same code as not specifying an OS at all.
That will change in future commits.
Patch from Tim Corringham.
Subscribers: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D37380
llvm-svn: 314500
This patch introduces 3 helper functions: error(), warn() and note() to
make printing during verification more consistent. When supported, the
respective prefixes are printed in color using the same color scheme as
clang.
Differential revision: https://reviews.llvm.org/D38368
llvm-svn: 314498
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314497
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774
Differential Revision: https://reviews.llvm.org/D37412
llvm-svn: 314493
This commit yanks out the repeated sections of code in pruneCandidates into
two lambdas: ShouldSkipCandidate and Prune. This simplifies the logic in
pruneCandidates significantly, and reduces the chance of introducing bugs by
folding all of the shared logic into one place.
llvm-svn: 314475
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.
This fixes a number of cases where this wasn't handled correctly yet.
llvm-svn: 314471
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to. This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.
llvm-svn: 314465
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
Differential Revision: https://reviews.llvm.org/D38375
llvm-svn: 314457
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.
On windows, the frame pointer should be restored into r11 instead of r7.
On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.
On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.
Differential Revision: https://reviews.llvm.org/D38253
llvm-svn: 314451
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.
llvm-svn: 314447
This patch implements the dwarfdump option --find=<name>. This option
looks for a DIE in the accelerator tables and dumps it if found. This
initial patch only adds support for .apple_names to keep the review
small, adding the other sections and pubnames support should be
trivial though.
Differential Revision: https://reviews.llvm.org/D38282
llvm-svn: 314439
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38305
llvm-svn: 314432
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38276
llvm-svn: 314430
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.
Thanks to Simon Pilgrim for catching this.
llvm-svn: 314429
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.
llvm-svn: 314428
This patch adds a check to the DWARF verifier to detect CUs without a
unit DIE.
Differential revision: https://reviews.llvm.org/D38363
llvm-svn: 314426
This patch disables codegen support for branch likely instructions to
address a potential bug. These branches were unselectable as
they had the same patterns as the normal branches but came after them
when ISel was concerned.
The branch likely instructions were marked as having no delay
slots when they have annulling delay slots. The delay slot filler
does not currently handle annulling delay slot branches, so this
would lead to wrong codegen if these branches were generated.
Reviewers: atanasyan, nitesh.jain
Differential Revision: https://reviews.llvm.org/D38169
llvm-svn: 314421
Summary:
A DBG_VALUE that is referring to a physical register is
valid up until the next def of the register, or the end
of the basic block that it belongs to.
LiveDebugVariables is computing live intervals (slot index
ranges) for DBG_VALUE instructions, before regalloc, in order
to be able to re-insert DBG_VALUE instructions again after
regalloc. When the DBG_VALUE is mapping a variable to a
physical register we do not need to compute the range. We
should simply re-insert the DBG_VALUE at the start position.
The problem that was found, resulting in this patch, was a
situation when the DBG_VALUE was the last real use of the
physical register. The computeIntervals/extendDef methods
extended the range to cover the whole basic block, even though
the physical register very well could be allocated to some
virtual register inside the basic block. So the extended
range could not be trusted.
This patch is a preparation for https://reviews.llvm.org/D38229,
where the goal is to insert DBG_VALUE after each new definition
of a variable, even if the virtual registers that the variable
was connected to has been coalesced into using the same physical
register (e.g. due to two address instructions). For more info
see https://bugs.llvm.org/show_bug.cgi?id=34545
Reviewers: aprantl, rnk, echristo
Reviewed By: aprantl
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D38140
llvm-svn: 314414
Summary:
This allows sharing the lattice value code between LVI and SCCP (D36656).
It also adds a `satisfiesPredicate` function, used by D36656.
Reviewers: davide, sanjoy, efriedma
Reviewed By: sanjoy
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37591
llvm-svn: 314411
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190
llvm-svn: 314410
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.
This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.
Differential Revision: https://reviews.llvm.org/D37798
llvm-svn: 314393
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }
This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
if (a == 0) dummy1();
a = func(a);
} while (a > 0);
Differential Revision: https://reviews.llvm.org/D38236
llvm-svn: 314390
%lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to
ensure the appropriate fixups and relocations are generated. I've added an
instruction format field which is used in RISCVMCCodeEmitter to, for
instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup
(RISC-V has two 12-bit immediate encodings depending on the instruction
type).
Differential Revision: https://reviews.llvm.org/D23568
llvm-svn: 314389
Summary:
If we have a non-allocated register, we allow us to try recoloring of an
already allocated and "Done" register, even if they are of the same
register class, if the non-allocated register has at least one tied def
and the allocated one has none.
It should be easier to recolor the non-tied register than the tied one, so
it might be an improvement even if they use the same regclasses.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D38309
llvm-svn: 314388
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.
Differential Revision: https://reviews.llvm.org/D37880
llvm-svn: 314380
This patch adds new insn, "reg = be16/be32/be64 reg",
for bswap to little endian for big-endian target (bpfeb).
It also adds new insn for negation "reg = -reg".
Currently, for source code, e.g.,
b = -a
LLVM still prefers to generate:
b = 0 - a
But "reg = -reg" format can be used in assembly code.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 314376
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
If we do not initialize Prefix here, Prefix.data() returns a nullptr.
Later, it is passed to memcpy. memcpy's behavior is undefined if src (or
dst) is a nullptr even if a given size is 0. That's why this code
triggered UBsan.
llvm-svn: 314368
This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot).
llvm-svn: 314361
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.
llvm-svn: 314360
The tar format originally supported up to 99 byte filename. The two
extensions are proposed later: Ustar or PAX.
In the UStar extension, a pathanme is split at a '/' and its "prefix"
and "suffix" are stored in different locations in the tar header. Since
"prefix" can be up to 155 byte, it can represent up to 254 byte
filename (but exact limit depends on the location of '/' character in
a pathname.)
Our TarWriter first attempt to use UStar extension and then fallback to
PAX extension.
But there's a bug in UStar header creation. "Suffix" part must be a NUL-
terminated string, but we didn't handle it correctly. As a result, if
your filename just 100 characters long, the last character was droppped.
This patch fixes the issue.
Differential Revision: https://reviews.llvm.org/D38149
llvm-svn: 314349
FileOutputBuffer::create() attempts to remove a target file if the file
is a regular one, which results in an unexpected result in a failure
scenario.
If something goes wrong and the user of FileOutputBuffer decides to not
call commit(), it leaves nothing. An existing file is removed, and no
new file is created.
What we should do is to atomically replace an existing file with a new
file using rename(), so that it wouldn't remove an existing file without
creating a new one.
Differential Revision: https://reviews.llvm.org/D38283
llvm-svn: 314345
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.
This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.
This also improves several comments on the outliner's cost model.
https://reviews.llvm.org/D36721
llvm-svn: 314341