Previously some targets printed their own message at the start of Select to indicate what they were selecting. For the targets that didn't, it means there was no print of the root node before any custom handling in the target executed. So if the target did something custom and never called SelectNodeCommon, no print would be made. For the targets that did print a message in Select, if they didn't custom handle a node SelectNodeCommon would reprint the root node before walking the isel table.
It seems better to just print the message before the call to Select so all targets behave the same. And then remove the root node printing from SelectNodeCommon and just leave a message that says we're starting the table search.
There were also some oddities in blank line behavior. Usually due to a \n after a call to SelectionDAGNode::dump which already inserted a new line.
llvm-svn: 323551
Summary: This is the producer side for DWARF v5 string offsets tables. The reader/consumer
side was committed with r321295. All compile and type units in a module share a
contribution to the string offsets table. Indirect strings use the strx{1,2,3,4} index forms.
Reviewers: dblaikie, aprantl, JDevliegehere
Differential Revision: https://reviews.llvm.org/D42021
llvm-svn: 323546
We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.
This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to 0'th index. I don't think either of these are likely these days as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make it very difficult for various vector and load combines.
Differential Revision: https://reviews.llvm.org/D42308
llvm-svn: 323541
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323530
Add support for printing / parsing the addrspace of a MachineMemOperand.
Fixes PR35970.
Differential Revision: https://reviews.llvm.org/D42502
llvm-svn: 323521
- using qualified pointer addrspace in intrinsics class to avoid .f32 mangling
- changed too common atomic mangling to ds
- added missing intrinsics to AMDGPUTTIImpl::getTgtMemIntrinsic
Reviewed by: b-sumner
Differential Revision: https://reviews.llvm.org/D42383
llvm-svn: 323516
load instruction
The function `Thumb1InstrInfo::loadRegFromStackSlot` accepts only the `tGPR`
register class. The function serves to emit a `tLDRspi` instruction and
certainly any subset of the `tGPR` register class is a valid destination of the
load.
Differential revision: https://reviews.llvm.org/D42535
llvm-svn: 323514
This is the groundwork for Armv8.2-A FP16 code generation .
Clang passes and returns _Float16 values as floats, together with the required
bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318.
We will implement half-precision argument passing/returning lowering in the ARM
backend soon, but for now this means that this:
_Float16 sub(_Float16 a, _Float16 b) {
return a + b;
}
gets lowered to this:
define float @sub(float %a.coerce, float %b.coerce) {
entry:
%0 = bitcast float %a.coerce to i32
%tmp.0.extract.trunc = trunc i32 %0 to i16
%1 = bitcast i16 %tmp.0.extract.trunc to half
<SNIP>
%add = fadd half %1, %3
<SNIP>
}
When FullFP16 is *not* supported, we don't make f16 a legal type, and we get
legalization for "free", i.e. nothing changes and everything works as before.
And also f16 argument passing/returning is handled.
When FullFP16 is supported, we do make f16 a legal type, and have 2 places that
we need to patch up: f16 argument passing and returning, which involves minor
tweaks to avoid unnecessary code generation for some bitcasts.
As a "demonstrator" that this works for the different FP16, FullFP16, softfp
modes, etc., I've added match rules to the VSUB instruction description showing
that we can codegen this instruction from IR, but more importantly, also to
some conversion instructions. These conversions were causing issue before in
the FP16 and FullFP16 cases.
I've also added match rules to the VLDRH and VSTRH desriptions, so that we can
actually compile the entire half-precision sub code example above. This showed
that these loads and stores had the wrong addressing mode specified: AddrMode5
instead of AddrMode5FP16, which turned out not be implemented at all, so that
has also been added.
This is the minimal patch that shows all the different moving parts. In patch
2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the
remaining Armv8.2-A FP16 instruction descriptions.
Thanks to Sam Parker and Oliver Stannard for their help and reviews!
Differential Revision: https://reviews.llvm.org/D38315
llvm-svn: 323512
Type legalization would prevent any i64 operands to the build_vector from existing before we get here. The coverage bots show this code as uncovered.
llvm-svn: 323506
The original autoupgrade for kunpck intrinsics used a bitcasted scalar shift, or, and. This combine would turn this into a concat_vectors. Now the kunpck intrinsics are autoupgraded to a vector shuffle that will become a concat_vectors.
llvm-svn: 323504
This listed all legal 128-bit integer types individually, but since we already know we have a legal type and its integer, we can just check is128BitVector.
llvm-svn: 323502
When pass creates a MOV instruction for
lea (%base,%index,1), %dst => mov %base,%dst; add %index,%dst
modification it should clean the killed flag for base
if base is equal to index.
Otherwise verifier complains about usage of killed register in add instruction.
Reviewers: lsaba, zvi, zansari, aaboud
Reviewed By: lsaba
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42522
llvm-svn: 323497
Inserting a dbg.value instruction at the start of a basic block with a
landingpad instruction triggers a verifier failure. We should be OK if
we insert the instruction a bit later.
Speculative fix for the bot failure described here:
https://reviews.llvm.org/D42551
llvm-svn: 323482
Move standard forms from a switch statement to the table of forms;
fill in all the missing ones defined in DWARF v5. I'm guessing at
classifications in a couple of cases where v5 forms aren't actually
supported yet, but whoever adds support for the forms can fix the
classifications as needed.
llvm-svn: 323481
This form is like DW_FORM_strp, but points to .debug_line_str instead
of .debug_str as the string section. It's intended to be used from
the line-table header, and allows string-pooling of directory and
filenames across compilation units.
Differential Revision: https://reviews.llvm.org/D42553
llvm-svn: 323476
Summary:
The intent of this is to allow the code to be used with ThinLTO. In
Thinlink phase, a traditional Callgraph can not be computed even though
all the necessary information (nodes and edges of a call graph) is
available. This is due to the fact that CallGraph class is closely tied
to the IR. This patch first extends GraphTraits to add a CallGraphTraits
graph. This is then used to implement a version of counts propagation
on a generic callgraph.
Reviewers: davidxl
Subscribers: mehdi_amini, tejohnson, llvm-commits
Differential Revision: https://reviews.llvm.org/D42311
llvm-svn: 323475
This patch enables aggressive FMA by default on T99, and provides a -mllvm
option to enable the same on other AArch64 micro-arch's (-mllvm
-aarch64-enable-aggressive-fma).
Test case demonstrating the effects on T99 is included.
Patch by: steleman (Stefan Teleman)
Differential Revision: https://reviews.llvm.org/D40696
llvm-svn: 323474
This patch is an enhancement to propagate dbg.value information when
Phis are created on behalf of LCSSA. I noticed a case where a value
carried across a loop was reported as <optimized out>.
Specifically this case:
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
In the above case, after all of the interesting computation completes
our value is reported as "optimized out." This change will add a
dbg.value to correct this.
This patch also moves the dbg.value insertion routine from
LoopRotation.cpp into Local.cpp, so that we can share it in both places
(LoopRotation and LCSSA).
Patch by Matt Davis!
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 323472
Right now clang uses "_n" suffix for some user space callbacks and "N" for the matching kernel ones. There's no need for this and it actually breaks kernel build with inline instrumentation. Use the same callback names for user space and the kernel (and also make them consistent with the names GCC uses).
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D42423
llvm-svn: 323470
The asm parser puts the lock prefix in the MCInst flags so we need to check that in addition to TSFlags. This matches what the ATT printer does.
llvm-svn: 323469
It was reverted after buildbot regressions.
Original commit message:
This allows relative block frequency of call edges to be passed
to the thinlink stage where it will be used to compute synthetic
entry counts of functions.
llvm-svn: 323460
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323441
This is guarded by shouldChangeType(), so the tests show that
we don't do the fold if the narrower type is not legal. Note
that there is a proposal (D42424) that would change the results
for the specific cases shown in these tests. That difference is
also discussed in PR35792:
https://bugs.llvm.org/show_bug.cgi?id=35792
Alive proofs for the cases handled here as well as the bitwise
logic binops that we should already do better on:
https://rise4fun.com/Alive/c97https://rise4fun.com/Alive/Lc5Ehttps://rise4fun.com/Alive/kdf
llvm-svn: 323437
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323430
This patch extends the atom types used by the Apple accelerator tables
with two dsymutil extensions:
- DW_ATOM_type_type_flags
- DW_ATOM_qual_name_hash
llvm-svn: 323414
Summary:
When creating the debug fragments for a SRA'd struct, use the fields'
offsets, taken from the struct layout, as the offsets for the resulting
fragments. This fixes an issue where GlobalOpt would emit fragments with
incorrect offsets for padded fields.
This should solve PR36016.
Patch by David Stenberg.
Reviewers: aprantl
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42489
llvm-svn: 323411
The regular expressions and the imul names caused some instructions to be matched by multiple regexs creating unpredictable results.
This changes them all to use explicit instrs instead.
While doing this I also found that some instructions in Skylake were missing load latency so I fixed that too.
llvm-svn: 323406
The IMUL instruction names mixed with the prefix matching of the instregex lead to some strange matches. The worst being that several memory instructions are using the register form latency.
I don't know what the right answer is, so I've left TODOs and will try to work with the AMD folks to get this cleaned up.
llvm-svn: 323405
MMX instrutions all start with MMX_ so the 64 isn't needed for disambigutation.
SSE/AVX1 instructions are assumed 128-bit so we don't need to say 128.
AVX2 instructions should use a Y to indicate 256-bits.
llvm-svn: 323402
These were treated as optional suffixes, but the regular expressions are already prefix matches so this is unnecessary. It breaks the binary search optimization in tablegen due to the top level question mark.
llvm-svn: 323401
first argument.
This makes lookupFlags more consistent with lookup (which takes the query as the
first argument) and composes better in practice, since lookups are usually
linearly chained: Each lookupFlags can populate the result map based on the
symbols not found in the previous lookup. (If the maps were returned rather than
passed by reference there would have to be a merge step at the end).
llvm-svn: 323398
https://reviews.llvm.org/D41373
The various components are
GICombinerHelper contains transformations that are common to all
targets. Targets can pick and choose which transformations (at
function/opcode granularity) each pass uses via configuring a
GICombinerInfo.
GICombiner contains some common code and it does the traversal,
driving of combines, worklist management and iterating until
convergence.
GICombinerInfo is an interface with a virtual method called combine.
The combiner info will allow targets to pick and choose (or
implement their own specific combines). CombineInfos can make
use of available combines in GICombineHelper to configure the
transformations for a particular pass. Currently this approach allows
cherry picking transformations from helpers (at function/opcode
granularity) and also allows early returning on specific
transformations. Targets also get to prioritize whether target specific
combines run before/after the opt-in generic combines. Ideally we would
like this part to be configured by both C++ and Tablegen. The
CombinerInfo also has a field which indicates how to deal with
IllegalOps (ie - should we allow to create them/or legalize them?).
A CombinerPass would configure a CombinerInfo, create the GICombiner
with the Info, and call
GICombiner::combineMachineInstrs(MachineFunction&).
This organization is very similar to the GISelLegalizer.
llvm-svn: 323392
The code in EmitFunctionEntryCode needs to know the maximum stack
alignment, but it runs very early in the selection process (before
lowering). The final stack alignment may change during lowering, so
the code needs to be moved to where the alignment is known.
llvm-svn: 323374
The tablegen imported patterns for sext(load(a)) don't check for single uses
of the load or delete the original after matching. As a result two loads are
left in the generated code. This particular issue will be fixed by adding
support for a G_SEXTLOAD opcode in future.
There are however other potential issues around this that wouldn't be fixed by
a G_SEXTLOAD, so until we have a proper solution we don't try to handle volatile
loads at all in the AArch64 selector.
Fixes/works around PR36018.
llvm-svn: 323371
Apparently checking the pass structure isn't enough to ensure that we don't fall
back to FastISel, as it's set up as part of the SelectionDAGISel.
llvm-svn: 323369
As discussed in D41484, PMADDWD for 'zero extended' vXi32 is nearly always a better option than PMULLD:
On SNB it will result in code that isn't any faster, but not any slower so we may as well keep it.
On KNL it only has half the throughput, so I've disabled it on there - ideally there'd be a better way than this.
Differential Revision: https://reviews.llvm.org/D42258
llvm-svn: 323367
Summary:
Move reserveRegisterTuples into AMDGPURegisterInfo and use it in
R600RegisterInfo::getReservedRegs and
R600InstrInfo::reserveIndirectRegisters to ensure that all super
registers of reserved registers are also marked as reserved.
Before this change, under certain circumstances, the registers %t1_x and
%t1_xyzw would be marked as reserved, but %t1_xy and %t1_xyz would not
be, leading to the register allocator sometimes assigning a register to
%t1_xy, which is invalid since %t1_x is reserved.
Reviewers: arsenm, tstellar, MatzeB, qcolombet
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D42448
llvm-svn: 323356
It causes regressions in various OpenGL test suites.
Keep the test cases introduced by r321751 as XFAIL, and add a test case
for the regression.
Change-Id: I90b4cc354f68cebe5fcef1f2422dc8fe1c6d3514
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36015
llvm-svn: 323355
Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1.
Reviewers: samparker
Reviewed By: samparker
Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42401
llvm-svn: 323354
The regexs are treated as a prefix match already so the checking for optional text at the end provides no value. Instead it prevents the binary search optimization in tablegen from kicking in due to the top level question mark.
llvm-svn: 323351
Summary:
This allows relative block frequency of call edges to be passed to the
thinlink stage where it will be used to compute synthetic entry counts
of functions.
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D42212
llvm-svn: 323349
Summary:
If the same value is going to be vectorized several times in the same
tree entry, this entry is considered to be a gather entry and cost of
this gather is counter as cost of InsertElementInstrs for each gathered
value. But we can consider these elements as ShuffleInstr with
SK_PermuteSingle shuffle kind.
Reviewers: spatel, RKSimon, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38697
llvm-svn: 323348
Summary:
If any vector divisor element is undef, we can arbitrarily choose it be
zero which would make the div/rem an undef value by definition.
Reviewers: spatel, reames
Reviewed By: spatel
Subscribers: magabari, llvm-commits
Differential Revision: https://reviews.llvm.org/D42485
llvm-svn: 323343
Summary:
`getAction(const InstrAspect &) const` breaks encapsulation by exposing
the smaller components that are used to decide how to legalize an
instruction.
This is a problem because we need to change the implementation of
LegalizerInfo so that it's able to describe particular type combinations
rather than just cartesian products of types.
For example, declaring the following
setAction({..., 0, s32}, Legal)
setAction({..., 0, s64}, Legal)
setAction({..., 1, s32}, Legal)
setAction({..., 1, s64}, Legal)
currently declares these type combinations as legal:
{s32, s32}
{s64, s32}
{s32, s64}
{s64, s64}
but we currently have no means to say that, for example, {s64, s32} is
not legal. Some operations such as G_INSERT/G_EXTRACT/G_MERGE_VALUES/
G_UNMERGE_VALUES has relationships between the types that are currently
described incorrectly.
Additionally, G_LOAD/G_STORE currently have no means to legalize non-atomics
differently to atomics. The necessary information is in the MMO but we have no
way to use this in the legalizer. Similarly, there is currently no way for the
register type and the memory type to differ so there is no way to cleanly
represent extending-load/truncating-store in a way that can't be broken by
optimizers (resulting in illegal MIR).
This patch introduces LegalityQuery which provides all the information
needed by the legalizer to make a decision on whether something is legal
and how to legalize it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, reames, bogner
Reviewed By: bogner
Subscribers: bogner, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D42244
llvm-svn: 323342
This allows us to specify the magic number for the DJB hash function.
This feature is needed by dsymutil to emit Apple types accelerator
table.
llvm-svn: 323341
We're getting bug reports:
https://bugs.llvm.org/show_bug.cgi?id=35807https://bugs.llvm.org/show_bug.cgi?id=35840https://bugs.llvm.org/show_bug.cgi?id=36045
...where we blow up the stack in value tracking because other passes are sending
in selects that have an operand that is itself the select.
We don't currently have a reliable way to avoid analyzing dead code that may take
non-standard forms, so bail out when things go too far.
This mimics the recursion depth limitations in other parts of value tracking.
Unfortunately, this pushes the underlying problems for other passes (jump-threading,
simplifycfg, correlated-propagation) into hiding. If someone wants to uncover those
again, the first draft of this patch on Phab would do that (it would assert rather
than bail out).
Differential Revision: https://reviews.llvm.org/D42442
llvm-svn: 323331
Summary:
Loads/stores of some NEON vector types are promoted to other vector
types with different lane sizes but same vector size. This is not a
problem in little-endian but, when in big-endian, it requires
additional byte reversals required to preserve the lane ordering
while keeping the right endianness of the data inside each lane.
For example:
%1 = load <4 x half>, <4 x half>* %p
results in the following assembly:
ld1 { v0.2s }, [x1]
rev32 v0.4h, v0.4h
This patch changes the promotion of these loads/stores so that the
actual vector load/store (LD1/ST1) takes care of the endianness
correctly and there is no need for further byte reversals. The
previous code now results in the following assembly:
ld1 { v0.4h }, [x1]
Reviewers: olista01, SjoerdMeijer, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D42235
llvm-svn: 323325
Summary:
This patch implements the codegen of DWARF debug info for non-constant
'count' fields for DISubrange.
This is patch [2/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.
Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie
Reviewed By: aprantl
Subscribers: fhahn, aemerson, rengolin, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41696
llvm-svn: 323323
Combine expression patterns to form expressions with fewer, simple instructions.
This pass does not modify the CFG.
For example, this pass reduce width of expressions post-dominated by TruncInst
into smaller width when applicable.
It differs from instcombine pass in that it contains pattern optimization that
requires higher complexity than the O(1), thus, it should run fewer times than
instcombine pass.
Differential Revision: https://reviews.llvm.org/D38313
llvm-svn: 323321
Summary:
This patch extends the DISubrange 'count' field to take either a
(signed) constant integer value or a reference to a DILocalVariable
or DIGlobalVariable.
This is patch [1/3] in a series to extend LLVM's DISubrange Metadata
node to support debugging of C99 variable length arrays and vectors with
runtime length like the Scalable Vector Extension for AArch64. It is
also a first step towards representing more complex cases like arrays
in Fortran.
Reviewers: echristo, pcc, aprantl, dexonsmith, clayborg, kristof.beyls, dblaikie
Reviewed By: aprantl
Subscribers: rnk, probinson, fhahn, aemerson, rengolin, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D41695
llvm-svn: 323313
For the included test case, the DAG transformation
concat_vectors(scalar, undef) -> scalar_to_vector(sclr)
would attempt to create a v2i32 vector for a v9i8
concat_vector. Bail out to avoid creating a bitcast with
mismatching sizes later on.
Differential Revision: https://reviews.llvm.org/D42379
llvm-svn: 323312
This patch removes assert that SCEV is able to prove that a value is
non-negative. In fact, SCEV can sometimes be unable to do this because
its cache does not update properly. This assert will be returned once this
problem is resolved.
llvm-svn: 323309
This matches what MSVC does for alloca() function calls on ARM.
Even if MSVC doesn't support VLAs at the language level, it does
support the alloca function.
On the clang level, both the _alloca() (when emulating MSVC, which is
what the alloca() function expands to) and __builtin_alloca() builtin
functions, and VLAs, map to the same LLVM IR "alloca" function - so
within LLVM they're not distinguishable from each other.
Differential Revision: https://reviews.llvm.org/D42292
llvm-svn: 323308