This is a test that Akira Hatanaka wrote to test GlobalOpt's handling of
aliases with GEP operands. David Majnemer independently made the same
change to GlobalOpt in r212079. Akira's test is a useful addition, so I'm
pulling it over from the llvm repo for Swift on GitHub.
llvm-svn: 262510
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
This reverts commit r262370.
It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404
The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.
llvm-svn: 262505
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.
These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.
So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.
Differential Revision: http://reviews.llvm.org/D17015
llvm-svn: 262504
The placement new calls here were all calling the allocation function
in RecyclingAllocator/Recycler for SDNode, instead of the function for
the specific subclass we were constructing.
Since this particular allocator always overallocates it more or less
worked, but would hide what we're actually doing from any memory
tools. Also, if you tried to change this allocator so something like a
BumpPtrAllocator or MallocAllocator, the compiler would crash horribly
all the time.
Part of llvm.org/PR26808.
llvm-svn: 262500
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.
This change should be NFC when compiling via clang without
-fomit-frame-pointer.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines
Differential Revision: http://reviews.llvm.org/D17730
llvm-svn: 262495
Otherwise users get messages from CheckAtomic about missing libatomic
instead of a sensible message that says "use GCC 4.7 or newer".
I structured the change along the lines of HandleLLVMStdlib.cmake, so
that the standalone build of Clang still gets the compiler version
check.
Reviewers: beanz
Differential Revision: http://reviews.llvm.org/D17789
llvm-svn: 262491
parts of the AA interface out of the base class of every single AA
result object.
Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.
When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.
To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).
However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.
The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.
It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.
However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.
Differential Revision: http://reviews.llvm.org/D17329
llvm-svn: 262490
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.
This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.
It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.
Differential Revision: http://reviews.llvm.org/D17680
llvm-svn: 262478
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.
Benefits:
unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.
Patch by: Valery Pykhtin
Differential Revision: http://reviews.llvm.org/D17150
llvm-svn: 262473
We modeled the RDFLAGS{32,64} operations as "using" {E,R}FLAGS.
While technically correct, this is not be desirable for folks who want
to examine aspects of the FLAGS register which are not related to
computation like whether or not CPUID is a valid instruction.
Differential Revision: http://reviews.llvm.org/D17782
llvm-svn: 262465
For some instructions the register is not the last operand and the immediate handling had to detect this and hardcode the index to find it. It also required CurOp to be pointing at the last operand handled in the Form switch whereas for any instruction it would be pointing at the next operand.
Now we just capture the value in the Form switch when we know exactly where it is and the CurOp pointer can behave normally.
llvm-svn: 262462
Fix checking the same instruction twice instead of the
second branch that uses vccz. I don't think this matters
currently because s_branch_vccnz is always used currently.
llvm-svn: 262457
For some reason MSVC seems to think I'm calling getConstant() from a
static context. Try to avoid this issue by explicitly specifying
'this->' (though I'm not confident that this will actually work).
llvm-svn: 262451
Have ScalarEvolution::getRange re-consider cases like "{C?A:B,+,C?P:Q}"
by factoring out "C" and computing RangeOf{A,+,P} union RangeOf({B,+,Q})
instead.
The latter can be easier to compute precisely in cases like
"{C?0:N,+,C?1:-1}" N is the backedge taken count of the loop; since in
such cases the latter form simplifies to [0,N+1) union [0,N+1).
llvm-svn: 262438
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integers comparisons to *vector* integers comparisons because it might pessimize
the general case.
Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.
But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
llvm-svn: 262424
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
llvm-svn: 262419
On AMDGPU where operations i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
llvm-svn: 262397
This adds some missing generic schedule info definitions, enables
completeness checking for cyclone and fixes a typo uncovered by that.
Differential Revision: http://reviews.llvm.org/D17748
llvm-svn: 262393
This isn't quite NFC because some of the SDLocs may change which could
cause scheduling differences. But no regression tests are affected and
there is no functional change intended.
llvm-svn: 262391
Revert r262248 in an attempt to fix the clang-native-aarch64-full
bot and to investigate a performance regression in
SingleSource/Benchmarks/CoyoteBench/huffbench
llvm-svn: 262388
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
TableGen checks at compiletime that for scheduling models with
"CompleteModel = 1" one of the following holds:
- Is marked with the hasNoSchedulingInfo flag
- The instruction is a subclass of Sched
- There are InstRW definitions in the scheduling model
Typical steps necessary to complete a model:
- Ensure all pseudo instructions that are expanded before machine
scheduling (usually everything handled with EmitYYY() functions in
XXXTargetLowering).
- If a CPU does not support some instructions mark the corresponding
resource unsupported: "WriteRes<WriteXXX, []> { let Unsupported = 1; }".
- Add missing scheduling information.
Differential Revision: http://reviews.llvm.org/D17747
llvm-svn: 262384
This introduces a new flag that indicates that a specific instruction
will never be present when the MachineScheduler runs and therefore needs
no scheduling information.
This is in preparation for an upcoming commit which checks completeness
of a scheduling model when tablegen runs.
Differential Revision: http://reviews.llvm.org/D17728
llvm-svn: 262383
Summary:
Tablegen was unable to determine that param loads/stores were actually
reading or writing from memory. I think this isn't a problem in
practice for param stores, because those occur in a block right before
we make our call. But param loads don't have to at the very beginning
of a function, so should be annotated as mayLoad so we don't incorrectly
optimize them.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17471
llvm-svn: 262381
Summary: Looks like this was caused by a typo.
Reviewers: jholewinski
Subscribers: jholewinski, llvm-commits, tra
Differential Revision: http://reviews.llvm.org/D17357
llvm-svn: 262380
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
Summary:
Also simplify some of the embedded C++ logic.
No functional changes.
Reviewers: jholewinski
Subscribers: llvm-commits, tra, jholewinski
Differential Revision: http://reviews.llvm.org/D17354
llvm-svn: 262371
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations. However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.
This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cummulative ~133 KB.
Differential Revision: http://reviews.llvm.org/D17679
llvm-svn: 262370
Summary:
This adds the beginning of an update API to preserve MemorySSA. In particular,
this patch adds a way to remove memory SSA accesses when instructions are
deleted.
It also adds relevant unit testing infrastructure for MemorySSA's API.
(There is an actual user of this API, i will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)
Reviewers: hfinkel, reames, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17157
llvm-svn: 262362
CIE augmentation data might contain non-printable characters.
The patch prints the data as a list of hex bytes.
Differential Revision: http://reviews.llvm.org/D17759
llvm-svn: 262361
This adds support to convert ProfileSummary object to Metadata and create a
ProfileSummary object from metadata. This would allow attaching profile summary
information to Module allowing optimization passes to use it.
llvm-svn: 262360
Summary:
This patch impleemnts DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D17614
llvm-svn: 262356
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.
void g(signed short);
void f(unsigned short x) {
g(x);
}
llvm-svn: 262352
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D17337
llvm-svn: 262337
Idea behind this change is to make code shorter and as much common for all targets as possible. Let's even accept more code than is valid for a particular target, leaving it for the assembler to sort out.
64bit instructions decoding added.
Error\warning messages on unrecognized instructions operands added, InstPrinter allowed to print invalid operands helping to find invalid/unsupported code.
The change is massive and hard to compare with previous version, so it makes sense just to take a look on the new version. As a bonus, with a few TD changes following, it disassembles the majority of instructions. Currently it fully disassembles >300K binary source of some blas kernel.
Previous TODOs were saved whenever possible.
Patch by: Valery Pykhtin
Differential Revision: http://reviews.llvm.org/D17720
llvm-svn: 262332
Function lto_module_create_in_local_context() would previously
rely on the default LLVMContext being created for it by
LTOModule::makeLTOModule(). This context exits the program on
error and is not arranged to update sLastStringError in
tools/lto/lto.cpp.
Function lto_module_create_in_local_context() now creates an
LLVMContext by itself, sets it up correctly to its needs and then
passes it to LTOModule::createInLocalContext() which takes
ownership of the context and keeps it present for the lifetime of
the returned LTOModule.
Function LTOModule::makeLTOModule() is modified to take a
reference to LLVMContext (instead of a pointer) and no longer
creates a default context when nullptr is passed to it. Method
LTOModule::createInContext() that takes a pointer to LLVMContext
is removed because it allows to pass a nullptr to it. Instead
LTOModule::createFromBuffer() (that takes a reference to
LLVMContext) should be used.
Differential Revision: http://reviews.llvm.org/D17715
llvm-svn: 262330
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
Previosy, if actual instruction have one of optional operands then other optional operands listed before this also should be presented.
For example instruction v_fract_f32 v0, v1, mul:2 have one optional operand - OMod and do not have optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:
string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have same PredicateMethod).
Now, if MatchInstructionImpl meets formal optional operand that is not presented in actual instruction it skips this formal operand and tries to match current actual operand with next formal.
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17568
[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in order specified by asm string:
clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
Technically you aren't supposed to emit these after type legalization
for some reason, and we use vector extracts of bitcasted integers
as the canonical way to do this.
llvm-svn: 262298
This currently does not have the control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.
But in most situations we do want the sinking to occur.
llvm-svn: 262296
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets. A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object. This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
llvm-svn: 262294
Generally speaking, this can only happen with unreachable code.
However, neglecting to check for this condition would lead us to loop
forever.
llvm-svn: 262284
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:
__m128 mload_zeros(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0));
}
__m128 mload_fakeones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(1));
}
__m128 mload_ones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}
__m128 mload_oneset(float *f) {
return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}
...so none of the above will actually generate a masked load for optimized code.
This is the masked load counterpart to:
http://reviews.llvm.org/rL262064
llvm-svn: 262269
Combinations of suffixes that look useful are actually ignored;
complaining about them will avoid mistakes.
Differential Revision: http://reviews.llvm.org/D17587
llvm-svn: 262263
Summary:
I re-benchmarked this and results are similar to original results in
D13259:
On ARM64:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
SingleSource/Benchmarks/Polybench/stencils/adi -19.78%
On x86:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%
And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.
In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.
The reason that time spent in SCEV increases has to do with the design
of the old pass manager. If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.
This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.
(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result. Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)
Reviewers: hfinkel, dberlin
Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16300
llvm-svn: 262250
When a variable is described by a single DBG_VALUE instruction we can
often use a more efficient inline DW_AT_location instead of using a
location list.
This commit makes the heuristic that decides when to apply this
optimization stricter by also verifying that the DBG_VALUE is live at the
entry of the function (instead of just checking that it is valid until
the end of the function).
<rdar://problem/24611008>
llvm-svn: 262247
Summary:
Rename the section embeds bitcode from ".llvmbc,.llvmbc" to "__LLVM,__bitcode".
The new name matches MachO section naming convention.
Reviewers: rafael, pcc
Subscribers: davide, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17388
llvm-svn: 262245
This is long-standing dirtiness, as acknowledged by r77582:
The current trick is to select it into a merge_values with
the first definition being an implicit_def. The proper solution is
to add new ISD opcodes for the no-output variant.
Doing this before selection will let us combine away some constructs.
Differential Revision: http://reviews.llvm.org/D17659
llvm-svn: 262244
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit. A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.
Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.
MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs. This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.
Differential Revision: http://reviews.llvm.org/D17721
llvm-svn: 262241
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.
llvm-svn: 262233
Summary:
The bug was that dextu's operand 3 would print 0-31 instead of 32-63 when
printing assembly. This came up when replacing
MipsInstPrinter::printUnsignedImm() with a version that could handle arbitrary
bit widths.
MipsAsmPrinter::printUnsignedImm*() don't seem to be used so they have been
removed.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15521
llvm-svn: 262231
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range checked immediates to IAS.
Now isel selects the correct variant up front.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16810
llvm-svn: 262229
in the PassBuilder.
These are really just stubs for now, but they give a nice API surface
that Clang or other tools can start learning about and enabling for
experimentation.
I've also wired up parsing various synthetic module pass names to
generate these set pipelines. This allows the pipelines to be combined
with other passes and have their order controlled, with clear separation
between the *kind* of canned pipeline, and the *level* of optimization
to be used within that canned pipeline.
The most interesting part of this patch is almost certainly the spec for
the different optimization levels. I don't think we can ever have hard
and fast rules that would make it easy to determine whether a particular
optimization makes sense at a particular level -- it will always be in
large part a judgement call. But hopefully this will outline the
expected rationale that should be used, and the direction that the
pipelines should be taken. Much of this was based on a long llvm-dev
discussion I started years ago to try and crystalize the intent behind
these pipelines, and now, at long long last I'm returning to the task of
actually writing it down somewhere that we can cite and try to be
consistent with.
Differential Revision: http://reviews.llvm.org/D12826
llvm-svn: 262196
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.
This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.
llvm-svn: 262153
InlineSpiller::rematerializeFor() never uses its parameter as an
iterator, so take it by reference instead. This removes an implicit
conversion from MachineBasicBlock::iterator to MachineInstr*.
llvm-svn: 262152
Change MachineInstr API to prefer MachineInstr& over MachineInstr*
whenever the parameter is expected to be non-null. Slowly inching
toward being able to fix PR26753.
llvm-svn: 262149
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required no-trivial code transformations to fix.
llvm-svn: 262148
Take parameters as MachineInstr& instead of MachineInstr* in
AntiDepBreaker API, since these are required to be non-null. No
functionality change intended. Looking toward PR26753.
llvm-svn: 262145
In all but one case, change the DFAPacketizer API to take MachineInstr&
instead of MachineInstr*. In DFAPacketizer::endPacket(), take
MachineBasicBlock::iterator. Besides cleaning up the API, this is in
search of PR26753.
llvm-svn: 262142
Update APIs in MachineInstrBundle.h to take and return MachineInstr&
instead of MachineInstr* when the instruction cannot be null. Besides
being a nice cleanup, this is tacking toward a fix for PR26753.
llvm-svn: 262141
Previous check-in message was:
The patch adds missing registers and instructions to complete all the registers supported by the Sparc v8 manual.
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.
Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.
llvm-svn: 262135
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.
Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.
llvm-svn: 262133
manager as some compilers print the typedef name and others print the
"canonical" name of the underlying class template.
This isn't really an important artifact of the test anyways so it seems
fine to just loosen the test assertions here.
llvm-svn: 262129
manager proxies and use those rather than repeating their definition
four times.
There are real differences between the two directions: outer AMs are
const and don't need to have invalidation tracked. But every proxy in
a particular direction is identical except for the analysis manager type
and the IR unit they proxy into. This makes them prime candidates for
nice templates.
I've started introducing explicit template instantiation declarations
and definitions as well because we really shouldn't be emitting all this
everywhere. I'm going to go back and add the same for the other
templates like this in a follow-up patch.
I've left the analysis manager as an opaque type rather than using two
IR units and requiring it to be an AnalysisManager template
specialization. I think its important that users retain the ability to
provide their own custom analysis management layer and provided it has
the appropriate API everything should Just Work.
llvm-svn: 262127
This matches the behavior of the HSAIL clock instruction.
s_realmemtime is used if the subtarget supports it, and falls
back to s_memtime if not.
Also introduces new intrinsics for each of s_memtime / s_memrealtime.
llvm-svn: 262119
Avoid another implicit conversion from MachineInstrBundleIterator to
MachineInstr*, this time in MachineInstrBuilder.h (this is in pursuit of
PR26753).
llvm-svn: 262118
Take MachineInstr by reference instead of by pointer in SlotIndexes and
the SlotIndex wrappers in LiveIntervals. The MachineInstrs here are
never null, so this cleans up the API a bit. It also incidentally
removes a few implicit conversions from MachineInstrBundleIterator to
MachineInstr* (see PR26753).
At a couple of call sites it was convenient to convert to a range-based
for loop over MachineBasicBlock::instr_begin/instr_end, so I added
MachineBasicBlock::instrs.
llvm-svn: 262115
Summary:
The PS4 linker seems to handle this fine.
Hi David, it seems that indeed most ELF linkers support
__{start,stop}_SECNAME, as our proprietary linker does as well.
This follows the pattern of r250679 w.r.t. the testing.
Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain
prerelease and it seems to work fine.
Reviewers: davidxl
Subscribers: probinson, phillip.power, MaggieYi
Differential Revision: http://reviews.llvm.org/D17672
llvm-svn: 262112
merged into a loop that was subsequently unrolled (or otherwise nuked).
In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions dircetly.
The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.
This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequenc. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.
llvm-svn: 262108
Summary:
Without tree pruning clang has 2,667,552 points.
Wiht only dominators pruning: 1,515,586.
With both dominators & predominators pruning: 1,340,534.
Differential Revision: http://reviews.llvm.org/D17671
llvm-svn: 262103
Combinations of suffixes that look useful actually are ignored;
complaining about them will avoid mistakes.
Differential Revision: http://reviews.llvm.org/D17587
llvm-svn: 262092
Most of this is fairly straight forward. Add handling for min/max via existing matcher utility and ConstantRange routines. Add handling for clamp by exploiting condition constraints on inputs.
Note that I'm only handling two constant ranges at this point. It would be reasonable to consider treating overdefined as a full range if the instruction is typed as an integer, but that should be a separate change.
Differential Revision: http://reviews.llvm.org/D17184
llvm-svn: 262085
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode
Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
This allows a user to specify "Native" as a target when configuring LLVM. Native will resolve to the LLVM_NATIVE_ARCH, which is the target that supports code generation for the host.
llvm-svn: 262070
This is needed to connect dependencies between the LLVMgold plugin and the clang stage-2 builds due to limitations in ExternalProject_Add.
Patch by Mike Edwards
Differential Revision: http://reviews.llvm.org/D17655
llvm-svn: 262067
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:
void mstore_zero_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}
void mstore_fake_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}
void mstore_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}
void mstore_one_set_elt_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}
...so none of the above will actually generate a masked store for optimized code.
Differential Revision: http://reviews.llvm.org/D17485
llvm-svn: 262064
Summary: This is extracted from D17555
Reviewers: davidxl, reames, sanjoy, MatzeB, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17580
llvm-svn: 262058
This should save a pointer of padding from all MSVC Value subclasses.
Recall that MSVC will not pack the following bitfields together:
unsigned Bits : 29;
unsigned Flag1 : 1;
unsigned Flag2 : 1;
unsigned Flag3 : 1;
Add a static_assert because LLVM developers always trip over this
behavior. This regressed in June.
llvm-svn: 262045
This patch updates cmake build scripts to build on Haiku. It adds Haiku x86_64 to config.guess.
Please consider reviewing.
Pathc by Jérôme Duval.
llvm-svn: 262038
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
Shift and negate is what InstCombine appears to prefer, so I've started with that pattern.
Note that the 'pcmpeq' instructions are always generating the negative one for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
Differential Revision: http://reviews.llvm.org/D17630
llvm-svn: 262036
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.
Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.
Fixes PR26679.
llvm-svn: 262035
classes changed whether the decltype of these expressions was
a reference. I'm somewhat horrified why, and there may need to be
a deeper fix on MSVC, but this should at least get the bots a step
further.
llvm-svn: 262008
analyses in the new pass manager.
These just handle really basic stuff: turning a type name into a string
statically that is nice to print in logs, and getting a static unique ID
for each analysis.
Sadly, the format of passes in anonymous namespaces makes using their
names in tests really annoying so I've customized the names of the no-op
passes to keep tests sane to read.
This is the first of a few simplifying refactorings for the new pass
manager that should reduce boilerplate and confusion.
llvm-svn: 262004
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.
Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
Instead of the convoluted if-statment we can just use getColor. This also fixes
a bug where we relied upon the parity of tablegen-generated register indexes
(instead of using the machine encoding).
llvm-svn: 261990
These diagnostics aren't perfect - in the case of merging several dwos
into dwps and those dwps into more dwps - just getting the message about
the original source file name might not be much help (since it's the
same in both dwos, by definition - but doesn't tell you which chain of
dwps to backtrack)
It might be worth adding the DW_AT_dwo_id to the split debug info to
improve the diagnostic experience - might help track down the duplicates
better.
llvm-svn: 261988