As noted in the code comment, I don't think we can apply the transform that we use for
*scalar* integer comparisons to *vector* integer comparisons because it might pessimize
the general case.
Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.
But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
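To make the "only EQ and GT" limitation concrete, here is a minimal scalar sketch of how the missing predicates can be built from those two. The 4-lane layout and the names vec_eq, vec_gt, vec_lt, vec_ge and vec_ne are hypothetical and only model the lane-wise behavior; this is not the lowering code itself.

```c
#include <stdint.h>

#define LANES 4 /* assumption: four 32-bit lanes, as in a 128-bit vector */

/* The two predicates the ISA provides: all-ones per lane when true. */
void vec_eq(const int32_t *a, const int32_t *b, int32_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] == b[i]) ? -1 : 0;
}

void vec_gt(const int32_t *a, const int32_t *b, int32_t *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] > b[i]) ? -1 : 0;
}

/* a < b is just b > a with the operands swapped. */
void vec_lt(const int32_t *a, const int32_t *b, int32_t *out) {
    vec_gt(b, a, out);
}

/* a >= b is NOT (b > a): one compare plus an inversion. */
void vec_ge(const int32_t *a, const int32_t *b, int32_t *out) {
    vec_gt(b, a, out);
    for (int i = 0; i < LANES; ++i)
        out[i] = ~out[i];
}

/* a != b is NOT (a == b). */
void vec_ne(const int32_t *a, const int32_t *b, int32_t *out) {
    vec_eq(a, b, out);
    for (int i = 0; i < LANES; ++i)
        out[i] = ~out[i];
}
```

The extra inversion in vec_ge and vec_ne is the kind of cost that makes a blanket transform risky for vectors.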
llvm-svn: 262424
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
llvm-svn: 262419
On AMDGPU, where i64 operations are often bitcasted to v2i32
and back, this pattern shows up regularly where it breaks some
expected combines on i64, such as load width reducing.
This fixes some test failures in a future commit when i64 loads
are changed to promote.
llvm-svn: 262397
Revert r262248 in an attempt to fix the clang-native-aarch64-full
bot and to investigate a performance regression in
SingleSource/Benchmarks/CoyoteBench/huffbench
llvm-svn: 262388
This reverts commit r262316.
It seems that my change breaks an out-of-tree chromium buildbot, so
I'm reverting this in order to investigate the situation further.
llvm-svn: 262387
Summary:
Calls sometimes need to be convergent. This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.
Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs. But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.
Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
Reviewers: jholewinski
Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17423
llvm-svn: 262373
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations. However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is fixed size and smaller than the probe
size.
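The condition above can be sketched as a simple predicate. The helper name can_elide_stack_probe and its parameters are hypothetical, and the probe size is passed in rather than assumed; this only restates the rule, not the actual frame lowering.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the _chkstk call may be skipped and the
 * stack pointer adjusted directly. Both conditions from the description
 * must hold: the allocation has a fixed size, and that size is smaller
 * than the probe size. */
bool can_elide_stack_probe(bool is_fixed_size, size_t alloc_bytes,
                           size_t probe_bytes) {
    return is_fixed_size && alloc_bytes < probe_bytes;
}
```

For example, with an illustrative probe size of 4096 bytes, a fixed 256-byte frame qualifies, while a 4096-byte frame or any dynamically sized allocation does not.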
This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cumulative ~133 KB.
Differential Revision: http://reviews.llvm.org/D17679
llvm-svn: 262370
CIE augmentation data might contain non-printable characters.
The patch prints the data as a list of hex bytes.
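A minimal sketch of printing arbitrary bytes as a hex list follows. The function name format_hex_bytes and the exact output format (uppercase, space-separated) are assumptions for illustration; the format the tool really emits may differ.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes data as space-separated two-digit hex bytes
 * into out. Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small. */
int format_hex_bytes(const unsigned char *data, size_t len,
                     char *out, size_t out_size) {
    size_t pos = 0;
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; ++i) {
        /* Prefix every byte except the first with a separator. */
        int n = snprintf(out + pos, out_size - pos, "%s%02X",
                         i ? " " : "", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= out_size - pos)
            return -1;
        pos += (size_t)n;
    }
    return (int)pos;
}
```

Printing this way keeps the output readable even when the data contains bytes that are not printable characters.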
Differential Revision: http://reviews.llvm.org/D17759
llvm-svn: 262361
Summary:
This patch implements DS_PERMUTE/DS_BPERMUTE instruction definitions and intrinsics,
which are new since VI.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D17614
llvm-svn: 262356
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.
void g(signed short);
void f(unsigned short x) {
g(x);
}
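To see why the missing extension matters, here is a small sketch of the two possible 32-bit views of the same 16-bit pattern. The helper names sext16_to_32 and zext16_to_32 are hypothetical, and the code assumes int is at least 32 bits wide.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to 32 bits using only well-defined
 * arithmetic (flip the sign bit, then subtract its weight). */
int32_t sext16_to_32(uint16_t x) {
    return (int32_t)((x ^ 0x8000) - 0x8000);
}

/* Zero-extension: the value an unsigned short register image would hold. */
int32_t zext16_to_32(uint16_t x) {
    return (int32_t)x;
}
```

For x = 0xFFFF the callee expects the sign-extended value -1, while the zero-extended image is 65535, so forwarding the register unchanged hands g() the wrong value.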
llvm-svn: 262352
This patch fixes the value computed for the builtin_object_size function
when the pointer is used only in the builtin_object_size call and never
after that.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D17337
llvm-svn: 262337
Function lto_module_create_in_local_context() would previously
rely on the default LLVMContext being created for it by
LTOModule::makeLTOModule(). This context exits the program on
error and is not arranged to update sLastStringError in
tools/lto/lto.cpp.
Function lto_module_create_in_local_context() now creates an
LLVMContext by itself, sets it up correctly to its needs and then
passes it to LTOModule::createInLocalContext() which takes
ownership of the context and keeps it present for the lifetime of
the returned LTOModule.
Function LTOModule::makeLTOModule() is modified to take a
reference to LLVMContext (instead of a pointer) and no longer
creates a default context when nullptr is passed to it. Method
LTOModule::createInContext() that takes a pointer to LLVMContext
is removed because it allows a nullptr to be passed to it. Instead,
LTOModule::createFromBuffer() (that takes a reference to
LLVMContext) should be used.
Differential Revision: http://reviews.llvm.org/D17715
llvm-svn: 262330
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generate a GPR width result.
The majority of the code changes in the Mips back-end fix the wrong
assumption that the result of SETCC nodes is always an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.
Reviewers: dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D10970
llvm-svn: 262316
Previously, if an actual instruction had one of the optional operands, then the other optional operands listed before it also had to be present.
For example, the instruction v_fract_f32 v0, v1, mul:2 has one optional operand, OMod, and does not have the optional operand clamp. Previously this was not allowed because clamp is listed before omod in AsmString:
string AsmString = "v_fract_f32$vdst, $src0_modifiers$clamp$omod";
Making this work required some hacks (both OMod and Clamp match classes have the same PredicateMethod).
Now, if MatchInstructionImpl meets a formal optional operand that is not present in the actual instruction, it skips this formal operand and tries to match the current actual operand with the next formal operand.
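The skipping rule can be sketched as follows. This is a simplified model, not the generated matcher: operands are compared by a plain kind string instead of match classes, and the names formal_operand and match_operands are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one formal operand from the asm string. */
struct formal_operand {
    const char *kind; /* e.g. "vdst", "clamp", "omod" */
    bool optional;
};

/* Walk the formal operands in order. A formal operand either consumes the
 * current actual operand, or, if it is optional and absent, is skipped so
 * the same actual operand is tried against the next formal one. */
bool match_operands(const struct formal_operand *formal, size_t n_formal,
                    const char *const *actual, size_t n_actual) {
    size_t a = 0;
    for (size_t f = 0; f < n_formal; ++f) {
        if (a < n_actual && strcmp(formal[f].kind, actual[a]) == 0)
            ++a;              /* operand present: consume it */
        else if (!formal[f].optional)
            return false;     /* required operand missing or mismatched */
        /* else: optional operand absent, skip the formal operand */
    }
    return a == n_actual;     /* every actual operand must be consumed */
}
```

With formal operands vdst, src0, clamp (optional), omod (optional), the actual list vdst, src0, omod now matches, while omod followed by clamp is still rejected because the formal order is fixed.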
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17568
[AMDGPU] Assembler: Check immediate types for several optional operands in predicate methods
With this change you should place optional operands in the order specified by the asm string:
clamp -> omod
offset -> glc -> slc -> tfe
Fixes for several tests.
Depends on D17568
Patch by: Sam Kolton
Review: http://reviews.llvm.org/D17644
llvm-svn: 262314
This currently does not have control over the bitwidth,
and there are missing optimizations to reduce the integer to
32-bit if it can be.
But in most situations we do want the sinking to occur.
llvm-svn: 262296
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets. A special sentinel value, 0, is used to
indicate that no catch object should be initialized.
This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.
To handle this, allocate the registration node prior to any other frame
object. This will ensure that catch objects will not be allocated
before the registration node.
This fixes PR26757.
Differential Revision: http://reviews.llvm.org/D17689
llvm-svn: 262294
Generally speaking, this can only happen with unreachable code.
However, neglecting to check for this condition would lead us to loop
forever.
llvm-svn: 262284
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:
__m128 mload_zeros(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0));
}
__m128 mload_fakeones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(1));
}
__m128 mload_ones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}
__m128 mload_oneset(float *f) {
return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}
...so none of the above will actually generate a masked load for optimized code.
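The reason each case above folds can be seen in a scalar model of the masked load: only the sign bit of each mask element selects a lane, and unselected lanes read as zero. The function maskload4 is a hypothetical stand-in for the intrinsic's behavior, not code from the patch.

```c
#include <stdint.h>

/* Scalar model of a 4-lane masked float load. A lane is loaded only when
 * the most significant bit of its mask element is set; otherwise the
 * result lane is zero and memory is not read for that lane. */
void maskload4(const float *src, const uint32_t mask[4], float out[4]) {
    for (int i = 0; i < 4; ++i)
        out[i] = (mask[i] & 0x80000000u) ? src[i] : 0.0f;
}
```

A mask of all 0 or all 1 therefore selects nothing, a mask of all 0x80000000 is an ordinary vector load, and _mm_set_epi32(0x80000000, 0, 0, 0) selects only the highest element, since that intrinsic takes its arguments from the highest element down.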
This is the masked load counterpart to:
http://reviews.llvm.org/rL262064
llvm-svn: 262269
Combinations of suffixes that look useful are actually ignored;
complaining about them will avoid mistakes.
Differential Revision: http://reviews.llvm.org/D17587
llvm-svn: 262263
When a variable is described by a single DBG_VALUE instruction we can
often use a more efficient inline DW_AT_location instead of using a
location list.
This commit makes the heuristic that decides when to apply this
optimization stricter by also verifying that the DBG_VALUE is live at the
entry of the function (instead of just checking that it is valid until
the end of the function).
<rdar://problem/24611008>
llvm-svn: 262247
Summary:
Rename the section that embeds bitcode from ".llvmbc,.llvmbc" to "__LLVM,__bitcode".
The new name matches the MachO section naming convention.
Reviewers: rafael, pcc
Subscribers: davide, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17388
llvm-svn: 262245
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit. A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.
Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.
MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs. This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.
Differential Revision: http://reviews.llvm.org/D17721
llvm-svn: 262241
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592
This fix includes both an update to how we handle the "generic" CPU on LE
systems as well as Anton's fix for the Fast Isel issue.
llvm-svn: 262233
Summary:
The bug was that dextu's operand 3 would print 0-31 instead of 32-63 when
printing assembly. This came up when replacing
MipsInstPrinter::printUnsignedImm() with a version that could handle arbitrary
bit widths.
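As a sketch of the off-by-32 issue, assuming the operand in question is DEXTU's pos and that it is stored in a 5-bit field as pos - 32: printing the raw field yields 0-31, while the assembly operand must read 32-63. The helper name dextu_pos_from_field is hypothetical.

```c
/* Hypothetical helper: recover the assembly-level pos operand of DEXTU
 * from its 5-bit encoded field (assumed to hold pos - 32). */
unsigned dextu_pos_from_field(unsigned field) {
    return (field & 0x1Fu) + 32u;
}
```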
MipsAsmPrinter::printUnsignedImm*() don't seem to be used so they have been
removed.
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15521
llvm-svn: 262231
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range checked immediates to IAS.
Now isel selects the correct variant up front.
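Selecting the variant up front amounts to a range check on pos and size. The sketch below assumes the usual MIPS64 ranges (DEXT: pos 0-31, size 1-32; DEXTM: pos 0-31, size 33-64; DEXTU: pos 32-63, size 1-32; pos + size at most 64), which should be checked against the ISA manual. The enum and function names are hypothetical.

```c
/* Hypothetical result type for the variant choice. */
enum dext_variant { VARIANT_NONE, VARIANT_DEXT, VARIANT_DEXTM, VARIANT_DEXTU };

/* Pick the doubleword-extract variant whose immediate ranges cover the
 * requested bit field, or VARIANT_NONE if no variant can encode it. */
enum dext_variant select_dext_variant(unsigned pos, unsigned size) {
    if (size == 0 || size > 64 || pos > 63 || pos + size > 64)
        return VARIANT_NONE;
    if (pos >= 32)
        return VARIANT_DEXTU; /* size <= 32 follows from pos + size <= 64 */
    if (size > 32)
        return VARIANT_DEXTM;
    return VARIANT_DEXT;
}
```

Choosing the opcode this way means each variant only ever sees immediates inside its own range, which is what range-checked immediates in the assembler need.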
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16810
llvm-svn: 262229
in the PassBuilder.
These are really just stubs for now, but they give a nice API surface
that Clang or other tools can start learning about and enabling for
experimentation.
I've also wired up parsing various synthetic module pass names to
generate these set pipelines. This allows the pipelines to be combined
with other passes and have their order controlled, with clear separation
between the *kind* of canned pipeline, and the *level* of optimization
to be used within that canned pipeline.
The most interesting part of this patch is almost certainly the spec for
the different optimization levels. I don't think we can ever have hard
and fast rules that would make it easy to determine whether a particular
optimization makes sense at a particular level -- it will always be in
large part a judgement call. But hopefully this will outline the
expected rationale that should be used, and the direction that the
pipelines should be taken. Much of this was based on a long llvm-dev
discussion I started years ago to try and crystallize the intent behind
these pipelines, and now, at long long last I'm returning to the task of
actually writing it down somewhere that we can cite and try to be
consistent with.
Differential Revision: http://reviews.llvm.org/D12826
llvm-svn: 262196
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.
This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.
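The arithmetic can be checked with a short sketch. The dispatch granularity of 32 workitems is not stated above; it is inferred here from 4G / 128M and should be treated as an assumption. The helper names are hypothetical.

```c
#include <stdint.h>

/* Largest private size per workitem: total allocation divided by the
 * smallest number of workitems that can be dispatched together. */
uint64_t max_private_per_workitem(uint64_t total_bytes, uint64_t granularity) {
    return total_bytes / granularity;
}

/* Number of high bits of a 32-bit index that are zero for every index
 * strictly below max_exclusive (max_exclusive must be at least 1). */
unsigned known_zero_high_bits(uint64_t max_exclusive) {
    uint64_t max_val = max_exclusive - 1;
    unsigned bits = 0;
    if (max_val > UINT32_MAX)
        return 0;
    for (int b = 31; b >= 0 && !((max_val >> b) & 1u); --b)
        ++bits;
    return bits;
}
```

With 4G total and the inferred granularity of 32, the per-workitem limit is 2^27 bytes, so the top 5 bits of a 32-bit private index are known to be zero.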
llvm-svn: 262153
In the case where op = add, y = base_ptr, and x = offset, this
transform:
(op y, (op x, c1)) -> (op (op x, y), c1)
breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.
This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required non-trivial code transformations to fix.
llvm-svn: 262148
Previous check-in message was:
The patch adds missing registers and instructions to complete all the registers supported by the Sparc v8 manual.
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.
Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.
llvm-svn: 262135
These are all co-processor registers, with the exception of the floating-point deferred-trap queue register.
Although these will not be lowered automatically by any instructions, it allows the use of co-processor
instructions implemented by inline-assembly.
Code Reviewed at http://reviews.llvm.org/D17133, with the exception of a very small change in brace placement in SparcInstrInfo.td,
which was formerly causing a problem in the disassembly of the %fq register.
llvm-svn: 262133
manager as some compilers print the typedef name and others print the
"canonical" name of the underlying class template.
This isn't really an important artifact of the test anyways so it seems
fine to just loosen the test assertions here.
llvm-svn: 262129
manager proxies and use those rather than repeating their definition
four times.
There are real differences between the two directions: outer AMs are
const and don't need to have invalidation tracked. But every proxy in
a particular direction is identical except for the analysis manager type
and the IR unit they proxy into. This makes them prime candidates for
nice templates.
I've started introducing explicit template instantiation declarations
and definitions as well because we really shouldn't be emitting all this
everywhere. I'm going to go back and add the same for the other
templates like this in a follow-up patch.
I've left the analysis manager as an opaque type rather than using two
IR units and requiring it to be an AnalysisManager template
specialization. I think it's important that users retain the ability to
provide their own custom analysis management layer and provided it has
the appropriate API everything should Just Work.
llvm-svn: 262127
This matches the behavior of the HSAIL clock instruction.
s_memrealtime is used if the subtarget supports it, and falls
back to s_memtime if not.
Also introduces new intrinsics for each of s_memtime / s_memrealtime.
llvm-svn: 262119
Summary:
The PS4 linker seems to handle this fine.
Hi David, it seems that indeed most ELF linkers support
__{start,stop}_SECNAME, as our proprietary linker does as well.
This follows the pattern of r250679 w.r.t. the testing.
Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain
prerelease and it seems to work fine.
Reviewers: davidxl
Subscribers: probinson, phillip.power, MaggieYi
Differential Revision: http://reviews.llvm.org/D17672
llvm-svn: 262112
merged into a loop that was subsequently unrolled (or otherwise nuked).
In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions directly.
The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.
This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequence. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.
llvm-svn: 262108
Combinations of suffixes that look useful actually are ignored;
complaining about them will avoid mistakes.
Differential Revision: http://reviews.llvm.org/D17587
llvm-svn: 262092
Most of this is fairly straightforward. Add handling for min/max via existing matcher utility and ConstantRange routines. Add handling for clamp by exploiting condition constraints on inputs.
Note that I'm only handling two constant ranges at this point. It would be reasonable to consider treating overdefined as a full range if the instruction is typed as an integer, but that should be a separate change.
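As a rough illustration of the min/max and clamp reasoning, here is a sketch on simple closed, non-wrapping signed intervals. This is a deliberate simplification and not the ConstantRange API; the struct range type and the function names are hypothetical.

```c
#include <stdint.h>

/* Simplified closed interval [lo, hi] with lo <= hi (assumption: no
 * wrapping, unlike a general constant range). */
struct range {
    int64_t lo, hi;
};

/* min(a, b) is monotone in both inputs, so the bounds are the minima of
 * the corresponding bounds. */
struct range range_smin(struct range a, struct range b) {
    struct range r = {a.lo < b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi};
    return r;
}

/* max(a, b): the bounds are the maxima of the corresponding bounds. */
struct range range_smax(struct range a, struct range b) {
    struct range r = {a.lo > b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
    return r;
}

/* clamp(x, lo, hi) = min(max(x, lo), hi) with constant lo <= hi. */
struct range range_clamp(struct range x, int64_t lo, int64_t hi) {
    struct range lo_r = {lo, lo};
    struct range hi_r = {hi, hi};
    return range_smin(range_smax(x, lo_r), hi_r);
}
```

For instance, clamping a value known to lie in [-5, 300] to the constants 0 and 255 yields [0, 255], and a value already in [-100, 100] yields [0, 100].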
Differential Revision: http://reviews.llvm.org/D17184
llvm-svn: 262085
Currently we always expand ISD::FNEG. For v4f32 and v2f64 vector types VSX has
native support for this opcode.
Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392
http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:
void mstore_zero_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}
void mstore_fake_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}
void mstore_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}
void mstore_one_set_elt_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}
...so none of the above will actually generate a masked store for optimized code.
Differential Revision: http://reviews.llvm.org/D17485
llvm-svn: 262064
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
Shift and negate is what InstCombine appears to prefer, so I've started with that pattern.
Note that the 'pcmpeq' instructions are always generating the negative one for the actual
'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
Differential Revision: http://reviews.llvm.org/D17630
llvm-svn: 262036
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.
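The half-open convention described above can be sketched with a small model. The slot numbers are made up and the names slot_interval and interval_contains are hypothetical; the point is only that a block's end index already belongs to the next block in layout order.

```c
#include <stdbool.h>

/* Hypothetical model of a block's slot index interval: [start, end). */
struct slot_interval {
    unsigned start, end;
};

/* Half-open membership test: start is inside, end is not. */
bool interval_contains(struct slot_interval iv, unsigned idx) {
    return idx >= iv.start && idx < iv.end;
}
```

If block A covers [0, 16) and the next block in layout covers [16, 32), then index 16 is not in A. Anything placed at A's end index therefore lands in whichever block happens to follow A in layout, which need not be A's successor.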
Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.
Fixes PR26679.
llvm-svn: 262035
analyses in the new pass manager.
These just handle really basic stuff: turning a type name into a string
statically that is nice to print in logs, and getting a static unique ID
for each analysis.
Sadly, the format of passes in anonymous namespaces makes using their
names in tests really annoying so I've customized the names of the no-op
passes to keep tests sane to read.
This is the first of a few simplifying refactorings for the new pass
manager that should reduce boilerplate and confusion.
llvm-svn: 262004
Add parsing and printing of image operands. Matches legacy sp3 assembler.
Change image instruction order to have data/image/sampler operands in the beginning. This is needed because optional operands in MC are always last.
Update SITargetLowering for new order.
Add basic MC test.
Update CodeGen tests.
Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
These diagnostics aren't perfect - in the case of merging several dwos
into dwps and those dwps into more dwps - just getting the message about
the original source file name might not be much help (since it's the
same in both dwos, by definition - but doesn't tell you which chain of
dwps to backtrack)
It might be worth adding the DW_AT_dwo_id to the split debug info to
improve the diagnostic experience - might help track down the duplicates
better.
llvm-svn: 261988
Though a bit odd, this is handy for a few reasons - for example, in a
build system that wants consistent input/output of build steps, but
where split-dwarf might be overridden/disabled by the user on a per-file
basis.
llvm-svn: 261987