Summary:
I've been carrying this change around with me for a while, because the if ()
managed to confuse me while following the code. All callers ensure that the
assertion holds.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19042
llvm-svn: 266344
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.
llvm-svn: 266301
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.
Reviewers: dsanders, sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18693
llvm-svn: 266285
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
This code was creating a new type in the global context, regardless
of which context the user is sitting in, what can possibly go wrong?
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266275
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:
+ ConstantHoisting was modifying switch statements. This is simply invalid,
the cases must remain integer constants no matter the notional cost.
+ ConstantHoisting was mangling alloca instructions in the entry block. These
should be handled by FrameLowering, so constants actually have a cost of 0.
Worse, the resulting bitcasts meant they became dynamic allocas.
rdar://25707382
llvm-svn: 266260
Fix a major bug from r265456. Although it's now much rarer, ValueMapper
sometimes has to duplicate cycles. The
might-transitively-reference-a-temporary counts don't decrement on their
own when there are cycles, and you need to call MDNode::resolveCycles to
fix it.
r265456 was checking the input nodes to see if they were unresolved.
This is useless; they should never be unresolved. Instead we should
check the output nodes and resolve cycles on them.
llvm-svn: 266258
Summary: LLVMAttribute has outlived its utility and is becoming a problem for C API users that what to use all the LLVM attributes. In order to help moving away from LLVMAttribute in a smooth manner, this diff introduce LLVMGetAttrKindIDInContext, which can be used instead of the enum values.
Reviewers: Wallbraker, whitequark, joker.eph, echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18749
llvm-svn: 266257
An unsigned 2 bit bitfield takes 4 bytes in MSVC. Instead of a bitfield,
just use an unsigned char. We can go back to a bitfield when someone
implements the TODO of exposing and reusing the remaining 6 bits.
llvm-svn: 266256
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18901
llvm-svn: 266253
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18902
llvm-svn: 266252
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D19007
llvm-svn: 266251
And update the existing test cases in test/Object/macho-invalid.test
to use llvm-objdump with the -macho option to produce these
error messages and stop producing the generic "Invalid data
was encountered while parsing the file" message.
Working from the beginning of the file, if the mach header is too large for
the size of the file and then if the load commands that follow extend past
the end of the file these two errors now generate correct error messages.
Both of these have existing test cases in test/Object/macho-invalid.test .
But the first with macho-invalid-header it will never trigger the error message
"mach header extends past the end of the file" using any of the llvm tools as
they all use identify_magic() which rejects files with the correct magic number
that are too small in size. So I tested this by hacking that code and seeing the
error message down in parseHeader() really does happen. So in case there
is ever code in llvm that directly calls createMachOObjectFile() this error
message will be correctly produced.
The second error message of "load commands extends past the end of the file"
is triggered by a number of existing tests cases in test/Object/macho-invalid.test .
Also other tests trigger different error messages now like "ilocalsym plus
nlocalsym in LC_DYSYMTAB load command extends past the end of the
symbol table".
There are two existing test cases that still get the "Invalid data was encountered ..."
error messages that I will tackle next. But they will involve a bit of pluming an
Expect<...> up through the call stack and I want to do those as separate changes.
FYI, for those test cases that were trying to test specific errors that now get
different errors I’ll fix those in follow on changes and create new test cases
for those so they test the error they were meant to test.
llvm-svn: 266248
Summary:
When we are spilling SGPRs to scratch memory, we usually don't have
free SGPRs to do the address calculation, so we need to re-use the
ScratchOffset register for the calculation.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18917
llvm-svn: 266244
A DISubprogram on x86_64 was 48 bytes. During an LTO build we
end up allocating *a lot* of these (see Duncan's numbers on
llvm-dev and/or my numbers in the review link).
This change reduces the size to 40 bytes, with a nice effect
on peak memory usage when LTO'ing clang.
There are more classes in the hierarchy which can be compacted
so more patches will come. DISubprogram was the biggest offender
in my profiling, anyway.
Differential Revision: http://reviews.llvm.org/D18918
llvm-svn: 266241
Since we can't emit diagnostics for missing "jmp 1f" labels until the end of
the file, we need to be able to restore the context used to calculate
file/line. This is basically the "# line file" directive that's being used at
the time the expression is seen.
rdar://25706972
llvm-svn: 266238
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.
llvm-svn: 266229
This patch corresponds to review:
http://reviews.llvm.org/D17850
This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.
llvm-svn: 266228
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.
Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.
llvm-svn: 266223
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.
Reviewers: jyknight
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18909
llvm-svn: 266217
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
llvm-svn: 266205
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.
Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.
Reviewers: dsanders
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18893
llvm-svn: 266203
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18438
llvm-svn: 266193
Differential Revision: http://reviews.llvm.org/D17137
This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.
llvm-svn: 266179
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
It is now only doing the update to the llvm.compiler_used global.
The client has to call separately the internalization stage.
Hopefully the code is simpler to understand this way.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266174
Differential Revision: http://reviews.llvm.org/D17068
This changes contains fix for failing test-suite. So, this patch should hopefully work now.
llvm-svn: 266171
This will save a bunch of copies / initialization of intermediate
datastructure, and (hopefully) simplify the code.
This also abstract the symbol preservation mechanism outside of the
Internalization pass into the client code, which is not forced
to keep a map of strings for instance (ThinLTO will prefere hashes).
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266163
two fixes with one about error verify-regalloc reported, and
another about live range update of phi after rematerialization.
r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"
Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936
llvm-svn: 266162
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.
This seems to be fixed now, so add a test to make sure that
it stays this way.
llvm-svn: 266156
Summary:
It seems like this was broken in r252327. I thought we had test cases
for this, but it's really hard to tirgger spills of this exact register
size since they aren't used very much.
Reviewers: arsenm, nhaehnle
Subscribers: nhaehnle, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19021
llvm-svn: 266152
This state is no longer useful and not guaranteed to be valid in later
codegen passes. For example, see the added test, which would print a
savepoint of %bb.-1 without this change, and crashes with a
use-after-free error under ASan if you apply the recycling allocator
patch from llvm.org/PR26808.
llvm-svn: 266150
This bug was introduced with:
http://reviews.llvm.org/rL262269
AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual
Differential Revision: http://reviews.llvm.org/D19017
llvm-svn: 266148
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18883
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
Summary:
Let keep llvm-as "dumb": it converts textual IR to bitcode. This
commit removes the dependency from llvm-as to libLLVMAnalysis.
We'll add back summary in llvm-as if we get to a textual
representation for it at some point. In the meantime, opt seems
like a better place for that.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19032
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266131
This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress
matches an ADD with an AND as an operand, and that AND hits one of the
"heroic transforms" that folds masks and shifts, we end up with N
pointing to an SDNode that was deleted. Make sure we're done accessing
it before that.
Found by ASan with the recycling allocator changes in llvm.org/PR26808.
llvm-svn: 266130
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)
As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.
Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.
Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18292
llvm-svn: 266126
Summary:
The function import pass was computing all the imports for all the
modules in the index, and only using the imports for the current module.
Change this to instead compute only for the given module. This means
that the exports list can't be populated, but they weren't being used
anyway.
Longer term, the linker can collect all the imports and export lists
and serialize them out for consumption by the distributed backend
processes which use this pass.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18945
llvm-svn: 266125
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266115
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test:
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra
Reviewers: arsenm, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18988
llvm-svn: 266105
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.
I think this could be a generic combine but I'm not sure.
llvm-svn: 266104
Add a check to catch violations. ~60 tests were broken and prevented
this change to be committed. Adrian and I (thanks Adrian!) went
through them in the last week or so updating. The check can be
done more efficiently but I'd still like to get this in ASAP to
avoid more broken tests to be checked in (if any).
PR: 27101
llvm-svn: 266102