Merge members that are describing the same member of the same ODR type,
even if other bits differ. If the file or line differ, we don't care;
if anything else differs, it's an ODR violation (and we still don't
really care).
For DISubprogram declarations, this looks at the LinkageName and Scope.
For DW_TAG_member instances of DIDerivedType, this looks at the Name and
Scope. In both cases, we know that the Scope follows ODR rules if it
has a non-empty identifier.
llvm-svn: 266548
Split up the long RUN and clarify the CHECK lines:
- Explicitly confirm there are no other subprograms inside of "A".
- Remove checks for "bar" and "baz", which were just implicitly
checking that there were no other subprograms inside of "A".
This prepares for adding a RUN line which links the two files in the
opposite direction.
llvm-svn: 266543
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
At the moment almost every lit.site.cfg.in contains two lines comment:
## Autogenerated by LLVM/Clang configuration.
# Do not edit!
The patch adds variable LIT_SITE_CFG_IN_HEADER, that is replaced from
configure_lit_site_cfg with the note and some useful information.
llvm-svn: 266515
This resolves more frame indexes early and folds
the immediate offsets into the scratch mubuf instructions.
This cleans up a lot of the mess that's currently emitted,
such as emitting add 0s and repeatedly initializing the same
register to 0 when spilling.
llvm-svn: 266508
Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before the
patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents
and generate unassigned new vregs. However, skipping splitSeparateComponents will make
verify-machineinstrs unhappy, so I remove the early return, and use
HoistSpillHelper::LRE_DidCloneVirtReg to assign physreg/stackslot for those new vregs.
In addition, some code reorganization to make class HoistSpillHelper privately inheriting
from LiveRangeEdit::Delegate possible. This is to be consistent with class RAGreedy and
class RegisterCoalescer.
Differential Revision: http://reviews.llvm.org/D19142
llvm-svn: 266489
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.
Fixes PR25079.
llvm-svn: 266486
After r245976, LLVM will skip the last bit test case if knows it will always be
true. However, we would still erroneously update PHI nodes with incoming values
from the MBB that would perform the final bit test, causing -verify-machineinstrs
to fail.
llvm-svn: 266479
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.
Differential Revision: http://reviews.llvm.org/D18622
llvm-svn: 266477
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.
I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.
llvm-svn: 266464
This improves AA in the MI schduler when reason about paired instructions.
Phabricator Revision: http://reviews.llvm.org/D17098
PR26358
llvm-svn: 266462
InstCombine wants to optimize compares of calls to fabs with zero.
However, we didn't have the necessary legality checking to verify that
the function call had the same behavior as fabs.
llvm-svn: 266452
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.
Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.
Motivation
----------
Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.
We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.
Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.
http://reviews.llvm.org/D19034
<rdar://problem/25256815>
llvm-svn: 266446
This is almost identical to:
http://reviews.llvm.org/rL264527
This doesn't solve PR27344; it just allows the profile weights to survive.
To solve the bug, we need to use the profile weights in the backend.
llvm-svn: 266442
Summary:
Without MMOs, the callee-save load/store instructions were treated as
volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17661
llvm-svn: 266439
[PPC] Previously when casting generic loads to LXV2DX/ST instructions we
would leave the original load return type in place allowing for an
assertion failure when we merge two equivalent LXV2DX nodes with
different types.
This fixes PR27350.
Reviewers: nemanjai
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19133
llvm-svn: 266438
Perform store clustering just like load clustering. This change add
StoreClusterMutation in machine-scheduler. To control StoreClusterMutation,
added enableClusterStores() in TargetInstrInfo.h. This is enabled only on
AArch64 for now.
This change also add support for unscaled stores which were not handled in
getMemOpBaseRegImmOfs().
llvm-svn: 266437
The root of the problem was that findMainViewFileID(File, Function)
could return some ID for any given file, even though that file
was not the main file for that function.
This patch ensures that the result of this function is conformed
with the result of findMainViewFileID(Function).
Differential Revision: http://reviews.llvm.org/D18787
llvm-svn: 266436
Summary:
In the added test-case, the atomic instruction feeds into a non-machine
CopyToReg node which hasn't been selected yet, so guard against
non-machine opcodes here.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19043
llvm-svn: 266433
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.
Reviewers: tra
Subscribers: llvm-commits, jingyue, rnk, chandlerc
Differential Revision: http://reviews.llvm.org/D18625
llvm-svn: 266398
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.
This fixes PR27262.
Differential Revision: http://reviews.llvm.org/D18939
llvm-svn: 266381
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).
For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.
This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).
As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.
Fixes PR16275.
llvm-svn: 266363
https://llvm.org/bugs/show_bug.cgi?id=27105
We can check if all bits outside of a constant mask are set with a
single constant.
As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc'
comparison against zero because that could be a single machine instruction.
Differential Revision: http://reviews.llvm.org/D18842
llvm-svn: 266362
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.
Reviewers: arsenm, qcolombet
Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits
Differential Revision: http://reviews.llvm.org/D19077
llvm-svn: 266356
Summary:
If a PHI has an incoming undef, we can pretend that it is equal to one
non-undef, non-self incoming value.
This is particularly relevant in combination with the StructurizeCFG
pass, which introduces PHI nodes with undefs. Previously, this lead to
branch conditions that were uniform before StructurizeCFG to become
non-uniform afterwards, which confused the SIAnnotateControlFlow
pass.
This fixes a crash when Mesa radeonsi compiles a shader from
dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex
Reviewers: arsenm, tstellarAMD, jingyue
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19013
llvm-svn: 266347
Summary:
This pass is unnecessary and overly conservative. It was motivated by
situations like
def %vreg0:SGPR_32
...
if-block:
..
def %vreg1:SGPR_32
...
else-block:
...
use %vreg0:SGPR_32
...
and similar situations with uses after the non-uniform control flow, where
we are not allowed to assign %vreg0 and %vreg1 to the same physical register,
even though in the original, thread/workitem-based CFG, it looks like the
live ranges of these registers do not overlap.
However, by the time register allocation runs, we have moved to a wave-based
CFG that accurately represents the fact that the wave may run through both
the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already
overlap even without the SIFixSGPRLiveRanges pass.
In addition to proving this change correct, I have tested it with Piglit
and a small number of other tests.
Reviewers: arsenm, tstellarAMD
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19041
llvm-svn: 266345
FastRegAlloc works only at the basic-block level and spills all live-out
registers. Unfortunately for a stack-based cmpxchg near the spill slots, this
can perpetually clear the exclusive monitor, which means the cmpxchg will never
succeed.
I believe the only way to handle this within LLVM is by expanding the loop
post-regalloc. We don't want this in general because it severely limits the
optimisations that can be done, so we limit this to -O0 compilations.
It's an ugly hack, and about the one good point in the whole mess is that we
can treat all cmpxchg operations in the most naive way possible (seq_cst, no
clrex faff) without affecting correctness.
Should fix PR25526.
llvm-svn: 266339
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD.
This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions.
Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
Reviewers: mareko, arsenm, tstellarAMD, nhaehnle
Subscribers: FireBurn, kerberizer, llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D18340
Patch By: Bas Nieuwenhuizen
llvm-svn: 266337
Summary:
The code previously always used s1 as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders though,
The register should be the next SGPR after all user and system SGPR's.
We use that Mesa adds arguments for all input and system SGPR's and
take the next available SGPR for the scratch wave offset register.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewers: mareko, arsenm, nhaehnle, tstellarAMD
Subscribers: qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18941
Patch By: Bas Nieuwenhuizen
llvm-svn: 266336
Summary:
Add a print method to Predicated Scalar Evolution which prints all interesting
transformations done by PSE.
Loop Access Analysis will now print this as part of the analysis output.
We now use this to check the exact expression transformations that were done
by PSE in LAA.
The additional checking also acts as white-box testing for the getAsAddRec method.
Reviewers: anemet, sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18792
llvm-svn: 266334
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
This patch was previous committed as r266055 as seemed to have caused some spurious
test failures. They did not reappear after further local testing.
llvm-svn: 266301
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.
Reviewers: dsanders, sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18693
llvm-svn: 266285
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.
It is desirable to lower to @llvm.{min,max}num to -NAN if they don't
have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the
-NAN semantics.
N.B. Of course, it is only safe to do this if the intrinsic call is
marked nnan.
llvm-svn: 266279
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:
+ ConstantHoisting was modifying switch statements. This is simply invalid,
the cases must remain integer constants no matter the notional cost.
+ ConstantHoisting was mangling alloca instructions in the entry block. These
should be handled by FrameLowering, so constants actually have a cost of 0.
Worse, the resulting bitcasts meant they became dynamic allocas.
rdar://25707382
llvm-svn: 266260
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18901
llvm-svn: 266253
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D18902
llvm-svn: 266252
It is very likely that the swiftself parameter is alive throughout most
functions function so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.
Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers, I will address this
in upcoming patches.
This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callees parameter.
Differential Revision: http://reviews.llvm.org/D19007
llvm-svn: 266251
And update the existing test cases in test/Object/macho-invalid.test
to use llvm-objdump with the -macho option to produce these
error messages and stop producing the generic "Invalid data
was encountered while parsing the file" message.
Working from the beginning of the file, if the mach header is too large for
the size of the file and then if the load commands that follow extend past
the end of the file these two errors now generate correct error messages.
Both of these have existing test cases in test/Object/macho-invalid.test .
But the first with macho-invalid-header it will never trigger the error message
"mach header extends past the end of the file" using any of the llvm tools as
they all use identify_magic() which rejects files with the correct magic number
that are too small in size. So I tested this by hacking that code and seeing the
error message down in parseHeader() really does happen. So in case there
is ever code in llvm that directly calls createMachOObjectFile() this error
message will be correctly produced.
The second error message of "load commands extends past the end of the file"
is triggered by a number of existing tests cases in test/Object/macho-invalid.test .
Also other tests trigger different error messages now like "ilocalsym plus
nlocalsym in LC_DYSYMTAB load command extends past the end of the
symbol table".
There are two existing test cases that still get the "Invalid data was encountered ..."
error messages that I will tackle next. But they will involve a bit of pluming an
Expect<...> up through the call stack and I want to do those as separate changes.
FYI, for those test cases that were trying to test specific errors that now get
different errors I’ll fix those in follow on changes and create new test cases
for those so they test the error they were meant to test.
llvm-svn: 266248
Since we can't emit diagnostics for missing "jmp 1f" labels until the end of
the file, we need to be able to restore the context used to calculate
file/line. This is basically the "# line file" directive that's being used at
the time the expression is seen.
rdar://25706972
llvm-svn: 266238
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.
llvm-svn: 266229
This patch corresponds to review:
http://reviews.llvm.org/D17850
This patch implements the following instructions:
cmprb, cmpeqb, cnttzw, cnttzw., cnttzd, cnttzd.
llvm-svn: 266228
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.
Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.
llvm-svn: 266223
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores chain parent.
Reviewers: jyknight
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18909
llvm-svn: 266217
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18836
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
Tests added along with implemented feature.
Note that there is a small leftover of unecessary MI sheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix
the false regression.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Differential Revision: http://reviews.llvm.org/D17825
llvm-svn: 266205
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.
Additionaly, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.
Reviewers: dsanders
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D18893
llvm-svn: 266203
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18438
llvm-svn: 266193
Differential Revision: http://reviews.llvm.org/D17137
This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068.
There was the problem with test-suite failure.
The problem is hopefully solved with dependant patch so this patch is commited again.
llvm-svn: 266179
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).
This fixes PR27332.
llvm-svn: 266175
Differential Revision: http://reviews.llvm.org/D17068
This changes contains fix for failing test-suite. So, this patch should hopefully work now.
llvm-svn: 266171
two fixes with one about error verify-regalloc reported, and
another about live range update of phi after rematerialization.
r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.
analyzeSiblingValues is a N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes siginificant compile time issue when N is large, it is also
important for performance since it removes redundent spills and
enables rematerialization.
To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the define expr of the original register for
rematerializaiton and keep it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.
Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"
Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936
llvm-svn: 266162
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.
This seems to be fixed now, so add a test to make sure that
it stays this way.
llvm-svn: 266156
This state is no longer useful and not guaranteed to be valid in later
codegen passes. For example, see the added test, which would print a
savepoint of %bb.-1 without this change, and crashes with a
use-after-free error under ASan if you apply the recycling allocator
patch from llvm.org/PR26808.
llvm-svn: 266150
This bug was introduced with:
http://reviews.llvm.org/rL262269
AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual
Differential Revision: http://reviews.llvm.org/D19017
llvm-svn: 266148
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D18883
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
Summary:
Let keep llvm-as "dumb": it converts textual IR to bitcode. This
commit removes the dependency from llvm-as to libLLVMAnalysis.
We'll add back summary in llvm-as if we get to a textual
representation for it at some point. In the meantime, opt seems
like a better place for that.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19032
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266131
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)
As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.
Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.
Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18292
llvm-svn: 266126
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266115
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation fixes a hang with the piglit
test:
spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra
Reviewers: arsenm, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18988
llvm-svn: 266105
This helps clean up some of the mess when expanding unaligned 64-bit
loads when changed to be promote to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.
I think this could be a generic combine but I'm not sure.
llvm-svn: 266104
Add a check to catch violations. ~60 tests were broken and prevented
this change to be committed. Adrian and I (thanks Adrian!) went
through them in the last week or so updating. The check can be
done more efficiently but I'd still like to get this in ASAP to
avoid more broken tests to be checked in (if any).
PR: 27101
llvm-svn: 266102
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.
This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18967
llvm-svn: 266088
This is a resubmittion of 263158 change.
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 266086
Summary:
In getUnderlyingObjectsForInstr(): Don't give up on instructions with
multiple MMOs, instead look through all the MMOs and if they all meet
the conservative criteria previously used for single MMO instructions,
then return all of the underlying objects derived from the MMOs.
The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid
the case where multiple underlying objects are present and are related
in such a way that successive iterations of the loop end up adding a
dependency from an instruction to itself.
Reviewers: atrick, hfinkel
Subscribers: MatzeB, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18093
llvm-svn: 266084
This patch enables assembler support for .set arch=octeon.
It will fix issues with inline assembler when this directive is used.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D18548
llvm-svn: 266081
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.
llvm-svn: 266071
They broke the msan bot.
Original message:
Add __atomic_* lowering to AtomicExpandPass.
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266062
Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.
Reviewers: dsanders
Differential Revision: http://reviews.llvm.org/D18856
llvm-svn: 266055
This is intended to be shared by the ThinLTOCodeGenerator.
Note that there is a change in the way the verifier is run, previously
it was ran as a Pass on the merged module during internalization.
While now the verifier is called explicitely on the merged module
outside of the internalize "pass pipeline".
What remains strange in the API is the fact that `DisableVerify` in
the API does not disable this initial verifier.
Differential Revision: http://reviews.llvm.org/D19000
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266047
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.
Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng
http://reviews.llvm.org/D18884
llvm-svn: 266040
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has short latency compares to
mfcr.
Thanks Nemanja's invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton
http://reviews.llvm.org/D17749
llvm-svn: 266038
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.
This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.
The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.
Differential Revision: http://reviews.llvm.org/D14933
llvm-svn: 266032
r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately.
llvm-svn: 266022
This is more robust to changes in the link ordering.
Differential Revision: http://reviews.llvm.org/D18946
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266018
Add StackProtector to SafeStack. This adds limited protection against
data corruption in the caller frame. Current implementation treats
all stack protector levels as -fstack-protector-all.
llvm-svn: 266004
This is better for a few reasons:
+ It matches the other tooling for iOS.
+ It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
use ARM mode).
+ It leads to infinitesimally smaller code (0.2%, yay!).
rdar://25369506
llvm-svn: 266003
AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.
This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.
Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.
This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.
It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.
At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.
Differential Revision: http://reviews.llvm.org/D18200
llvm-svn: 266002
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well.
The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles.
I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.
Differential Revision: http://reviews.llvm.org/D18944
llvm-svn: 265998
We were incorrectly reporting all non-64 bit integers as int64s.
This is most evident when trying to print the "short" type, but
in theory could happen with chars too (although usually chars use
a different builtin type).
Additionally, we were using the wrong check when deciding whether
to print an enum definition as a global enum. We were checking
whether or not the enum was "nested", and if so saving it until
we print the class definition that it was nested in. But this is
not correct in rare situations where the enum is nested, but the
class that it's nested in does not have type information in the PDB.
So instead we check if there is a class definition for the parent
in the PDB. If so we save it for later, otherwise we print it.
llvm-svn: 265993
Before, ELF at least managed a diagnostic but it was a completely untraceable
"undefined symbol" error. MachO had a variety of even worse behaviours: crash,
emit corrupt file, or an equally bad message.
llvm-svn: 265984
This patch ensures that when we detect first-order recurrences, we reject a phi
node if its previous value is also a phi node. During vectorization the initial
and previous values of the recurrence are shuffled together to create the value
for the current iteration. However, phi nodes are not widened like other
instructions. This fixes PR27246.
Differential Revision: http://reviews.llvm.org/D18971
llvm-svn: 265983
This is the straightforward fix for PR26760:
https://llvm.org/bugs/show_bug.cgi?id=26760
But we still need to make some changes to generalize this helper function
and then send the lshr case into here.
llvm-svn: 265960
This change follows up defaults for GCC and Clang, so LLVM does not differ
from them. While number of the test files are touched with this change, they
all keep the old (expected) behaviour with the explicit option:
"-relocation-model=pic"
The tests that have not been touched are insensitive to relocation model.
Differential Revision: http://reviews.llvm.org/D17995
llvm-svn: 265949