it to actually test the new pass manager AA wiring.
This patch was extracted from the (somewhat too large) D12357 and
rebosed on top of the slightly different design of the new pass manager
AA wiring that I just landed. With this we can start testing the AA in
a thorough way with the new pass manager.
Some minor cleanups to the code in the pass was necessitated here, but
otherwise it is a very minimal change.
Differential Revision: http://reviews.llvm.org/D17372
llvm-svn: 261403
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.
Differential Revision: http://reviews.llvm.org/D17459
llvm-svn: 261390
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.
This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.
This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.
Reviewers: arsenm, cfang, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17305
llvm-svn: 261385
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.
In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.
Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197
llvm-svn: 261346
allocateStackSlot did not consider the size of the value to be spilled
before deciding to re-use a spill slot. This was originally okay (since
originally we'd only ever spill pointers), but it became not okay when
we changed our scheme to directly spill vectors of pointers.
While this change fixes the bug pointed out, it has two performance
caveats:
- It matches spill slot and spillee size exactly, while in theory we
can spill, e.g., an 8 byte pointer into a 16 byte slot. This is
slightly complicated to fix since in the stackmaps section, we report
the size of the spill slot as the size of the "indirect value"; and
if they're no longer equivalent, we'll have to keep track of the
(indirect) value size separately from the stack slot size.
- It will "spuriously run out" of reusable slots, since we now have an
second check in the search loop in addition to the availablity
check (e.g. you had two free scalar slots, and you first ask for a
vector slot followed by a scalar slot). I'll fix this in a later
commit.
llvm-svn: 261336
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).
We now discard interleaved load groups that don't have the first and
load in the group.
Reviewers: hfinkel, rengolin
Subscribers: rengolin, llvm-commits, mzolotukhin, anemet
Differential Revision: http://reviews.llvm.org/D17332
llvm-svn: 261331
Summary:
This was broken in r260694 which swapped the address and data operands
for flat store instructions. The code in SIInsertWaits assumes
that the data operand always comes before the address operand, so
we need to add a special case for flat.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17366
llvm-svn: 261330
As discussed on PR24580, this patch adds some (more to come) initial fast-isel codegen tests to match the IR generated in clang/test/CodeGen/avx-builtins.c
llvm-svn: 261329
According to the SystemZ ABI, 128-bit integer types should be
passed and returned via implicit reference. However, this is
not currently implemented at the LLVM IR level for the i128
type. This does not matter when compiling C/C++ code, since
clang will implement the implicit reference itself.
However, it turns out that when calling libgcc helper routines
operating on 128-bit integers, LLVM will use i128 argument and
return value types; the resulting code is not compatible with
the ABI used in libgcc, leading to crashes (see PR26559).
This should be simple to fix, except that i128 currently is not
even a legal type for the SystemZ back end. Therefore, common
code will already split arguments and return values into multiple
parts. The bulk of this patch therefore consists of detecting
such parts, and correctly handling passing via implicit reference
of a value split into multiple parts. If at some time in the
future, i128 becomes a legal type, this code can be removed again.
This fixes PR26559.
llvm-svn: 261325
routine.
We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.
The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.
With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.
I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.
LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.
Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.
Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!
Differential Revision: http://reviews.llvm.org/D17435
llvm-svn: 261316
especially the *structure* of it with respect to various pass managers.
This uncovers an absolute horror show of problems. This test shows just
how bad PR24804 is: we have a totaly of *seven* loop pass managers in
the main optimization pipeline.
I've tried to comment the various bits to the best of my knowledge, but
more enhancements here would be great.
Also great would be folks adding various test for other pipelines, I'm
focused on trying to fix the O2 pipeline. I just wanted a test to show
what I'm changing.
llvm-svn: 261305
Certain optimization passes (like globaldce) can prune function
declaration that SjLjEHPrepare assumed would exit when it'd
runOnFunction.
This fixes PR26669.
llvm-svn: 261303
Summary:
Without this, this command
$ llvm-run llc -stop-after machine-cp -o - <( echo '' )
outputs an error, because we close stdout twice -- once when closing the
file opened for "-o", and again when closing outs().
Also clarify in the outs() definition that you can't ever call it if you
want to open your own raw_fd_ostream on stdout.
Reviewers: jroelofs, tstellarAMD
Subscribers: jholewinski, qcolombet, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17422
llvm-svn: 261286
This test builds on 261250 (IR support for cmpxchg of pointers) and 261245 (capture tracking support for cmpxchg) to show that correctly analyze the capturing of pointers in a cmpxchg of pointer type.
llvm-svn: 261284
Today, we do not allow cmpxchg operations with pointer arguments. We require the frontend to insert ptrtoint casts and do the cmpxchg in integers. While correct, this is problematic from a couple of perspectives:
1) It makes the IR harder to analyse (for instance, it make capture tracking overly conservative)
2) It pushes work onto the frontend authors for no real gain
This patch implements the simplest form of IR support. As we did with floating point loads and stores, we teach AtomicExpand to convert back to the old representation. This prevents us needing to change all backends in a single lock step change. Over time, we can migrate each backend to natively selecting the pointer type. In the meantime, we get the advantages of a cleaner IR representation without waiting for the backend changes.
Differential Revision: http://reviews.llvm.org/D17413
llvm-svn: 261281
If we know that all of our successors want to be in the exact same
state, it makes sense to hoist the state transition into their common
predecessor.
Differential Revision: http://reviews.llvm.org/D17391
llvm-svn: 261262
IRBuilder has two ways of putting bundle operands on calls: the default
operand bundle, and an overload of CreateCall that takes an operand
bundle list.
Previously, this overload used a default argument of None. This made it
impossible to distinguish between the case were the caller doesn't care
about bundles, and the case where the caller explicitly wants no
bundles. We behaved as if they wanted the latter behavior rather than
the former, which led to problems with simplifylibcalls and WinEH.
This change fixes it by making the parameter non-optional, so we can
distinguish these two cases.
llvm-svn: 261258
Summary: As per title. There was a lot of part missing in the C API, so I had to extend the invoke and landingpad API.
Reviewers: echristo, joker.eph, Wallbraker
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17359
llvm-svn: 261254
These atomic operations are conceptually both a load and store from the same location. As such, we can treat them as the most conservative of those two components which in practice, means we can treat them like stores. An cmpxchg or atomicrmw captures the values, but not the locations accessed.
Note: We can probably be more aggressive about the comparison value in an cmpxhg since to have it be in memory, it must already be captured, but I figured it was better to avoid that for the moment.
Note 2: It turns out that since we don't actually support cmpxchg of pointer type, writing a negative test is impossible.
Differential Revision: http://reviews.llvm.org/D17400
llvm-svn: 261245
In r260133, LLVM was changed to no longer extend i8/i16 return values,
as it's not required by the ABI. However, code was found in the wild
that relies on the old behaviour on Darwin, so this commit reverts
back to that old behaviour for Darwin.
On other platforms, it's less likely that code would be depending on
the old behaviour, as GCC and MSVC haven't been extending such return
values.
llvm-svn: 261235
Summary:
These correspond to IMAGE_LOAD/STORE[_MIP] and are going to be used by Mesa
for the GL_ARB_shader_image_load_store extension.
IMAGE_LOAD is already matched by llvm.SI.image.load. That intrinsic has
a legacy name and pretends not to read memory.
Differential Revision: http://reviews.llvm.org/D17276
llvm-svn: 261224
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.
llvm-svn: 261212
convert one test to use this.
This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.
For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.
The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.
llvm-svn: 261203
In cases where the PSHUFB shuffle mask is shared it might not be bitcasted to a vXi8 byte vector. This patch adds support for decoding these wider shuffle masks from the ConstantPool.
The test case in question makes use of this to recognise the shuffle mask is an unary UNPCKL pattern and simplifies accordingly.
llvm-svn: 261201
analysis passes, support pre-registering analyses, and use that to
implement parsing and pre-registering a custom alias analysis pipeline.
With this its possible to configure the particular alias analysis
pipeline used by the AAManager from the commandline of opt. I've updated
the test to show this effectively in use to build a pipeline including
basic-aa as part of it.
My big question for reviewers are around the APIs that are used to
expose this functionality. Are folks happy with pass-by-lambda to do
pass registration? Are folks happy with pre-registering analyses as
a way to inject customized instances of an analysis while still using
the registry for the general case?
Other thoughts of course welcome. The next round of patches will be to
add the rest of the alias analyses into the new pass manager and wire
them up here so that they can be used from opt. This will require
extending the (somewhate limited) functionality of AAManager w.r.t.
module passes.
Differential Revision: http://reviews.llvm.org/D17259
llvm-svn: 261197
While we still do want reducible control flow, the RequiresStructuredCFG
flag imposes more strict structure constraints than WebAssembly wants.
Unsetting this flag enables critical edge splitting and tail merging.
Also, disable TailDuplication explicitly, as it doesn't support virtual
registers, and was previously only disabled by the RequiresStructuredCFG
flag.
llvm-svn: 261190
Changes:
- Added disassembler project
- Fixed all decoding conflicts in .td files
- Added DecoderMethod=“NONE” option to Target.td that allows to
disable decoder generation for an instruction.
- Created decoding functions for VS_32 and VReg_32 register classes.
- Added stubs for decoding all register classes.
- Added several tests for disassembler
Disassembler only supports:
- VI subtarget
- VOP1 instruction encoding
- 32-bit register operands and inline constants
[Valery]
One of the point that requires to pay attention to is how decoder
conflicts were resolved:
- Groups of target instructions were separated by using different
DecoderNamespace (SICI, VI, CI) using similar to AssemblerPredicate
approach.
- There were conflicts in IMAGE_<> instructions caused by two
different reasons:
1. dmask wasn’t specified for the output (fixed)
2. There are image instructions that differ only by the number of
the address components but have the same encoding by the HW spec. The
actual number of address components is determined by the HW at runtime
using image resource descriptor starting from the VGPR encoded in an
IMAGE instruction. This means that we should choose only one instruction
from conflicting group to be the rule for decoder. I didn’t find the way
to disable decoder generation for an arbitrary instruction and therefore
made a onelinear fix to tablegen generator that would suppress decoder
generation when DecoderMethod is set to “NONE”. This is a change that
should be reviewed and submitted first. Otherwise I would need to
specify different DecoderNamespace for every instruction in the
conflicting group. I haven’t checked yet if DecoderMethod=“NONE” is not
used in other targets.
3. IMAGE_GATHER decoder generation is for now disabled and to be
done later.
[/Valery]
Patch By: Sam Kolton
Differential Revision: http://reviews.llvm.org/D16723
llvm-svn: 261185
After r261154, we were only clearing flags if the known-zero register was
originally live-in to the basic block, but we have to do it even if not when
more than one COPY has been eliminated, otherwise the user of the first COPY
may still have <kill> marked.
E.g.
BB#N:
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#0>
%X0 = COPY %XZR
STRXui %X0<kill>, <fi#1>
We can eliminate both copies, X0 is not live-in, but we must clear the kill on
the first store.
Unfortunately, I've been unable to come up with a non-fragile test for this.
I've only seen it in the wild with regalloc-created spills, and attempts to
reproduce that in a reasonable way run afoul of COPY coalescing. Even volatile
asm clobbers were moved around. Should fix the aarch64 bot though.
llvm-svn: 261175
described by an immediate.
Found via http://reviews.llvm.org/D16867
Thanks to Paul Robinson for pointing this out.
<rdar://problem/24456528>
llvm-svn: 261168
Mostly, this fixes the bug that if the CBZ guaranteed Xn but Wn was used, we
didn't sort out the use-def chain properly.
I've also made it check more than just the last instruction for a compatible
CBZ (so it can cope without fallthroughs). I'd have liked to do that
separately, but it's helps writing the test.
Finally, I removed some custom loops in favour of MachineInstr helpers and
refactored the control flow to flatten it and avoid possibly quadratic
iterations in blocks with many copies. NFC for these, just a general tidy-up.
llvm-svn: 261154
This function is used to check whether a dbg.value intrinsic has already
been inserted, but without comparing the DIExpression, it would erroneously
fire on split aggregates and only the first scalar would survive.
Found via http://reviews.llvm.org/D16867.
<rdar://problem/24456528>
llvm-svn: 261145
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.
The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690
llvm-svn: 261140
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.
Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17326
llvm-svn: 261139
When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the
ARC optimizer in r258970, one case was missed which would lead the optimizer
to execute an llvm_unreachable. In this case, just handle ClaimRV in the same
way we handle RetainRV.
llvm-svn: 261134
32-bit x86 Windows targets use a linked-list of nodes allocated on the
stack, referenced to via thread-local storage. The personality routine
interprets one of the fields in the node as a 'state number' which
indicates where the personality routine should transfer control.
State transitions are possible only before call-sites which may throw
exceptions. Our previous scheme had us update the state number before
all call-sites which may throw.
Instead, we can try to minimize the number of times we need to store by
reasoning about the nearest store which dominates the current call-site.
If the last store agrees with the current call-site, then we know that
the state-update is redundant and can be elided.
This is largely straightforward: an RPO walk of the blocks allows us to
correctly forward propagate the information when the function is a DAG.
Currently, loops are not handled optimally and may trigger superfluous
state stores.
Differential Revision: http://reviews.llvm.org/D16763
llvm-svn: 261122
Summary:
The syncthreads MI is modeled as mayread/maywrite -- convergence doesn't
even come into play here. Nonetheless this property is highly implicit
in the tablegen files, so a test seems appropriate.
Reviewers: jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D17319
llvm-svn: 261114
Summary:
Otherwise we'll try to do unsafe optimizations on these MIs, such as
sinking loads below calls.
(I suspect that this is not the only bug in the NVPTX instruction
tablegen files; I need to comb through them.)
Reviewers: jholewinski, tra
Subscribers: jingyue, jhen, llvm-commits
Differential Revision: http://reviews.llvm.org/D17315
llvm-svn: 261113
The dynamic table is also an array of a fixed structure, so it can be
represented with a DynReginoInfo.
No major functionality change. The extra error checking is covered by
existing tests with a broken dynamic program header.
Idea extracted from r260488. I did the extra cleanups.
llvm-svn: 261107
We are getting better at combining constant pshufb masks - use a real input instead of undef.
Add test for decoding multi-use bitcasted masks as well (actual support will come soon).
llvm-svn: 261101
We used to keep both a section and a pointer to the first symbol.
The oddity of keeping a section for dynamic symbols is because there is
a DT_SYMTAB but no DT_SYMTABZ, so to print the table we have to find the
size via a section table.
The reason for still keeping a pointer to the first symbol is because we
want to be able to print relocation tables even if the section table is
missing (it is mandatory only for files used in linking).
With this patch we keep just a DynRegionInfo. This then requires
changing a few places that were asking for a Elf_Shdr but actually just
needed the first symbol.
The test change is to delete the program header pointer.
Now that we use the information of both DT_SYMTAB and .dynsym, we don't
depend on the sh_entsize of .dynsym if we see DT_SYMTAB.
Note: It is questionable if it is worth it putting the effort to report
broken sh_entsize given that in files with no section table we have to
assume it is sizeof(Elf_Sym), but that is for another change.
Extracted from r260488.
llvm-svn: 261099
Bug description:
The bug was discovered when test was compiled with -O0.
In case scatter result is DAG root , VectorLegalizer failed (assert) due to LowerMSCATTER() return kmask as result.
Change LowerMSCATTER() to return chain as original node do.
Differential Revision: http://reviews.llvm.org/D17331
llvm-svn: 261090
This section is used for debug information and has no need to be
in memory at runtime. This patch also fixes an error when compiling
the Linux kernel. The error is that there are relocations within the
.pdr section in a VDSO. SHT_REL was removed as it is a section type
and not a section flag, therefore it does not make sense for it to
be there. With this patch, LLVM now emits the same flags as
the GNU assembler.
llvm-svn: 261083
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-blend patterns before defaulting to the splitting behaviour.
Part 2 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261082
AVX1 doesn't support the shuffling of 256-bit integer vectors. For 32/64-bit elements we get around this by shuffling as float/double but for 8/16-bit elements (assuming they can't widen) we currently just split, shuffle as 128-bit vectors and concatenate the results back.
This patch adds the ability to lower using the bit-mask patterns before defaulting to the splitting behaviour. In some cases this ends up matching what AVX2 would do anyhow or what AVX1 does on the split vectors.
Part 1 of 2
Differential Revision: http://reviews.llvm.org/D17292
llvm-svn: 261081
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261070
This fixes very slow compilation on
test/CodeGen/Generic/2010-11-04-BigByval.ll . Note that MaxStoresPerMemcpy
and friends are not yet carefully tuned so the cutoff point is currently
somewhat arbitrary. However, it's important that there be a cutoff point
so that we don't emit unbounded quantities of loads and stores.
llvm-svn: 261050
reference-edge SCCs.
This essentially builds a more normal call graph as a subgraph of the
"reference graph" that was the old model. This allows both to exist and
the different use cases to use the aspect which addresses their needs.
Specifically, the pass manager and other *ordering* constrained logic
can use the reference graph to achieve conservative order of visit,
while analyses reasoning about attributes and other properties derived
from reachability can reason about the direct call graph.
Note that this isn't necessarily complete: it doesn't model edges to
declarations or indirect calls. Those can be found by scanning the
instructions of the function if desirable, and in fact every user
currently does this in order to handle things like calls to instrinsics.
If useful, we could consider caching this information in the call graph
to save the instruction scans, but currently that doesn't seem to be
important.
An important realization for why the representation chosen here works is
that the call graph is a formal subset of the reference graph and thus
both can live within the same data structure. All SCCs of the call graph
are necessarily contained within an SCC of the reference graph, etc.
The design is to build 'RefSCC's to model SCCs of the reference graph,
and then within them more literal SCCs for the call graph.
The formation of actual call edge SCCs is not done lazily, unlike
reference edge 'RefSCC's. Instead, once a reference SCC is formed, it
directly builds the call SCCs within it and stores them in a post-order
sequence. This is used to provide a consistent platform for mutation and
update of the graph. The post-order also allows for very efficient
updates in common cases by bounding the number of nodes (and thus edges)
considered.
There is considerable common code that I'm still looking for the best
way to factor out between the various DFS implementations here. So far,
my attempts have made the code harder to read and understand despite
reducing the duplication, which seems a poor tradeoff. I've not given up
on figuring out the right way to do this, but I wanted to wait until
I at least had the system working and tested to continue attempting to
factor it differently.
This also requires introducing several new algorithms in order to handle
all of the incremental update scenarios for the more complex structure
involving two edge colorings. I've tried to comment the algorithms
sufficiently to make it clear how this is expected to work, but they may
still need more extensive documentation.
I know that there are some changes which are not strictly necessarily
coupled here. The process of developing this started out with a very
focused set of changes for the new structure of the graph and
algorithms, but subsequent changes to bring the APIs and code into
consistent and understandable patterns also ended up touching on other
aspects. There was no good way to separate these out without causing
*massive* merge conflicts. Ultimately, to a large degree this is
a rewrite of most of the core algorithms in the LCG class and so I don't
think it really matters much.
Many thanks to the careful review by Sanjoy Das!
Differential Revision: http://reviews.llvm.org/D16802
llvm-svn: 261040
__chkstk clobbers EAX. If EAX is live across the prologue, then we have
to take extra steps to save it. We already had code to do this if EAX
was a register parameter. This change adapts it to work when shrink
wrapping is used.
llvm-svn: 261039
Currently, we sometimes miscompile this vector pattern:
(c ? -v : v)
We lower it to (because "c" is <4 x i1>, lowered as a vector mask):
(~c & v) | (c & -v)
When we have SSSE3, we incorrectly lower that to PSIGN, which does:
(c < 0 ? -v : c > 0 ? v : 0)
in other words, when c is either all-ones or all-zero:
(c ? -v : 0)
While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.
Note that the PSIGN tests are also incorrect. Consider:
%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond
if %b is zero:
%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a
whereas we currently generate:
psignd %xmm1, %xmm0
retq
which returns 0, as %xmm1 is 0.
Instead, use a pure logic sequence, as described in:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate
Fixes PR26110.
Differential Revision: http://reviews.llvm.org/D17181
llvm-svn: 261023
We're going to stop generating PSIGN, so calling a test "psign"
isn't ideal. Instead, call these tests what they really are:
variable blends using logic.
Also add a test to exhibit a case we're currently missing in
the PSIGN combine.
llvm-svn: 261022
The register stackifier currently checks for intervening stores (and
loads that may alias them) but doesn't account for the fact that the
instruction being moved may affect intervening loads.
Differential Revision: http://reviews.llvm.org/D17298
llvm-svn: 261014
Summary:
I thought -Xlinker -mllvm -Xlinker -stats worked at some point but maybe
it never did.
For clang, I believe that stats are printed from cc1_main. This patch
also prints them for LTO, specifically right after codegen happens.
I only looked at the C API for LTO briefly to see if this is a good
place. Probably there are still cases where this wouldn't be printed
but it seems to be working for the common case. I also experimented
putting this in the LTOCodeGenerator destructor but that didn't trigger
for me because ld64 does not destroy the LTOCodeGenerator.
Reviewers: dexonsmith, joker.eph
Subscribers: rafael, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17302
llvm-svn: 261013
The usual way to get a 32-bit relocation is to use a constant extender which doubles the size of the instruction, 4 bytes to 8 bytes.
Another way is to put a .word32 and mix code and data within a function. The disadvantage is it's not a valid instruction encoding and jumping over it causes prefetch stalls inside the hardware.
This relocation packs a 23-bit value in to an "r0 = add(rX, #a)" instruction by overwriting the source register bits. Since r0 is the return value register, if this instruction is placed after a function call which return void, r0 will be filled with an undefined value, the prefetch won't be confused, and the callee can access the constant value by way of the link register.
llvm-svn: 261006
Summary:
This change will add a pass to remove unnecessary zero copies in target blocks
of cbz/cbnz instructions. E.g., the copy instruction in the code below can be
removed because the cbz jumps to BB1 when x0 is zero :
BB0:
cbz x0, .BB1
BB1:
mov x0, xzr
Jun
Reviewers: gberry, jmolloy, HaoLiu, MatzeB, mcrosier
Subscribers: mcrosier, mssimpso, haicheng, bmakam, llvm-commits, aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D16203
llvm-svn: 261004
CopyToReg nodes don't support FrameIndex operands. Other targets select
the FI to some LEA-like instruction, but since we don't have that, we
need to insert some kind of instruction that can take an FI operand and
produces a value usable by CopyToReg (i.e. in a vreg). So insert a dummy
copy_local between Op and its FI operand. This results in a redundant
copy which we should optimize away later (maybe in the post-FI-lowering
peephole pass).
Differential Revision: http://reviews.llvm.org/D17213
llvm-svn: 260987
The root issue appears to be a confusion around what makeNoWrapRegion actually does. It seems likely we need two versions of this function with slightly different semantics.
llvm-svn: 260981
WebAssembly doesn't require full RPO; topological sorting is sufficient and
can preserve more of the MachineBlockPlacement ordering. Unfortunately, this
still depends a lot on heuristics, because while we use the
MachineBlockPlacement ordering as a guide, we can't use it in places where
it isn't topologically ordered. This area will require further attention.
llvm-svn: 260978
This avoids some complications updating LiveIntervals to be aware of the new
register lifetimes, because we can just compute new intervals from scratch
rather than describe how the old ones have been changed.
llvm-svn: 260971
Original commit message:
[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations.
The bits of r260488 it depends on have been committed.
llvm-svn: 260970
This requires making an error message a bit more generic, but that seems
a reasonable tradeoff.
Extracted from r260488 but simplified a bit.
llvm-svn: 260967
Original messages:
Revert "[readobj] Handle ELF files with no section table or with no program headers."
Revert "[readobj] Dump DT_JMPREL relocations when outputting dynamic relocations."
r260489 depends on r260488 and among other issues r260488 deleted error
handling code.
llvm-svn: 260962
Summary:
Extending findExistingExpansion can use existing value in ExprValueMap.
This patch gives 0.3~0.5% performance improvements on
benchmarks(test-suite, spec2000, spec2006, commercial benchmark)
Reviewers: mzolotukhin, sanjoy, zzheng
Differential Revision: http://reviews.llvm.org/D15559
llvm-svn: 260938
Summary:
In order to pass the tests, this required marking R_MIPS_16 relocations
as needing to point to the symbol and not the section.
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D17200
llvm-svn: 260896
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.
This change adds support for truncating insert/extract element
operations, and adds a regression test.
Reviewers: jmolloy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17078
llvm-svn: 260893
Summary:
This section is used for debug information and has no need to be
in memory at runtime. With this patch, LLVM now emits the same flags as
the GNU assembler. This patch also fixes an error when compiling
the Linux kernel, The error is that there are relocations within the
.pdr section in a VDSO.
Reviewers: vkalintiris, dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D17199
llvm-svn: 260879
If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes.
Differential Revision: http://reviews.llvm.org/D17138
llvm-svn: 260878
This ensures that all of the various pieces are working. The next patch
will wire up commandline-driven alias analysis chain building and allow
BasicAA to work with the AAManager.
llvm-svn: 260838
into the new pass manager and fix the latent bugs there.
This lets everything live together nicely, but it isn't really useful
yet. I never finished wiring the AA layer up for the new pass manager,
and so subsequent patches will change this to do that wiring and get AA
stuff more fully integrated into the new pass manager. Turns out this is
necessary even to get functionattrs ported over. =]
llvm-svn: 260836
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.
On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.
This patch has several benefits:
* Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
* Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
* Matching the repeating shuffle makes use of a lot of existing shuffle lowering.
There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.
Differential Revision: http://reviews.llvm.org/D16537
llvm-svn: 260834
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.
I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)
Differential Revision: http://reviews.llvm.org/D17219
llvm-svn: 260828
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.
This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.
This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.
llvm-svn: 260813
Tests for the new scalarize all private access options will be
included with a future commit.
The only functional change is to make the split/scalarize behavior
for private access of > 4 element vectors to be consistent
with the flat/global handling. This makes the spilling worse
in the two changed tests.
llvm-svn: 260804
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2. This prevents selection of
native f16 conversion instructions from f32 or f64. Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP in platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
Replace spills to memory with spills to registers, if possible. This
applies mostly to predicate registers (both scalar and vector), since
they are very limited in number. A spill of a predicate register may
happen even if there is a general-purpose register available. In cases
like this the stack spill/reload may be eliminated completely.
This optimization will consider all stack objects, regardless of where
they came from and try to match the live range of the stack slot with
a dead range of a register from an appropriate register class.
llvm-svn: 260758
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.
Reviewers: jingyue
Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17128
llvm-svn: 260730
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.
Reviewers: jingyue
Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune
Differential Revision: http://reviews.llvm.org/D17127
llvm-svn: 260729
As the title says. Modelled after similar code in SCEV.
This is useful when analysing induction variables in loops which have been canonicalized by other passes. I wrote the tests as non-loops specifically to avoid the generality introduced in http://reviews.llvm.org/D17174. While that can handle many induction variables without *needing* to exploit nsw, there's no reason not to use it if we've already proven it.
Differential Revision: http://reviews.llvm.org/D17177
llvm-svn: 260705
Rewrite the code to handle all pseudo-instructions in a single pass.
This temporarily reverts spill slot optimization that used general-
purpose registers to hold values of spilled predicate registers.
llvm-svn: 260696
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.
However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).
This patch make the replacement happen only when xor/sub has only one
use.
Differential Revision: http://reviews.llvm.org/D16915
Patch by Taewook Oh!
llvm-svn: 260695
Historically, AMD internal sp3 assembler has flat_store* addr, data
format. To match existing code and to enable reuse, change LLVM
definitions to match. Also update MC and CodeGen tests.
Differential Revision: http://reviews.llvm.org/D16927
Patch by: Nikolay Haustov
llvm-svn: 260694
Summary:
It is possible that the loop condition can be a boolean constant (infinite loop,
for example). So we sould handle constant condition in annotating a loop. This
patch adds this functionality to support annotating constant condition.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D15093
llvm-svn: 260692
We can generate the actual instructions from the intrinsics without the
need for pseudo-instructions. Also, since the intrinsics have a side-
effect in a form of a store, attempt to optimize away loads from the
store location.
llvm-svn: 260690
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs. This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.
This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17000
llvm-svn: 260689
The DataLayout can calculate alignment of vectors based on the alignment
of the element type and the number of elements. In fact, it is the product
of these two values. The problem is that for vectors of N x i1, this will
return the alignment of N bytes, since the alignment of i1 is 8 bits. The
vector types of vNi1 should be aligned to N bits instead. Provide explicit
alignment for HVX vectors to avoid such complications.
llvm-svn: 260678
This was hardcoded to the static private size, but this
would be missing the offset and additional size for someday
when we have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
llvm-svn: 260658
Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651
MSan adds a constructor to each translation unit that calls
__msan_init, and does nothing else. The idea is to run __msan_init
before any instrumented code. This results in multiple constructors
and multiple .init_array entries in the final binary, one per
translation unit. This is absolutely unnecessary; one would be
enough.
This change moves the constructors to a comdat group in order to drop
the extra ones.
llvm-svn: 260632
Multi-dso programs result in multiple coverage files dumped of the form
'<module_name>.<pid>.sancov'. When analyzing these coverage files it is
important to use correct corresponding object file.
This change removes the "-obj" sancov flag and lets user specify object
file names alongside coverage files. Sancov tool would match them using
<module_name> part of coverage file and short file name of the object
file.
Corresponding changes:
- compiler-rt: http://reviews.llvm.org/D17171
- docs: http://reviews.llvm.org/D17175
Differential Revision: http://reviews.llvm.org/D17169
llvm-svn: 260628
This patches teaches LVI to recognize clamp idioms (e.g. select(a > 5, a, 5) will always produce something greater than 5.
The tests end up being somewhat simplistic because trying to exercise the case I actually care about (a loop with a range check on a clamped secondary induction variable) ends up tripping across a couple of other imprecisions in the analysis. Ah, the joys of LVI...
Differential Revision: http://reviews.llvm.org/D16827
llvm-svn: 260627
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Patch by Philip Pfaffe!
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 260612
Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations. When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.
Reviewers: mareko, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17102
llvm-svn: 260599
Summary: This required to add binding to Instruction::removeFromParent so that instruction can be forward declared and then moved at the right place.
Reviewers: bogner, chandlerc, echristo, dblaikie, joker.eph, Wallbraker
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17057
llvm-svn: 260597
When optimizing a extractvalue(load), we generate a load from the
aggregate type. This load didn't have alignment set and so would
get the alignment of the type. This breaks when the type is packed
and so the alignment should be lower.
For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.
Reviewed by David Majnemer
Differential revision: http://reviews.llvm.org/D17158
llvm-svn: 260587
The code change is simple enough: instead of attaching an anonymous SDLoc to splatted
vector constants, use the scalar constant's existing SDLoc since that is what is passed
into getConstant() as a param. But this changes instruction scheduling, so I'll explain
why that happens.
The motivation for this patch starts near:
http://reviews.llvm.org/rL258833
...x86's getZeroVector() could be similarly cleaned up and I thought it would be 'NFC'.
But when I made that change locally, several x86 codegen tests wiggled.
It turns out that the lack of SDLoc consistency in getConstant() changes the way
ScheduleDAGRRList behaves. This is because the SDLoc contains 'IROrder' and some DAG
scheduler algorithms use IROrder for tie-breaking.
Differential Revision: http://reviews.llvm.org/D16972
llvm-svn: 260582
Summary:
Added support for "VOP3Only" attribute in VOP3bInst encoding.
Set VOP3Only=1 for V_DIV_SCALE_F64/32 insns.
Added support for multi-dest instructions in AMDGPUAs::cvt*().
Added lit test for "V_DIV_SCALE_F64|F32 vreg,vcc|sreg,vreg,vreg,vreg".
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, SamWot, nhaustov, vpykhtin
Differential Revision: http://reviews.llvm.org/D16995
Patch By: Artem Tamazov
llvm-svn: 260560
Summary:
Added a test case just to make sure that isKnownNonZero() returns false
when we cannot guarantee that a ConstantExpr is a non-zero constant.
Reviewers: sanjoy, majnemer, mcrosier, nlewycky
Subscribers: nlewycky, mssimpso, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16908
llvm-svn: 260544
Summary:
When a PHI is used only to be compared with zero, it is possible to replace an
incoming value with any non-zero constant if the incoming value can be proved as
a known nonzero value. For example, in below code, we can replace the incoming value %v with
any non-zero constant based on the fact that the PHI is only used to be compared with zero
and %v is a known non-zero value:
%v = select %cond, 1, 2
%p = phi [%v, BB] ...
%c = icmp eq, %p, 0
Reviewers: mcrosier, jmolloy, sanjoy
Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16240
llvm-svn: 260530
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.
This is a reapplication of r259812, which had an incorrect assert. The
test_stur_str_no_assert() test is a reduced version of the issue hit in
the AArch64 self-host.
PR24465
llvm-svn: 260523
Summary:
Fixed an issue for mips with an instruction such as 'sdc1 $f1, 272 +8(a0)' which has a space between '272' and '+'. The parser would then parse '272' and '+8' as two arguments instead of a single expression resulting in one too many arguments in the pseudo instruction.
The reason that the test case has been changed is so that the expected
output matches the output of the GNU assembler.
Reviewers: vkalintiris, dsanders
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13592
llvm-svn: 260521
If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.
This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.
llvm-svn: 260495
This adds support for finding the dynamic table and dynamic symbol table via
the section table or the program header table. If there's no section table an
attempt is made to figure out the length of the dynamic symbol table.
llvm-svn: 260488
Separate methods to convert parsed instructions to MCInst:
- VOP3 only instructions (always create modifiers as operands in MCInst)
- VOP2 instrunctions with modifiers (create modifiers as operands
in MCInst when e64 encoding is forced or modifiers are parsed)
- VOP2 instructions without modifiers (do not create modifiers
as operands in MCInst)
- Add VOP3Only flag. Pass HasMods flag to VOP3Common.
- Simplify code that deals with modifiers (-1 is now same as
0). This is no longer needed.
- Add few tests (more will be added separately).
Update error message now correct.
Patch By: Nikolay Haustov
Differential Revision: http://reviews.llvm.org/D16778
llvm-svn: 260483
The current function importer will walk the callgraph, importing
transitively any callee that is below the threshold. This can
lead to import very deep which is costly in compile time and not
necessarily beneficial as most of the inline would happen in
imported function and not necessarilly in user code.
The actual factor has been carefully chosen by flipping a coin ;)
Some tuning need to be done (just at the existing limiting threshold).
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17082
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260466
This restores commit r260408, along with a fix for a bot failure.
The bot failure was caused by dereferencing a unique_ptr in the same
call instruction parameter list where it was passed via std::move.
Apparently due to luck this was not exposed when I built the compiler
with clang, only with gcc.
llvm-svn: 260442
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.
Update an old LLVM IR test case to avoid an assertion in LexicalScopes.
Reviewers: dblaikie, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16931
llvm-svn: 260432
New option --elf-output-style=LLVM or GNU
Enables -file-headers in readelf style when elf-output-style=GNU
Differential revision: http://reviews.llvm.org/D14128
llvm-svn: 260430
Summary:
This fixes a crash where subsequent spills would be unable to scavenge
a register. In particular, it fixes a crash in piglit's
spec@glsl-1.50@execution@geometry@max-input-components (the test still
has a shader that fails to compile because of too many SGPR spills, but
at least it doesn't crash any more).
This is a candidate for the release branch.
Reviewers: arsenm, tstellarAMD
Subscribers: qcolombet, arsenm
Differential Revision: http://reviews.llvm.org/D16558
llvm-svn: 260427
Instead of passing varargs directly on the user stack, allocate a buffer in
the caller's stack frame and pass a pointer to it. This simplifies the C
ABI (e.g. non-C callers of C functions do not need to use C's user stack if
they have their own mechanism) and allows further optimizations in the future
(e.g. fewer functions may need to use the stack).
Differential Revision: http://reviews.llvm.org/D17048
llvm-svn: 260421
Summary:
This patch uses the lower 64-bits of the MD5 hash of a function name as
a GUID in the function index, instead of storing function names. Any
local functions are first given a global name by prepending the original
source file name. This is the same naming scheme and GUID used by PGO in
the indexed profile format.
This change has a couple of benefits. The primary benefit is size
reduction in the combined index file, for example 483.xalancbmk's
combined index file was reduced by around 70%. It should also result in
memory savings for the index file in memory, as the in-memory map is
also indexed by the hash instead of the string.
Second, this enables integration with indirect call promotion, since the
indirect call profile targets are recorded using the same global naming
convention and hash. This will enable the function importer to easily
locate function summaries for indirect call profile targets to enable
their import and subsequent promotion.
The original source file name is recorded in the bitcode in a new
module-level record for use in the ThinLTO backend pipeline.
Reviewers: davidxl, joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17028
llvm-svn: 260408
Currently you can't specify node properties like commutativity on
a PatFrag. If you want to create a PatFrag on a commutative node
with a hasOneUse predicate, this enables you to specify that the
PatFrag is also commutable.
llvm-svn: 260404