Re-commit of r258951 after fixing layering violation.
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
llvm-svn: 259498
differentiate between indirect references to functions and direct calls.
This doesn't do a whole lot yet other than change the print out produced
by the analysis, but it lays the groundwork for a very major change I'm
working on next: teaching the call graph to actually be a call graph,
modeling *both* the indirect reference graph and the call graph
simultaneously. More details on that in the next patch though.
The rest of this is essentially a bunch of over-engineering that won't
be interesting until the next patch. But this also isolates essentially
all of the churn necessary to introduce the edge abstraction from the
very important behavior change necessary in order to separately model
the two graphs. So it should make review of the subsequent patch a bit
easier at the cost of making this patch seem poorly motivated. ;]
Differential Revision: http://reviews.llvm.org/D16038
llvm-svn: 259463
These sets do linear searching in small mode. It is not a good idea to
use huge numbers as the small value here, so save people from themselves
by adding a static_assert.
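A minimal sketch of the idea, not the actual LLVM code: TinySet and the limit of 32 are hypothetical names and values chosen for illustration. It shows how a static_assert on the inline capacity rejects sizes for which linear search would be a poor choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical small-size set: linear search over inline storage only.
template <typename T, std::size_t N>
class TinySet {
  // The limit of 32 is an assumption made for this sketch.
  static_assert(N <= 32, "N should be small; lookups are linear");

  std::array<T, N> Items{};
  std::size_t Size = 0;

public:
  // Linear scan over the elements inserted so far.
  bool contains(const T &V) const {
    for (std::size_t I = 0; I < Size; ++I)
      if (Items[I] == V)
        return true;
    return false;
  }

  // Returns true if V was newly inserted; false if present or if full.
  bool insert(const T &V) {
    if (contains(V) || Size == N)
      return false;
    Items[Size++] = V;
    return true;
  }

  std::size_t size() const { return Size; }
};
```

With this in place, something like TinySet<int, 4096> fails to compile instead of silently degrading into a slow linear scan.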
Differential Revision: http://reviews.llvm.org/D16706
llvm-svn: 259419
Changed the emission of the macinfo entry offset in the compile unit DIE to use the "addSectionLabel" method rather than explicitly calculating the size/offset of the macro entry.
Differential Revision: http://reviews.llvm.org/D16292
llvm-svn: 259358
Patch adds a DWARF language vendor extension for RenderScript.
We are already using this identifier in LLDB with a hard-coded value, so it's preferable to use an LLVM-generated enum instead.
The language is intended to be added to the next version of the standard.
See http://www.dwarfstd.org/ShowIssue.php?issue=150331.1
Reviewers: dexonsmith, echristo
Subscribers: probinson, domipheus, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D16409
llvm-svn: 259348
Add an option to llvm-profdata merge for writing out sparse indexed
profiles. These profiles omit InstrProfRecords for functions which are
never executed.
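A rough sketch of the filtering step, under stated assumptions: FuncRecord and makeSparse are hypothetical stand-ins, and "never executed" is taken here to mean that every counter in the record is zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-function profile record.
struct FuncRecord {
  std::string Name;
  std::vector<uint64_t> Counts;
};

// Assumption: a function was never executed if all of its counters are
// zero (a record with no counters at all is treated the same way).
inline bool neverExecuted(const FuncRecord &R) {
  return std::all_of(R.Counts.begin(), R.Counts.end(),
                     [](uint64_t C) { return C == 0; });
}

// Keep only records with at least one non-zero counter.
inline std::vector<FuncRecord> makeSparse(const std::vector<FuncRecord> &In) {
  std::vector<FuncRecord> Out;
  for (const FuncRecord &R : In)
    if (!neverExecuted(R))
      Out.push_back(R);
  return Out;
}
```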
Differential Revision: http://reviews.llvm.org/D16727
llvm-svn: 259258
Loop transformations can sometimes fail because the loop, while in
valid rotated LCSSA form, is not in a canonical CFG form. This is
an extremely simple pass that just merges obviously redundant
blocks, which can be used to fix some known failure cases. In the
future, it may be enhanced with more cases (and have code shared with
SimplifyCFG).
This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll,
so that SimplifyCFG cleans up the loop before Rotate tries to run.
Not currently used in the pass manager, since this pass doesn't do
anything unless you can hook it up in an LPM with other loop passes.
It'll be added once Chandler cleans up things to allow this.
Tested in a custom pipeline out of tree to confirm it works in
practice (in addition to the included trivial test).
llvm-svn: 259256
The majority of attribute queries check for the existence of an enum
attribute in the FunctionIndex slot. We only have 48 of those and can
therefore summarize them in a uint64_t bitset, which measurably improves
compile time.
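A minimal sketch of the bitset summary, assuming a made-up AttrKind enum and FnAttrSummary class (neither is the real LLVM type). Each enum attribute gets one bit, so a query becomes a shift and a mask.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical attribute kinds; the real enum has many more entries.
enum AttrKind : unsigned { NoInline, NoUnwind, ReadOnly, Cold, NumAttrKinds };

static_assert(NumAttrKinds <= 64, "all kinds must fit in a uint64_t bitset");

// Summary of which enum attributes are present in one slot.
class FnAttrSummary {
  uint64_t Bits = 0;

public:
  void add(AttrKind K) { Bits |= uint64_t(1) << K; }

  // Constant-time membership test instead of scanning an attribute list.
  bool has(AttrKind K) const { return (Bits >> K) & 1; }
};
```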
Differential Revision: http://reviews.llvm.org/D16618
llvm-svn: 259252
This support is _very_ rudimentary, just enough to get some basic data
into the CodeView debug section.
Left to do is:
- Use the combined opcodes to save space.
- Do something about code offsets.
llvm-svn: 259230
Summary:
There are three parts to inlined call frames:
1. The inlinee line subsection
2. The inline site symbol record
3. The function ids referenced by both
This change starts by emitting function ids (3) for all subprograms and
emitting the base inline site symbol record (2). The actual line numbers
in (2) use an encoded format that will come next, along with the inlinee
line subsection.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16333
llvm-svn: 259217
The buildSchedGraph() was in need of reworking as the AA features had been
added on top of earlier code. It was very difficult to understand, and buggy.
Cases had been found where scheduling dependencies had actually been
missed (see r228686).
AliasChain, RejectMemNodes, adjustChainDeps() and iterateChainSucc() have
been removed. There are instead now just the four maps from Value to SUs, which
have been renamed to Stores, Loads, NonAliasStores and NonAliasLoads.
An unknown store used to become the AliasChain, but now becomes a store mapped
to 'unknownValue' (in Stores). What used to be PendingLoads is instead the
list of SUs mapped to 'unknownValue' in Loads.
RejectMemNodes and adjustChainDeps() used to be a safety-net for everything.
The SU maps were sometimes cleared and SUs were put in RejectMemNodes, where
adjustChainDeps() would look. Instead of this, a more straightforward approach
is used in maintaining the SU maps without clearing them and simply letting
them grow over time. Instead of the cutoff in the adjustChainDeps() search, a
reduction of maps will be done if needed (see below).
Each SUnit either becomes the BarrierChain, or is put into one of the maps. For
each SUnit encountered, all the information about previous ones is still
available until a new BarrierChain is set, at which point the maps are cleared.
For huge regions, the algorithm becomes slow, therefore the maps will get
reduced at a threshold (current default is 1000 nodes), by a fraction (default 1/2).
These values can be tuned by use of CL options in case some test case shows that
they need to be changed (-dag-maps-huge-region and -dag-maps-reduction-size).
There has not been any considerable change observed in output quality or compile
time. There may now be more DAG edges inserted than before, some of them
redundant (i.e. if A->B->C, then A->C is not needed). However, in a comparison
run there were fewer total calls to AA, and a somewhat improved compile time,
so this does not seem to be a problem.
http://reviews.llvm.org/D8705
Reviewers: Hal Finkel, Andy Trick.
llvm-svn: 259201
Also removed a few redundant `else`s.
Bug was found by a test I wrote for MemorySSA (in review at
http://reviews.llvm.org/D7864; shiny update coming soon). So, assuming
that lands at some point, this should be covered by that. If anyone
feels this deserves its own explicit test case, please let me know.
I'll write one.
llvm-svn: 259179
This patch enables llvm-bcanalyzer to print the bitcode wrapper header
if the file has one, which is needed to test the changes made in
r258627 (bitcode-wrapper-header-armv7m.ll is the test case for r258627).
Differential Revision: http://reviews.llvm.org/D16642
llvm-svn: 259162
This reverts commit r259117.
The LineInfo constructor is defined in the codeview library and we have
to link against it now. Doing that isn't trivial, so reverting for now.
llvm-svn: 259126
Adds a new family of .cv_* directives to LLVM's variant of GAS syntax:
- .cv_file: Similar to DWARF .file directives
- .cv_loc: Similar to the DWARF .loc directive, but starts with a
function id. CodeView line tables are emitted by function instead of
by compilation unit, so we needed an extra field to communicate this.
Rather than overloading the .loc directive further, we decided it was
better to have our own directive.
- .cv_stringtable: Emits the codeview string table at the current
position. Currently this just contains the filenames as
null-terminated strings.
- .cv_filechecksums: Emits the file checksum table for all files used
with .cv_file so far. There is currently no support for emitting
actual checksums, just filenames.
This moves the line table emission code down into the assembler. This
is in preparation for implementing the inlined call site line table
format. The inline line table format encoding algorithm requires knowing
the absolute code offsets, so it must run after the assembler has laid
out the code.
David Majnemer collaborated on this patch.
llvm-svn: 259117
Re-commit of r258951 after fixing layering violation.
The related LLVM patch adds a backend diagnostic type for reporting
unsupported features, this adds a printer for them to clang.
In the case where debug location information is not available, I've
changed the printer to report the location as the first line of the
function, rather than the closing brace, as the latter does not give the
user any information. This also affects optimisation remarks.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 259035
Move ptestm{q|d} intrinsics from pattern form (in the td file) to the intrinsics table.
Differential Revision: http://reviews.llvm.org/D16633
llvm-svn: 259029
This patch revamps the RegStackifier pass with a new tree traversal mechanism,
enabling three major new features:
- Stackification of values with multiple uses, using the result value of set_local
- More aggressive stackification of instructions with side effects
- Reordering operands in commutative instructions to enable more stackification.
llvm-svn: 259009
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).
As it was discussed in the above thread, getPrefetchDistance is
currently using instruction count which may change in the future.
llvm-svn: 258995
Various bits we want to use the new ABI actually compile with "-arch armv7k
-miphoneos-version-min=9.0". Not ideal, but also not ridiculous given how
slices work.
llvm-svn: 258975
ObjC ARC Optimizer.
The main implication of this is:
1. Ensuring that we treat it conservatively in terms of optimization.
2. We put the ASM marker on it so that the runtime can recognize
objc_unsafeClaimAutoreleasedReturnValue from releaseRV.
<rdar://problem/21567064>
Patch by Michael Gottesman!
llvm-svn: 258970
The BPF and WebAssembly backends had identical code for emitting errors
for unsupported features, and AMDGPU had very similar code. This merges
them all into one DiagnosticInfo subclass, that can be used by any
backend.
There should be minimal functional changes here, but some AMDGPU tests
have been updated for the new format of errors (it used a slightly
different format to BPF and WebAssembly). The AMDGPU error messages will
now benefit from having precise source locations when debug info is
available.
The implementation of DiagnosticInfoUnsupported::print must be in
lib/CodeGen rather than in the existing file in lib/IR/ to avoid
introducing a dependency from IR to CodeGen.
Differential Revision: http://reviews.llvm.org/D16590
llvm-svn: 258951
Most of the time we only hit the small case, so it is beneficial to pull
it out of the insert_imp() implementation. This improves compile time
at least for non-LTO builds.
Differential Revision: http://reviews.llvm.org/D16619
llvm-svn: 258908
This brings the compile time of Function.cpp from ~40s down to ~4s for
me locally. It also shaves off about 400KB of object file size in a
release+asserts build.
I also realized that the AMDGPU backend does not have any GCC builtin
names to match, so the extra lookup was a no-op. I removed it to silence
a zero-length string table array warning. There should be no functional
change here.
This change really ends the story of PR11951.
llvm-svn: 258897
This is a recommit of r258620, which caused PR26293.
The original message:
LIR can now turn the following code into memset:
  typedef struct foo {
    int a;
    int b;
  } foo_t;
  void bar(foo_t *f, unsigned n) {
    for (unsigned i = 0; i < n; ++i) {
      f[i].a = 0;
      f[i].b = 0;
    }
  }
  void test(int *f, unsigned n) {
    for (unsigned i = 0; i < n; i += 2) {
      f[i] = 0;
      f[i+1] = 0;
    }
  }
llvm-svn: 258777
These two functions are hard to reason about. This commit makes the code
more comprehensible:
- Use four distinct variables (OldIdxIn, OldIdxOut, NewIdxIn, NewIdxOut)
with a fixed value instead of a changing iterator I that points to
different things during the function.
- Remove the early explanation before the function in favor of more
detailed comments inside the function. Should have more/clearer comments now
stating which conditions are tested and which invariants hold at
different points in the functions.
The behaviour of the code was not changed.
I hope that this will make it easier to review the changes in
http://reviews.llvm.org/D9067 which I will adapt next.
Differential Revision: http://reviews.llvm.org/D16379
llvm-svn: 258756
This patch was originally committed as r257885, but was reverted due to Windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Integers and Add High 52-bit Products to 64-bit Accumulators
Differential Revision: http://reviews.llvm.org/D16407
llvm-svn: 258680
SCCP has code identical to changeToUnreachable's behavior, switch it
over to just call changeToUnreachable.
No functionality change intended.
llvm-svn: 258654
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so. Share the code between the two.
No functionality change is intended.
llvm-svn: 258653
Summary: Helper so we don't have to enumerate nvptx && nvptx64 everywhere.
Reviewers: echristo
Subscribers: llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16494
llvm-svn: 258639
Summary:
Update ObjectTransformLayer::addObjectSet to take the object set by
value rather than reference and pass it to the base layer with move
semantics rather than copy, to match r258185's changes to
ObjectLinkingLayer.
Update the unit test to verify that ObjectTransformLayer's signature stays
in sync with ObjectLinkingLayer's.
Reviewers: lhames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16414
llvm-svn: 258630
If the intrinsic is overloaded and works on multiple types,
it cannot resolve to a single corresponding builtin and requires
handling in clang. This just causes crashes now.
llvm-svn: 258559
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatibility for now.
llvm-svn: 258557
\src\llvm-rw\include\llvm/Support/AlignOf.h(254) :
error C2872: 'detail' : ambiguous symbol
could be 'llvm::detail'
or 'llvm::support::detail'
llvm-svn: 258553
Make the variable a member of the writer trait object owned
now by the writer. Also use a different generator interface
to pass the infoObject from the writer.
llvm-svn: 258544
Using an array instead of ArrayRef would allow type inference, but
(short of using C99) one would still need to write
  typedef uint16_t VT[];
  LE.write(VT{0x1234, 0x5678});
llvm-svn: 258535
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the
offset in a GlobalAddressSDNode, which is uncovered by those patches.
llvm-svn: 258482
This reverts r258296 and the follow up r258366. With this change, we
miscompiled the following program on Windows:
  #include <string>
  #include <iostream>
  static const char kData[] = "asdf jkl;";
  int main() {
    std::string s(kData + 3, sizeof(kData) - 3);
    std::cout << s << '\n';
  }
llvm-svn: 258465
The X86 musttail implementation finds register parameters to forward by
running the calling convention algorithm until a non-register location
is returned. However, assigning a vector memory location has the side
effect of increasing the function's stack alignment. We shouldn't
increase the stack alignment when we are only looking for register
parameters, so this change conditionalizes it.
llvm-svn: 258442
Include the needed header file to fix the buildbot failure due to r258420 [PGO] Passmanagerbuilder change that enables IR level PGO instrumentation.
llvm-svn: 258423
This patch includes the passmanagerbuilder change that enables IR level PGO instrumentation. It adds two passmanagerbuilder options: -profile-generate=<profile_filename> and -profile-use=<profile_filename>. The new options are primarily for debugging purposes.
Reviewers: davidxl, silvas
Differential Revision: http://reviews.llvm.org/D15828
llvm-svn: 258420
Summary:
And use it in PPCLoopDataPrefetch.cpp.
@hfinkel, please let me know if your preference would be to preserve the
ppc-loop-prefetch-cache-line option in order to be able to override the
value of TTI::getCacheLineSize for PPC.
Reviewers: hfinkel
Subscribers: hulx2000, mcrosier, mssimpso, hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D16306
llvm-svn: 258419
Summary:
This is now the same as the behaviour of the GNU assembler. This was done
as it is required in order to build the Linux kernel with the integrated
assembler enabled.
Reviewers: dsanders, vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D13594
llvm-svn: 258400
Summary:
The previous form, taking opcode and type, is moved to an internal
helper and the new form, taking an instruction, is a wrapper around this
helper.
Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.
Reviewers: eddyb
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D16383
llvm-svn: 258391
Summary:
Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.
Reviewers: eddyb
Subscribers: zzheng, dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D16380
llvm-svn: 258390
Summary:
Although this is a slight cleanup on its own, the main motivation is to
refactor the constant folding API to ease migration to opaque pointers.
This will be follow-up work.
Reviewers: eddyb
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D16378
llvm-svn: 258389
This patch adds the necessary plumbing to cmake to build the sources related to
GlobalISel.
To build the sources related to GlobalISel, we need to add -DBUILD_GLOBAL_ISEL=ON.
By default, this is OFF, thus GlobalISel sources will not impact people that do
not explicitly opt-in.
Differential Revision: http://reviews.llvm.org/D15983
llvm-svn: 258344
Summary:
This adds a new kind of operand bundle to LLVM denoted by the
`"gc-transition"` tag. Inputs to `"gc-transition"` operand bundle are
lowered into the "transition args" section of `gc.statepoint` by
`RewriteStatepointsForGC`.
This removes the last bit of functionality that was unsupported in the
deopt bundle based code path in `RewriteStatepointsForGC`.
Reviewers: pgavlin, JosephTremoulet, reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16342
llvm-svn: 258338
The selection process being split into separate passes, we need generic opcodes
to translate the LLVM IR to target independent code.
This patch adds an opcode for addition: G_ADD.
Differential Revision: http://reviews.llvm.org/D15472
llvm-svn: 258333
SelectionDAG previously missed opportunities to fold constants into
GlobalAddresses in several areas. For example, given `(add (add GA, c1), y)`, it
would often reassociate to `(add (add GA, y), c1)`, missing the opportunity to
create `(add GA+c, y)`. This isn't often visible on targets such as X86 which
effectively reassociate adds in their complex address-mode folding logic,
however it is currently visible on WebAssembly since it currently has very
simple address mode folding code that doesn't reassociate anything.
This patch fixes this by making SelectionDAG fold offsets into GlobalAddresses
at the same times that it folds constants together, so that it doesn't miss any
opportunities to perform such folding.
Differential Revision: http://reviews.llvm.org/D16090
llvm-svn: 258296
Note that this is disabled by default and still requires a patch to
handleMove() which is not upstreamed yet.
If the TrackLaneMasks policy/strategy is enabled the MachineScheduler
will build a schedule graph where definitions of independent
subregisters are no longer serialised.
Implementation comments:
- Without lane mask tracking a sub register def also counts as a use
(except for the first one with the read-undef flag set), with lane
mask tracking enabled this is no longer the case.
- Pressure Diffs were previously maintained per definition of a
vreg with the help of the SSA information contained in the
LiveIntervals. With lanemask tracking enabled we cannot do this
anymore and instead change the pressure diffs for all uses of the vreg
as it becomes live/dead. For this changed style to work correctly we
ignore uses of instructions that define the same register again: They
won't affect register pressure.
- With lanemask tracking we remove all read-undef flags from
sub register defs when building the graph and re-add them later when
all vreg lanes have become dead.
Differential Revision: http://reviews.llvm.org/D14969
llvm-svn: 258259
This renaming is necessary to avoid a subregister aware scheduler
accidentally creating liveness "holes" which are rejected by the
MachineVerifier.
Explanation as found in this patch:
Helper class that can divide MachineOperands of a virtual register into
equivalence classes of connected components.
MachineOperands belong to the same equivalence class when they are part of
the same SubRange segment or adjacent segments (adjacent in control
flow); Different subranges affected by the same MachineOperand belong to
the same equivalence class.
Example:
    vreg0:sub0 = ...
    vreg0:sub1 = ...
    vreg0:sub2 = ...
    ...
    xxx        = op vreg0:sub1
    vreg0:sub1 = ...
    store vreg0:sub0_sub1
The example contains 3 different equivalence classes:
- One for the (dead) vreg0:sub2 definition
- One containing the first vreg0:sub1 definition and its use,
but not the second definition!
- The remaining class contains all other operands involving vreg0.
We provide a utility function here to rename disjoint classes to different
virtual registers.
Differential Revision: http://reviews.llvm.org/D16126
llvm-svn: 258257
they're needed.
Prior to this patch objects were loaded (via RuntimeDyld::loadObject) when they
were added to the ObjectLinkingLayer, but were not relocated and finalized until
a symbol address was requested. In the interim, another object could be loaded
and finalized with the same memory manager, causing relocation/finalization of
the first object to fail (as the first finalization call may have marked the
allocated memory for the first object read-only).
By deferring the loadObject call (and subsequent memory allocations) until an
object file is needed we can avoid prematurely finalizing memory.
llvm-svn: 258185
In some cases, the max backedge taken count can be more conservative
than the exact backedge taken count (for instance, because
ScalarEvolution::getRange is not control-flow sensitive whereas
computeExitLimitFromICmp can be). In these cases,
computeExitLimitFromCond (specifically the bit that deals with `and` and
`or` instructions) can create an ExitLimit instance with a
`SCEVCouldNotCompute` max backedge count expression, but a computable
exact backedge count expression. This violates an implicit SCEV
assumption: a computable exact BE count should imply a computable max BE
count.
This change
- Makes the above implicit invariant explicit by adding an assert to
ExitLimit's constructor
- Changes `computeExitLimitFromCond` to be more robust around
conservative max backedge counts
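A sketch of the invariant and of one plausible way to stay within it, under stated assumptions: Count, ExitLimitSketch and makeExitLimit are hypothetical, and an "unknown" Count stands in for SCEVCouldNotCompute. Falling back to the exact count as the max is an illustration, not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a backedge-taken count; Known == false plays the
// role of SCEVCouldNotCompute.
struct Count {
  bool Known;
  uint64_t Value;
};

struct ExitLimitSketch {
  Count Exact;
  Count Max;

  ExitLimitSketch(Count E, Count M) : Exact(E), Max(M) {
    // A computable exact count must imply a computable max count.
    assert((!Exact.Known || Max.Known) &&
           "exact count known but max count unknown");
  }
};

// If the max is unknown but the exact count is known, the exact count is
// itself a valid upper bound, so use it as the max.
inline ExitLimitSketch makeExitLimit(Count Exact, Count Max) {
  if (Exact.Known && !Max.Known)
    Max = Exact;
  return ExitLimitSketch(Exact, Max);
}
```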
llvm-svn: 258184
According to the build bots, clang is using the Registry class somewhere as well. Will reapply with appropriate clang changes at a later point.
llvm-svn: 258159
The Registry class constructs a linked list of nodes whose storage is inside static variables and nodes are added via static initializers. The trick is that those static initializers are in both the LLVM code base, and some random plugin that might get loaded in at runtime. The existing code tries to use C++ templates and their ODR rules to get a single definition of the registry for each type, but, experimentally, this doesn't quite work as designed. (Well, the entire structure doesn't. It might not actually be an ODR problem.)
Previously, when I tried moving the GCStrategy class (along with its registry) from CodeGen to IR, I ran into a problem where asking the GCStrategyRegistry a question would return inconsistent results depending on whether you asked from CodeGen (where the static initializers still were) or Transforms. My best guess is that this is a result of either a) an order of initialization error, or b) we ended up with two copies of the registry being created. I remember at the time having convinced myself it was probably (b), but I don't have any of my notes around from that investigation any more.
See http://reviews.llvm.org/rL226311 for the original patch in question.
This patch tries to remove the possibility of (b) above. (a) was already fixed in change 258109.
Differential Revision: http://reviews.llvm.org/D16170
llvm-svn: 258157
Our loop construct is not a way to identify cycles in the CFG. This wasn't immediately obvious from the header, so clarify that fact.
The motivation for this was that I just fixed an out-of-tree bug due to a mistaken assumption (on my part) about what a Loop actually was. While it was fresh in my mind, I wanted to document the key point.
llvm-svn: 258154
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
The value size was always 1 or 0, so we don't need to store it.
In a no asserts build this takes the testcase of pr26208 from 11 to 10
seconds.
llvm-svn: 258141
Summary:
This is a companion patch for http://reviews.llvm.org/D16124.
Internalized symbols increase the size of strongly-connected components in
SCC-based module splitting and thus reduce the amount of parallelism. This
patch records the original linkage of non-local symbols prior to
internalization and then restores it just before splitting/CodeGen. This is
also useful for cases where the linker requires symbols to remain external, for
instance, so they can be placed according to linker script rules.
It's currently under its own flag (-restore-globals) but should eventually
share a common flag with D16124.
Reviewers: joker.eph, pcc
Subscribers: slarin, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16229
llvm-svn: 258100
Although glibc defines it, this is currently of no use for my primary
use-case (dumping DT_* keys correctly). Its semantic is not described
anywhere I can find, so better leave it out for now.
Thanks to Rafael for pointing this out in his post-commit review!
llvm-svn: 258089
Summary:
Currently llvm::SplitModule as the first step globalizes all local objects, which might not be desirable in some scenarios.
This change adds a new flag to llvm::SplitModule that uses SCC approach to search for a balanced partition without the need to externalize symbols.
Such partition might not be possible or fully balanced for a given number of partitions, and is a function of the module properties (global/local dependencies within the module).
Joint development by Tobias Edler von Koch (tobias@codeaurora.org) and Sergei Larin (slarin@codeaurora.org)
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16124
llvm-svn: 258083
Summary:
When SimplifySetCC sees a setcc node that compares the result of a
value extension operation with a constant, it tries to simplify the
setcc node by eliminating the extension and shrinking the constant.
If shrinking the inputs to setcc is deemed not desirable by the target
(e.g. the target does not want a setcc comparing i1 values), then it
is still possible to optimize this sequence in some cases.
This patch adds the following combines to SimplifySetCC when shrinking setcc
inputs is not desirable:
(setcc ([sz]ext (setcc x, y, cc)), 0, setne) -> (setcc (x, y, cc))
(setcc ([sz]ext (setcc x, y, cc)), 0, seteq) -> (setcc (x, y, !cc))
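The two rewrites above can be sanity-checked with a small model. This is only a sketch for integer comparisons: CondCode, invert, setcc, zext, sext and combinesHold are hypothetical helpers, not SelectionDAG APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of integer condition codes.
enum CondCode { SETEQ, SETNE, SETLT, SETGE };

// Logical inverse of a condition code (the "!cc" in the second rewrite).
inline CondCode invert(CondCode CC) {
  switch (CC) {
  case SETEQ: return SETNE;
  case SETNE: return SETEQ;
  case SETLT: return SETGE;
  case SETGE: return SETLT;
  }
  return CC; // not reached for valid inputs
}

// Evaluate a signed integer comparison.
inline bool setcc(int32_t X, int32_t Y, CondCode CC) {
  switch (CC) {
  case SETEQ: return X == Y;
  case SETNE: return X != Y;
  case SETLT: return X < Y;
  case SETGE: return X >= Y;
  }
  return false; // not reached for valid inputs
}

// Extensions of an i1 result: zext gives 0 or 1, sext gives 0 or all ones.
inline uint32_t zext(bool B) { return B ? 1u : 0u; }
inline uint32_t sext(bool B) { return B ? ~0u : 0u; }

// True if both rewrites agree with the unsimplified form for (X, Y, CC).
inline bool combinesHold(int32_t X, int32_t Y, CondCode CC) {
  bool Inner = setcc(X, Y, CC);
  bool Inverted = setcc(X, Y, invert(CC));
  bool NeOK = ((zext(Inner) != 0) == Inner) && ((sext(Inner) != 0) == Inner);
  bool EqOK =
      ((zext(Inner) == 0) == Inverted) && ((sext(Inner) == 0) == Inverted);
  return NeOK && EqOK;
}
```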
There are no tests for this yet, but once AMDGPU correctly implements
TargetLowering::isTypeDesirableForOp(), this new combine will be
exercised by the existing CodeGen/AMDGPU/setcc-opt.ll test.
Reviewers: resistor, arsenm
Subscribers: jroelofs, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15034
llvm-svn: 258067
The symtab is logically referenced beyond the call to the create
method. This change makes sure its lifetime matches that of
the reader.
llvm-svn: 258036
The entry block count was not being counted; this is now corrected. Also
introduce a new metric, MaxInternalBlockCount, which is what the
show command shows (as before).
llvm-svn: 257987
This is part of a new statistics gathering feature for the sanitizers.
See clang/docs/SanitizerStats.rst for further info and docs.
Differential Revision: http://reviews.llvm.org/D16174
llvm-svn: 257970
WebAssembly's stack will never be executable by default, so it isn't
necessary to declare .note.GNU-stack sections to request a non-executable
stack.
Differential Revision: http://reviews.llvm.org/D15969
llvm-svn: 257962
In the optimizer (GVN etc.) when eliminating redundant nodes with different
flags, the flags are ignored for the purposes of testing for congruence, and
then intersected for the purposes of producing a result that supports the union
of all the uses. This commit makes SelectionDAG's CSE do the same thing,
allowing it to CSE nodes in more cases. This fixes PR26063.
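A minimal sketch of the scheme, with hypothetical types: Node, the flag constants and cseInto are invented for illustration and are not SelectionDAG classes. Congruence ignores the flags, and the surviving node keeps only the flags both nodes had.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, loosely modelled on wrap/exact flags.
constexpr uint8_t NoUnsignedWrap = 1;
constexpr uint8_t NoSignedWrap = 2;
constexpr uint8_t Exact = 4;

struct Node {
  unsigned Opcode;
  int LHS, RHS;  // operand ids
  uint8_t Flags;
};

// Congruence test: the flags are deliberately not compared.
inline bool congruent(const Node &A, const Node &B) {
  return A.Opcode == B.Opcode && A.LHS == B.LHS && A.RHS == B.RHS;
}

// Merge New into Existing if congruent, keeping only the flags that both
// nodes carry. Returns true if the nodes were merged.
inline bool cseInto(Node &Existing, const Node &New) {
  if (!congruent(Existing, New))
    return false;
  Existing.Flags &= New.Flags;
  return true;
}
```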
Differential Revision: http://reviews.llvm.org/D15957
llvm-svn: 257940
Summary:
Rename to getCatchSwitchParentPad, to make it more clear which ancestor
the "parent" in question is. Add a comment pointing out the key feature
that the returned pad indicates which funclet contains the successor
block.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16222
llvm-svn: 257933
These casts were from function pointer to data pointer type, which some
compilers (including GCC) may warn about. In all cases where these casts were
used the original value was still available as a TargetAddress (uint64_t), so
we can just print a formatted version of that instead.
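A small sketch of printing the address directly, assuming a hypothetical formatAddress helper; only the TargetAddress (uint64_t) type comes from the text above. No pointer cast is involved.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

using TargetAddress = uint64_t;

// Format an address as fixed-width hex without any pointer casts.
inline std::string formatAddress(TargetAddress Addr) {
  char Buf[2 + 16 + 1]; // "0x" + 16 hex digits + NUL terminator
  std::snprintf(Buf, sizeof(Buf), "0x%016" PRIx64, Addr);
  return Buf;
}
```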
llvm-svn: 257932
The file and buffer writer code are mostly shared except for the
stream back-patching. This is because raw_string_ostream does not
support a seek-like interface. The result is that the data patching
code needs to be pushed to the caller which is not quite readable
(passing around offset, value etc). This also makes future enhancement
(which needs more patching) more difficult (and can make impl messy).
In this patch, two types of streams needed by the writer are now
unified with same set of interfaces under ProfOStream class. The patch
method is added so that common implementation becomes cleaner. It
also enables future enhancement. Should be NFC.
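A sketch of what a stream with a patch method might look like, under stated assumptions: PatchableStream is a hypothetical in-memory class, not the ProfOStream from this patch, and the little-endian 64-bit layout is chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical in-memory stream that supports back-patching.
class PatchableStream {
  std::string Buf;

public:
  // Current write position, in bytes.
  std::size_t tell() const { return Buf.size(); }

  // Append V as 8 little-endian bytes.
  void write64(uint64_t V) {
    for (int I = 0; I < 8; ++I)
      Buf.push_back(static_cast<char>((V >> (8 * I)) & 0xff));
  }

  // Overwrite the 8 bytes at Offset with V (little endian).
  void patch(std::size_t Offset, uint64_t V) {
    assert(Offset + 8 <= Buf.size() && "patch out of range");
    for (int I = 0; I < 8; ++I)
      Buf[Offset + I] = static_cast<char>((V >> (8 * I)) & 0xff);
  }

  // Read back 8 little-endian bytes starting at Offset.
  uint64_t read64(std::size_t Offset) const {
    uint64_t V = 0;
    for (int I = 0; I < 8; ++I)
      V |= uint64_t(static_cast<unsigned char>(Buf[Offset + I])) << (8 * I);
    return V;
  }
};
```

The caller reserves a slot with write64(0), keeps writing, and later fills the slot in with patch(), so no offsets or values need to be threaded through the writer's callers.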
llvm-svn: 257921
This reverts commit r257751, bringing back r256105.
The problem the assert found was fixed in r257915.
Original commit message:
Assert that we have all use/users in the getters.
An error that is pretty easy to make is to use the lazy bitcode reader
and then do something like
if (V.use_empty())
The problem is that uses in unmaterialized functions are not accounted
for.
This patch adds asserts that all uses are known.
llvm-svn: 257920
# The first commit's message is:
Revert "[ARM] Add DSP build attribute and extension targeting"
This reverts commit b11cc50c0b4a7c8cdb628abc50b7dc226ff583dc.
# This is the 2nd commit message:
Revert "[ARM] Add new system registers to ARMv8-M Baseline/Mainline"
This reverts commit 837d08454e3e5beb8581951ac26b22fa07df3cd5.
llvm-svn: 257916
Added 2 constants:
DT_TLSDESC_PLT = 0x6FFFFEF6, Location of PLT entry for TLS descriptor resolver calls.
DT_TLSDESC_GOT = 0x6FFFFEF7, Location of GOT entry used by TLS descriptor resolver PLT entry.
Constants were taken from "Thread-Local Storage Descriptors for IA32 and AMD64/EM64T Version 0.9.5" http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
Differential revision: http://reviews.llvm.org/D16185
llvm-svn: 257911
platforms.
With ELF, the alignment of a global variable in a shared library will
get copied into any executable linked against it, if the executable even
accesses the variable. So, it's not possible to implicitly increase
alignment based on access patterns, or you'll break existing binaries.
This happened to affect libc++'s std::cout symbol, for example. See
thread: http://thread.gmane.org/gmane.comp.compilers.clang.devel/45311
(This is a re-commit of r257719, without the bug reported in
PR26144. I've tweaked the code to not assert-fail in
enforceKnownAlignment when computeKnownBits doesn't recurse far enough
to find the underlying Alloca/GlobalObject value.)
Differential Revision: http://reviews.llvm.org/D16145
llvm-svn: 257902
There are several requirements that ended up with this design:
1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
2. Bitreversals and byteswaps are very related in their matching logic.
3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
4. Bswaps are best matched early in InstCombine.
The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.
We can then extend the matching logic in one place only.
llvm-svn: 257875
This method has no callers.
Also remove X86ELFRelocationInfo.cpp and X86MachORelocationInfo.cpp
which only existed to provide an implementation of that method.
Ok'd by Rafael and Jim.
llvm-svn: 257859
classes.
OrcRemoteTargetClient::RCMemoryManager will now register EH frames with the
server automatically. This allows remote-execution of code that uses exceptions.
llvm-svn: 257816
Rounding up an integer m to a nearest multiple of n where n is a power
of 2 is used very often if you are writing code to emit binary files.
RoundUpToAlignment is a small function to do that. But we found that the
function has a small but annoying issue; the name is a bit too long.
Because it is used quite often, that hurts readability.
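For reference, the operation itself is tiny. A minimal sketch for the power-of-two case described above; roundUp is a hypothetical name used only here, not the name this patch introduces.

```cpp
#include <cassert>
#include <cstdint>

// Round M up to the nearest multiple of N.
// Assumption for this sketch: N is a non-zero power of two.
inline uint64_t roundUp(uint64_t M, uint64_t N) {
  assert(N != 0 && (N & (N - 1)) == 0 && "N must be a power of two");
  // Adding N - 1 carries into the next multiple unless M is already
  // aligned; the mask then clears the low bits.
  return (M + N - 1) & ~(N - 1);
}
```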
This patch is to rename the function. The original name is kept as a
forwarder, so that submitting this patch won't immediately break Clang
and other LLVM projects. Once I update all occurrences of RoundUpToAlignment,
I'll remove the old name entirely.
http://reviews.llvm.org/D16162
llvm-svn: 257799
Binary annotations are encoded along the lines of UTF-8 and ECI but with
a few minor differences.
The algorithm specified in "ECMA-335 CLI Section II.3.2 - Blobs and
Signatures" is used to compress binary annotations. Signed binary
annotations are encoded like unsigned annotations except the sign bit is
rotated left to reduce the number of bits needed to be encoded.
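A sketch of the compression scheme as I understand it from ECMA-335: values up to 7, 14 and 29 bits take 1, 2 and 4 bytes respectively, with the leading bits of the first byte marking the length. The names compressUnsigned and encodeSigned are hypothetical, and the signed mapping (magnitude shifted left by one, sign in the low bit) is my reading of the description above, not a quote of the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append the compressed form of V to Out. Returns false if V does not
// fit in 29 bits and therefore cannot be encoded.
inline bool compressUnsigned(uint32_t V, std::vector<uint8_t> &Out) {
  if (V <= 0x7F) { // 1 byte: 0xxxxxxx
    Out.push_back(uint8_t(V));
    return true;
  }
  if (V <= 0x3FFF) { // 2 bytes: 10xxxxxx xxxxxxxx
    Out.push_back(uint8_t((V >> 8) | 0x80));
    Out.push_back(uint8_t(V & 0xFF));
    return true;
  }
  if (V <= 0x1FFFFFFF) { // 4 bytes: 110xxxxx followed by 3 more bytes
    Out.push_back(uint8_t((V >> 24) | 0xC0));
    Out.push_back(uint8_t((V >> 16) & 0xFF));
    Out.push_back(uint8_t((V >> 8) & 0xFF));
    Out.push_back(uint8_t(V & 0xFF));
    return true;
  }
  return false;
}

// Assumed signed mapping: small negative values become small unsigned
// values, so they still compress into few bytes.
inline uint32_t encodeSigned(int32_t V) {
  if (V < 0)
    return (uint32_t(-int64_t(V)) << 1) | 1;
  return uint32_t(V) << 1;
}
```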
llvm-svn: 257742