This rewrites and expands the existing codeview dumping functionality in
llvm-readobj using techniques similar to those in lib/Object. This defines a
number of new records and enums useful for reading memory mapped codeview
sections in COFF objects.
The dumper is intended as a testing tool for LLVM as it grows more codeview
output capabilities.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D16104
llvm-svn: 257658
The LinkAllPasses.h file is included in several main programs, to force
a large number of passes to be linked in. However, the ForcePassLinking
constructor uses undefined behavior, since it calls member functions on
`nullptr`, e.g.:
((llvm::Function*)nullptr)->viewCFGOnly();
llvm::RGPassManager RGM;
((llvm::RegionPass*)nullptr)->runOnRegion((llvm::Region*)nullptr, RGM);
When the optimization level is -O2 or higher, the code below the first
nullptr dereference is optimized away, and replaced by `ud2` (on x86).
Therefore, the calls after that first dereference are never emitted. In
my case, I noticed there was no call to `llvm::sys::RunningOnValgrind()`!
Replace instances of dereferencing `nullptr` with either objects on the
stack, or regular function calls.
Differential Revision: http://reviews.llvm.org/D15996
llvm-svn: 257645
Summary:
v2: Make ReturnsVoid private, so that I can another 8 lines of code and
look more productive.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16034
llvm-svn: 257622
Summary:
Return values can be stored in SGPRs (i32) and VGPRs (f32).
This will be used by functions which expect some bytecode or other binary to
be appended at the end. It allows defining in which registers the return
values will be stored.
v2: don't do this for compute shaders
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16033
llvm-svn: 257621
is < ``2.0``.
Older versions of psutil (e.g. ``1.2.1`` which is the version shipped with
Ubuntu 14.04) use a different API for retrieving the child processes.
To handle this try the new API first and if that fails try the old API.
llvm-svn: 257616
Summary:
It is off by default, but can be used
with --misched=si
Patch by: Axel Davy
Reviewers: arsenm, tstellarAMD, nhaehnle
Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D11885
llvm-svn: 257609
The global entry point prologue currently assumes that the TOC
associated with a function is less than 2GB away from the function
entry point. This is always true when using the medium or small
code model, but may not be the case when using the large code model.
This patch adds a new variant of the ELFv2 global entry point prologue
that lifts the 2GB restriction when building with -mcmodel=large.
This works by emitting a quadword containing the distance from the
function entry point to its associated TOC immediately before the
entry point, and then using a prologue like:
ld r2,-8(r12)
add r2,r2,r12
Since creation of the entry point prologue is now split across two
separate routines (PPCLinuxAsmPrinter::EmitFunctionEntryLabel emits
the data word, PPCLinuxAsmPrinter::EmitFunctionBodyStart the prolog
code), I've switched to using named labels instead of just temporaries
to indicate the locations of the global and local entry points and the
new TOC offset data word.
These names are provided by new routines in PPCFunctionInfo modeled
after the existing PPCFunctionInfo::getPICOffsetSymbol.
Note that a corresponding change was committed to GCC here:
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00355.html
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D15500
llvm-svn: 257597
Summary:
With the ability to concatenate shader binaries, the limit of 15 no longer
applies.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16031
llvm-svn: 257592
Summary:
This allows Mesa to pass initial SPI_PS_INPUT_ADDR to LLVM.
The register assigns VGPR locations to PS inputs, while the ENA register
determines whether or not they are loaded.
Mesa needs to set some inputs as not-movable, so that a pixel shader prolog
binary appended at the beginning can assume where some inputs are.
v2: Make PSInputAddr private, because there is never enough silly getters
and setters for people to read.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16030
llvm-svn: 257591
Summary: ret.ll will contain a test for this
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm
Differential Revision: http://reviews.llvm.org/D16029
llvm-svn: 257590
Make x86 OptimizeLEAs pass remove LEA instruction if there is another LEA
(in the same basic block) which calculates address differing only be a
displacement. Works only for -Oz.
Differential Revision: http://reviews.llvm.org/D13295
llvm-svn: 257589
This patch turns off the fast-math optimization attribute on the caller
if the callee's fast-math attribute is not turned on.
For example,
- before inlining
caller: "less-precise-fpmad"="true"
callee: "less-precise-fpmad"="false"
- after inlining
caller: "less-precise-fpmad"="false"
Alternatively, it's possible to block inlining if the caller's and
callee's attributes don't match. If this approach is preferable to the
one in this patch, we can discuss post-commit.
rdar://problem/19836465
Differential Revision: http://reviews.llvm.org/D7802
llvm-svn: 257575
Summary:
The problem here is that an enum class can not be implicitly converted to an
integer. That assumption snuck back into PointerIntPair. This commit fixes the
issue and more importantly adds some unittests to make sure that we do not break
this again.
rdar://23594806
Reviewers: gribozavr
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16131
llvm-svn: 257574
AnalyzeBranch on X86 (and, previously, SPARC, which implementation was
copied from X86) tries to modify the branches based on block
layout (e.g. checking isLayoutSuccessor), when AllowModify is true.
The rest of the architectures leave that up to the caller, which can
call InsertBranch, RemoveBranch, and ReverseBranchCondition as
appropriate. That appears to be the preferred way to do it nowadays.
This commit makes SPARC like the rest: replaces AnalyzeBranch with an
implementation cribbed from AArch64, and adds a ReverseBranchCondition
implementation.
Additionally, a test-case has been added (also cribbed from AArch64)
demonstrating that redundant branch sequences no longer get emitted.
E.g., it used to emit code like this:
bne .LBB1_2
nop
ba .LBB1_1
nop
.LBB1_2:
And now emits:
cmp %i0, 42
be .LBB1_1
nop
llvm-svn: 257572
(Resubmit after fixing a typo that breaks test on big endian
machines)
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
llvm-svn: 257571
When we arrive at the end of the function, the validation of
the object has been done already. In theory, so, we should never
arrive here with something broken as the object isn't mutated.
Practice sometimes proves theory to be wrong, so leave an assertion
instead, as suggested by David Blaikie, to catch bugs.
llvm-svn: 257570
The version numbers of the darwin kernel are different from the version
numbers of OS X, so we need adjustments if we had "*-*-darwin" triples.
Use the existing utility functions in TargetTriple for this.
Fixes rdar://22056966
Differential Revision: http://reviews.llvm.org/D14601
llvm-svn: 257555
The line tables for CodeView make a distinction between expressions and
statements. As it turns out, MSVC always emits them as statements and
we always emit them as expressions. Let's switch to statements to match
the CodeView that they emit.
llvm-svn: 257553
This change has us print out fields we didn't previously understand. To
improve readability, we now group column information with it's
respective line.
llvm-svn: 257552
(Resubmit after fixing build bot failures)
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
llvm-svn: 257551
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```
llvm-svn: 257550
to only print the first private header.
Which for Mach-O files only prints the Mach header and not the subsequent load
commands. Which is used by scripts to match what the darwin otool(1) with the
-h flag does without the -l flag.
For non-Mach-O files it has the same functionality as -private-headers (with
the trailing ’s’).
rdar://24158331
llvm-svn: 257548
In this refactoring, member functions are introduced to access
CovMap header/func record members and hide layout details. This
will enable further code restructuring to support reading multiple
versions of coverage mapping data with shared/templatized code.
(When coveremap format version changes, backward compatibtility
should be preserved).
llvm-svn: 257547
Summary:
BFC instructions are available in ARMv6T2 and above.
Reviewers: t.p.northover
Subscribers: aemerson
Differential Revision: http://reviews.llvm.org/D16076
llvm-svn: 257546
VMOVs are not strictly speaking cheap, but they are as expensive as a vector
copy (VORR), so we should prefer rematerialization over splitting when it
applies.
rdar://problem/23754176
llvm-svn: 257545
Previously the RegisterOperands have only been used internally in
RegisterPressure.cpp. However this datastructure can be useful for other
tasks as well and allows refactoring of PDiff initialisation out of
RPTracker::recede().
This patch:
- Exposes RegisterOperands as public API
- Splits RPTracker::recede() into a part that skips DebugValues and
maintains the region borders, and the core that changes register
pressure when given a set of RegisterOperands.
- This allows to move the PDiff initialisation out recede() into a
method of the PressureDiffs class.
- The upcoming subregister scheduling code will also use
RegisterOperands to avoid pushing more unrelated functionality into
recede()/advance().
Differential Revision: http://reviews.llvm.org/D15473
llvm-svn: 257535
Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext
to find a value to describe the variable (in the expectation that those
zext/sext instruction will go away later). However, those values do not
cover the entire variable and thus need a DW_OP_bit_piece.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16061
llvm-svn: 257534
Summary: Add SaturatingMultiplyAdd convenience function template since A + (X * Y) comes up frequently when doing weighted arithmetic.
Reviewers: davidxl, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15385
llvm-svn: 257532
CodeView, unlike DWARF, can associate code with a range of columns.
However, LLVM can only represent a single column position internally.
We used to claim that the end column and start column were the same
which yielded less than satisfactory results: we would stop printing at
the _beginning_ of the source expression! Instead, mark the column-end
as 'zero' to indicate that we don't have one (as per the documentation
for IDiaLineNumber::get_lineNumberEnd).
llvm-svn: 257528
Only non-weighted predicates were handled in PPCInstrInfo::insertSelect. Handle
the weighted predicates as well.
This latent bug was triggered by r255398, because it added use of the
branch-weighted predicates.
While here, switch over an enum instead of an int to get the compiler to enforce
totality in the future.
llvm-svn: 257518
A request has been made to the official registry, but an official value is
not yet available. This patch uses a temporary value in order to support
development. When an official value is recieved, the value of EM_WEBASSEMBLY
will be updated.
llvm-svn: 257517
Refactor .param, .result, .local, and .endfunc, as directives, using the
proper MCTargetStreamer mechanism, rather than fake instructions.
llvm-svn: 257511
This patch changes the way labels are referenced. Instead of referencing the
basic-block label name (eg. .LBB0_0), instructions now just have an immediate
which indicates the depth in the control-flow stack to find a label to jump to.
This makes them much closer to what we expect to have in the binary encoding,
and avoids the problem of basic-block label names not being explicit in the
binary encoding.
Also, it terminates blocks and loops with end_block and end_loop instructions,
rather than basic-block label names, for similar reasons.
This will also fix problems where two constructs appear to have the same label,
because we no longer explicitly use labels, so consumers that need labels will
presumably create their own labels, and presumably they won't reuse labels
when they do.
This patch does make the code a little more awkward to read; as a partial
mitigation, this patch also introduces comments showing where the labels are,
and comments on each branch showing where it's branching to.
llvm-svn: 257505
The findExternalCalls routine ignores calls to functions already
defined in the dest module. This was not handling the case where
the definition in the current module is actually an alias to a
function call.
llvm-svn: 257493
This is a very limited implementation of DFG-based copy propagation.
It only handles actual COPY instructions (does not handle other equivalents
such as add-immediate with a 0 operand).
The major limitation is that it does not update the DFG: that will be the
change required to make it more robust (hopefully coming up soon).
llvm-svn: 257490
Prepatory patch before changing LibCallSimplifier to use the FMF.
Also, tighten the CHECK lines and give the tests more meaningful names.
Similar changes to:
http://reviews.llvm.org/rL257414
llvm-svn: 257481
Summary: The result register is the second operand as per the other mt* instructions.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D15993
llvm-svn: 257478
Target independent, SSA-based data flow framework for representing
data flow between physical registers.
This commit implements the creation of the actual data flow graph.
llvm-svn: 257477
Summary:
This fixes three bugs, in all of which state is not or incorrecly reset between
objects (i.e. when reusing the same pass manager to create multiple object
files):
1) AttributeSection needs to be reset to nullptr, because otherwise the backend
will try to emit into the old object file's attribute section causing a
segmentation fault.
2) MappingSymbolCounter needs to be reset, otherwise the second object file
will start where the first one left off.
3) The MCStreamer base class resets the Streamer's e_flags settings. Since
EF_ARM_EABI_VER5 is set on streamer creation, we need to set it again
after the MCStreamer was rest.
Also rename Reset (uppser case) to EHReset to avoid confusion with
reset (lower case).
Reviewers: rengolin
Differential Revision: http://reviews.llvm.org/D15950
llvm-svn: 257473
(64 to 128-bit) matches against the pattern fragment 'vzmovl_v2i64'
(a zero-extended 64-bit load).
However, a change in r248784 teaches the instruction combiner that only
the lower 64 bits of the input to a 128-bit vcvtph2ps are used. This means
the instruction combiner will ordinarily optimize away the upper 64-bit
insertelement instruction in the zero-extension and so we no longer select
the memory-register form. To fix this a new pattern has been added.
Differential Revision: http://reviews.llvm.org/D16067
llvm-svn: 257470
This means that the DEBUG_TYPE cannot take a comma anymore. All existing passes
conform to this rule.
Differential Revision: http://reviews.llvm.org/D15645
llvm-svn: 257466
With this, one can build a lib from the objects of other libs:
set(SOURCES
$<TARGET_OBJECTS:obj.clingInterpreter>
$<TARGET_OBJECTS:obj.clingMetaProcessor>
$<TARGET_OBJECTS:obj.clingUtils>
)
Reviewed by Chris Bieneman - thanks!
llvm-svn: 257459
handlers.
It is expected that RPC handlers will usually be member functions. Accepting them
directly in handle and expect allows for the remove of a lot of lambdas an
explicit error variables.
This patch also uses this new feature to substantially tidy up the
OrcRemoteTargetServer class.
llvm-svn: 257452
Since function definitions are not loaded into the address space, PT_LOAD is
inappropriate. PT_WEBASSEMBLY_FUNCTIONS is used to identify where the function
definitions are so that they can be processed at program startup time.
llvm-svn: 257436
The layering of where the various loop unroll parameters are
initialized and overridden here was very confusing, making it pretty
difficult to tell just how the various sources interacted. Instead, we
put all of the initialization logic together in a single function so
that it's obvious what overrides what.
llvm-svn: 257426
Function::copyAttributesFrom will copy the personality function, prefix
data and prolog data from the source function to the new function, and
is invoked when the IRMover copies the function prototype. This puts a
reference to a constant in the source module on a function in the dest
module, which causes an error when deleting the source module after
importing, since the personality function in the source module still has
uses (this would presumably also be an issue for the prologue and prefix
data). Remove the copies added to the dest copy when creating the new
prototype, as they are mapped properly when/if we link the function body.
llvm-svn: 257420
Currently WebAssembly has two kinds of relocations; data addresses and
function addresses. This adds ELF relocations for them, as well as an
MC symbol kind to indicate which type of relocation is needed.
llvm-svn: 257416
Apparently the preferred version is the incredibly complicated
VerifyVersionInfoW function.
Rename the function to avoid potential future name clashes.
llvm-svn: 257415
Currently we're unrolling loops more in minsize than in optsize, which
means -Oz will have a larger code size than -Os. That doesn't make any
sense.
This resolves the FIXME about this in LoopUnrollPass and extends the
optsize test to make sure we use the smaller threshold for minsize as
well.
llvm-svn: 257402
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555
The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.
But this raises a bug noted by the new FIXME comment.
In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)
...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'.
If any of those ops is strict, we should bail out.
Differential Revision: http://reviews.llvm.org/D15937
llvm-svn: 257400
Always expect tglobaladdr and texternalsym to be wrapped in
WebAssemblywrapper nodes. Also, split out a regPlusGA from regPlusImm so
that it can special-case global addresses, as they can be folded in more
cases.
Unfortunately this doesn't enable any new optimizations yet due to
SelectionDAG limitations. I'll be submitting changes to the SelectionDAG
infrastructure, along with tests, in a separate patch.
llvm-svn: 257394
Address review feedback from r255909.
Move body of resolveCycles(bool AllowTemps) to
resolveRecursivelyImpl(bool AllowTemps). Revert resolveCycles back
to asserting on temps, and add new resolveNonTemporaries interface
to invoke the new implementation with AllowTemps=true. Document
the differences between these interfaces, specifically the effect
on RAUW support and uniquing. Call appropriate interface from
ValueMapper.
llvm-svn: 257389
When asan is enabled, we poison slabs as we allocate them, and only unpoison the pieces
we need from the slab.
However, in Reset, we were failing to reset the state of the slab back to being poisoned.
Patch by b17 c0de.
llvm-svn: 257388
This reverts commit r254363.
load64BitDebugHelp() has the side effect of loading dbghelp and setting
globals. It should be called in no-asserts builds as well as debug
builds.
llvm_unreachable is also not appropriate here, since we actually want to
return if dbghelp couldn't be loaded in a non-asserts build.
llvm-svn: 257384
This removes ifdefs and fixes the build for users of the Win8.0 SDK,
which I happen to be. Upgrading is not hard, but executing the same code
everywhere seems better.
llvm-svn: 257379
After these revisions, for arm targets, the -mcpu=xscale option caused
an error: "the clang compiler does not support '-mcpu=xscale'". Adding
"v5e" as a SUB_ARCH in ARMTargetParser.def helps.
Submitted by: Andrew Turner
Differential Revision: http://reviews.llvm.org/D16043
llvm-svn: 257376
This patch fixes the memory sanitizer origin store instrumentation for
array types. This can be triggered by cases where frontend lowers
function return to array type instead of aggregation.
For instance, the C code:
--
struct mypair {
int64_t x;
int y;
};
mypair my_make_pair(int64_t x, int y) {
mypair p;
p.x = x;
p.y = y;
return p;
}
int foo (int p)
{
mypair z = my_make_pair(p, 0);
return z.y + z.x;
}
--
It will be lowered with target set to aarch64-linux and -O0 to:
--
[...]
define i32 @_Z3fooi(i32 %p) #0 {
[...]
%call = call [2 x i64] @_Z12my_make_pairxi(i64 %conv, i32 0)
%1 = bitcast %struct.mypair* %z to [2 x i64]*
store [2 x i64] %call, [2 x i64]* %1, align 8
[...]
--
The origin store will emit a 'icmp' to test each store value again the
TLS origin array. However since 'icmp' does not support ArrayType the
memory instrumentation phase will bail out with an error.
This patch change it by using the same strategy used for struct type on
array.
It fixes the 'test/msan/insertvalue_origin.cc' for aarch64 (the -O0 case).
llvm-svn: 257375
I'm still seeing GCC ICE locally, but figured I'd throw this at the wall
& see if it sticks for the bots at least. Will continue investigating
the ICE in any case.
llvm-svn: 257367
The hardware instruction's output on 0 is -1 rather than 32.
Eliminate a test and select to -1. This removes an extra instruction
from the compatability function with HSAIL's firstbit instruction.
llvm-svn: 257352
The new ORC remote-JITing support provides a superset of the old code's
functionality, so we can replace the old stuff. As a bonus, a couple of
previously XFAILed tests have started passing.
llvm-svn: 257343
Summary:
It actually takes an offset into the current PC-region.
This fixes the 'expr' command in lldb.
Reviewers: vkalintiris, jaydeep, bhushan
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D16054
llvm-svn: 257339
In the OptimizeLEA pass keep instructions' positions in the basic block saved and use them for calculation of the distance between two instructions instead of std::distance. This reduces complexity of the pass from O(n^3) to O(n^2) and thus the compile time.
Differential Revision: http://reviews.llvm.org/D15692
llvm-svn: 257328
This is a recommit of r257253 which was reverted in r257270.
Previous testcase can make failure on some targets due to using opt with O3 option.
Original Summary:
Merge MBBICommon and MBBI's MMOs.
Differential Revision: http://reviews.llvm.org/D15990
llvm-svn: 257317
This patch adds utilities to ORC for managing a remote JIT target. It consists
of:
1. A very primitive RPC system for making calls over a byte-stream. See
RPCChannel.h, RPCUtils.h.
2. An RPC API defined in the above system for managing memory, looking up
symbols, creating stubs, etc. on a remote target. See OrcRemoteTargetRPCAPI.h.
3. An interface for creating high-level JIT components (memory managers,
callback managers, stub managers, etc.) that operate over the RPC API. See
OrcRemoteTargetClient.h.
4. A helper class for building servers that can handle the RPC calls. See
OrcRemoteTargetServer.h.
The system is designed to work neatly with the existing ORC components and
functionality. In particular, the ORC callback API (and consequently the
CompileOnDemandLayer) is supported, enabling lazy compilation of remote code.
Assuming this doesn't trigger any builder failures, a follow-up patch will be
committed which tests these utilities by using them to replace LLI's existing
remote-JITing demo code.
llvm-svn: 257305
This is a more generic version of the MCJITMemoryManager::notifyObjectLoaded
method: It provides only a RuntimeDyld reference (rather than an
ExecutionEngine), and so can be used with ORC JIT stacks.
llvm-svn: 257296
RuntimeDyld::MemoryManager.
The RuntimeDyld::MemoryManager::reserveAllocationSpace method is called when
object files are loaded, and gives clients a chance to pre-allocate memory for
all segments. Previously only the size of each segment (code, ro-data, rw-data)
was supplied but not the alignment. This hasn't caused any problems so far, as
most clients allocate via the MemoryBlock interface which returns page-aligned
blocks. Adding alignment arguments enables finer grained allocation while still
satisfying alignment restrictions.
llvm-svn: 257294
In r255760, I optimized the SectionMemoryManager to make better use
of virtual memory on platforms where the allocation granularity was
bigger than the protection granularity. As part of this, fixing up
the free list became more complicated and was moved into
`applyMemoryGroupPermissions`. Unfortunately, I forgot to actually
remove the call that drops the free list for RO memory (I did
remove the corresponding one for RX memory), defeating the whole
optimization.
llvm-svn: 257293
Summary:
Use proper dataflow ordering to speed convergence.
This will converge the testcase on bug 26055 in 2 iterations.
(data structures speedups to come to make even that faster)
Reviewers: kcc, samsonov, echristo, dblaikie, tvvikram
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16039
llvm-svn: 257292
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2327: 'llvm::OrcExecutionTest::TM' : is not a type name, static, or enumerator
llvm\unittests\ExecutionEngine\Orc\ObjectLinkingLayerTest.cpp(115) : error C2065: 'TM' : undeclared identifier
FYI, "this->TM" was valid even before moving class SectionMemoryManagerWrapper.
llvm-svn: 257290
MSVC seems to have problems looking up Value inside of the template. Not
really sure whether that's a bug there or Clang and GCC being too
permissive.
llvm-svn: 257288
type.
This makes it easy and safe to use a set of flags as one elmenet of
a tagged union with pointers. There is quite a bit of code that has
historically done this by casting arbitrary integers to "pointers" and
assuming that this was safe and reliable. It is neither, and has started
to rear its head by triggering safety asserts in various abstractions
like PointerLikeTypeTraits when the integers chosen are invariably poor
choices for *some* platform and *some* situation. Not to mention the
(hopefully unlikely) prospect of one of these integers actually getting
allocated!
With this, it will be straightforward to build type safe abstractions
like this without being error prone. The abstraction itself is also
remarkably simple thanks to the implicit conversion.
This use case and pattern was also independently created by the folks
working on Swift, and they're going to incrementally add any missing
functionality they find.
Differential Revision: http://reviews.llvm.org/D15844
llvm-svn: 257284
This is a much more general and powerful form of PointerUnion. It
provides a reasonably complete sum type (from type theory) for
pointer-like types. It has several significant advantages over the
existing PointerUnion infrastructure:
1) It allows more than two pointer types to participate without awkward
nesting structures.
2) It directly exposes the tag so that it is convenient to write
switches over the possible members.
3) It can re-use the same type for multiple tag values, something that
has been worked around by either abusing PointerIntPair or defining
nonce types and doing unsafe pointer casting.
4) It supports customization of the PointerLikeTypeTraits used for
specific member types. This means it could (in theory) be used even
with types that are over-aligned on allocation to expose larger
numbers of bits to the tag.
All in all, I think it is at least complimentary to the existing
infrastructure, and a strict improvement for some use cases.
Differential Revision: http://reviews.llvm.org/D15843
llvm-svn: 257282
JumpThreading's runOnFunction is supposed to return true if it made any
changes. JumpThreading has a call to removeUnreachableBlocks which may
result in changes to the IR but runOnFunction didn't appropriate account
for this possibility, leading to badness.
While we are here, make sure to call LazyValueInfo::eraseBlock in
removeUnreachableBlocks; JumpThreading preserves LVI.
This fixes PR26096.
llvm-svn: 257279
Summary:
This is a fix of D13718. D13718 was committed but then reverted because of the following bug:
https://llvm.org/bugs/show_bug.cgi?id=25299
This patch fixes the issue shown in the bug.
Reviewers: majnemer, reames
Subscribers: jevinskie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14308
llvm-svn: 257277
Summary:
The code was simply ensuring that the catchpad's pred is its catchswitch,
which was letting cases slip through where the flow edge was the unwind
edge of the catchswitch rather than one of its catch clauses.
Reviewers: andrew.w.kaylor, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16011
llvm-svn: 257275
Summary:
Funclet-based EH personalities/tables likely can't handle these, and they
can't be generated at source, so make them officially illegal in IR as
well.
Reviewers: andrew.w.kaylor, rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15963
llvm-svn: 257274
Summary:
A funclet EH pad may be exited by an unwind edge, which may be a
cleanupret exiting its cleanuppad, an invoke exiting a funclet, or an
unwind out of a nested funclet transitively exiting its parent. Funclet
EH personalities require all such exceptional exits from a given funclet to
have the same unwind destination, and EH preparation / state numbering /
table generation implicitly depends on this. Formalize it as a rule of
the IR in the LangRef and verifier.
Reviewers: rnk, majnemer, andrew.w.kaylor
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15962
llvm-svn: 257273
Summary:
Funclet EH personalities require a tree-like nesting among funclets
(enforced by the ParentPad linkage in the IR), and also require that
unwind edges conform to certain rules with respect to the tree:
- An unwind edge may exit 0 or more ancestor pads
- An unwind edge must enter exactly one EH pad, which must be distinct
from any exited pads
- A cleanupret's edge must exit its cleanuppad
Describe these rules in the LangRef, and enforce them in the verifier.
Reviewers: rnk, majnemer, andrew.w.kaylor
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15961
llvm-svn: 257272
This reverts r257221.
This caused several build bot failures
* It looks like some of the tests don't work correctly under Windows
* It looks like the lit per test timeout tests fail
So I'm reverting for now. Once the above failures are fixed running
lit's tests can be enabled again.
llvm-svn: 257268
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through any bitcast to check for a load node to allow broadcasts to occur.
This is a re-commit of r257055 after r257264 fixed 32-bit broadcast loads of i64 scalars.
llvm-svn: 257266
managers.
Prior to this patch, recursive finalization (where finalization of one
RuntimeDyld instance triggers finalization of another instance on which the
first depends) could trigger memory access failures: When the inner (dependent)
RuntimeDyld instance and its memory manager are finalized, memory allocated
(but not yet relocated) by the outer instance is locked, and relocation in the
outer instance fails with a memory access error.
This patch adds a latch to the RuntimeDyld::MemoryManager base class that is
checked by a new method: RuntimeDyld::finalizeWithMemoryManagerLocking, ensuring
that shared memory managers are only finalized by the outermost RuntimeDyld
instance.
This allows ORC clients to supply the same memory manager to multiple calls to
addModuleSet. In particular it enables the use of user-supplied memory managers
with the CompileOnDemandLayer which must reuse the supplied memory manager for
each function that is lazily compiled.
llvm-svn: 257263
Summary:
This is analogous to r256079, which removed an overly strong assertion, and
r256812, which simplified the code by replacing three conditionals by one.
Reviewers: reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16019
llvm-svn: 257250
This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.
The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.
Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.
At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.
Differential Revision: http://reviews.llvm.org/D15982
llvm-svn: 257244
directy with ``make check-lit`` and are run as part of
``make check-all``.
In principle we should run lit's testsuite before testing LLVM using lit
so that any problems with lit get discovered before testing LLVM so we
can bail out early. However this implementation (``check-all`` runs all
tests together) seemed simpler and will still report failing lit tests.
Note that the tests and the configured ``lit.site.cfg`` have to be
copied into the build directory to avoid polluting the source tree.
llvm-svn: 257221
Look for PHI/Select in the same BB of the form
bb:
%p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
%s = select p, trueval, falseval
And expand the select into a branch structure. This later enables
jump-threading over bb in this pass.
Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
select if the associated PHI has at least one constant. If the unfolded
select is not jump-threaded, it will be folded again in the later
optimizations.
llvm-svn: 257198
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.
llvm-svn: 257191
Summary:
r255334 matches bit-reverse pattern in InstCombine and generates calls to Instrinsic::bitreverse.
RBIT instruction is only available for ARMv6t2 and above. This patch has the intrinsic expanded during legalization for ARMv4 and ARMv5.
Patch by Z. Zheng <zhaoshiz@codeaurora.org>
Reviewers: apazos, jmolloy, weimingz
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D15932
llvm-svn: 257188
Summary: In ARMConstantIslandPass, which runs after Shrink Wrap pass, long jumps will be fixed up as BL (tBfar) which depends on spilling LR in epilogue. However, shrink-wrap may remove the LR, which causes issues when the function returns.
Reviewers: qcolombet, rengolin
Subscribers: aemerson, rengolin
Differential Revision: http://reviews.llvm.org/D15984
llvm-svn: 257187
Summary:
During legalization if i16, do not ASSERTZEXT the result of FP_TO_FP16.
Directly return an FP_TO_FP16 node with return type as the
promote-to-type of i16.
This patch also removes extraneous length check. This legalization
should be valid even if integer and float types are of different
lengths.
This patch breaks a hard-float test for fp16 args. The test is changed
to allow a vmov to zero-out the top bits, and also ensure that the
return value is in an FP register.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15438
llvm-svn: 257184
StackColoring rewrites the frame indicies of operations involving
allocas if it can find that the life time of two objects do not overlap.
MSVC EH needs to be kept aware of this if happens in the event that a
catch object has moved around. However, we represent the non-existance
of a catch object with a sentinel frame index (INT_MAX). This sentinel
also happens to be the EmptyKey of the SlotRemap DenseMap. Testing for
whether or not we need to translate the frame index fails in this case
because we call the count method on the DenseMap with the EmptyKey,
leading to assertions. Instead, check if it is our sentinel value
before trying to look into the DenseMap.
This fixes PR26073.
llvm-svn: 257182
Due to the new in-place ThinLTO symbol handling support added in
r257174, we now invoke renameModuleForThinLTO on the current
module from within the FunctionImport pass.
Additionally, renameModuleForThinLTO no longer needs to return the
Module as it is performing the renaming in place on the one provided.
This commit will be immediately preceeded by a companion clang patch to
remove its invocation of renameModuleForThinLTO.
llvm-svn: 257181
Summary:
Move ThinLTO global value processing functions out of ModuleLinker and
into a new ThinLTOGlobalProcessor class, which performs any necessary
linkage and naming changes on the given module in place.
As a result, renameModuleForThinLTO no longer needs to create a new
Module when performing any necessary local to global promotion on a
module that we are possibly exporting from during a ThinLTO backend
compilation.
During function importing the ThinLTO processing is still invoked from
the ModuleLinker (via the new class), as it needs to perform renaming and
linkage changes on the source module, e.g. in order to get the correct
renaming during local to global promotion.
Reviewers: joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D15696
llvm-svn: 257174
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.
Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.
llvm-svn: 257171
This patch corresponds to review:
http://reviews.llvm.org/D15930
Moves to and from CR fields depend on shifts/masks that depend on the
target/source CR field. Thus, post-ra anti-dep breaking must not later
change that CR register assignment.
llvm-svn: 257168
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257164
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
Differential Revision: http://reviews.llvm.org/D15785
llvm-svn: 257163
Windows EH keeping track of which frame index corresponds to a catchpad
in order to inform the runtime where the catch parameter should be
initialized. LLVM's optimizations are able to prove that the memory
used by the catch parameter can be reused with another memory
optimization, changing it's frame index.
We need to keep WinEHFuncInfo up to date with respect to this or we will
miscompile/assert.
This fixes PR26069.
llvm-svn: 257158
Done in InstrProfWriter to eliminate the need for client
code to do the sorting. The operation is done once and reused
many times so it is more efficient. Update unit test to remove
sorting. Also update expected output of affected tests.
llvm-svn: 257145
For a new record with weight != 1, only edge profiling
counters are scaled, VP data is not properly scaled.
This patch refactors the code and fixes the problem.
Also added sort by count interface (for follow up patch).
llvm-svn: 257143
This remove the need for locking when deleting a function.
Differential Revision: http://reviews.llvm.org/D15988
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 257139
This is a fix for bug http://llvm.org/bugs/show_bug.cgi?id=25839.
For a PIC TLS variable access in a function, prologue (mflr followed by std and
stdu) gets scheduled after a tls_get_addr call. tls_get_addr messed up LR but
no one saves/restores it.
Also added a test for save/restore clobbered registers during calling __tls_get_addr.
Patch by Tim Shen
llvm-svn: 257137
The early return seems to be missed. This causes a radical and wrong loop
optimization on powerpc. It isn't reproducible on x86_64, because
"UseInterleaved" is false.
Patch by Tim Shen.
llvm-svn: 257134
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
llvm-svn: 257133
The new leader is known anyway so we can return it for some micro
optimization in code where it is easy to pass along the result to the
next join().
llvm-svn: 257130
.zero is confusing when used with two arguments. Documentation:
This directive emits SIZE 0-valued bytes. SIZE must be an absolute
expression. This directive is actually an alias for the '.skip'
directive so in can take an optional second argument of the value to
store in the bytes instead of zero. Using '.zero' in this way would be
confusing however.
Ref: https://sourceware.org/bugzilla/show_bug.cgi?id=18353
Hexagon and Sparc do the same, and it's all the same to WebAssembly so
let's pick the less confusing of the two.
llvm-svn: 257111
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D14276
llvm-svn: 257105
Summary:
In rL242338, debugger tuning was introduced, and the tuning for FreeBSD
was set to lldb by default. However, for the foreseeable future we
still need to default to gdb tuning, since lldb is not ready for all of
FreeBSD's architectures, and some system tools (like objcopy, etc) have
not yet been adapted to cope with the lldb tuned format, which has
.apple sections.
Therefore, let FreeBSD use gdb by default for now.
Reviewers: emaste, probinson
Subscribers: llvm-commits, emaste
Differential Revision: http://reviews.llvm.org/D15966
llvm-svn: 257103
We marked values which are 'undef' as constant instead of undefined
which violates SCCP's invariants. If we can figure out that a
computation results in 'undef', leave it in the undefined state.
This fixes PR16052.
llvm-svn: 257102
Fix PR24852 (crash with -debug -instcombine)
Patch by Than McIntosh <thanm@google.com>
Summary:
Add guards to the asm writer to prevent crashing
when dumping an instruction that has no basic
block.
Differential Revision: http://reviews.llvm.org/D15798
From: Than McIntosh <thanm@google.com>
llvm-svn: 257094
Coverage mapping data may reference names of functions
that are skipped by FE (e.g, unused inline functions). Since
those functions are skipped, normal instr-prof function lowering
pass won't put those names in the right section, so special
handling is needed to walk through coverage mapping structure
and recollect the references.
With this patch, only names that are skipped are processed. This
simplifies the lowering code and it no longer needs to make
assumptions coverage mapping data layout. It should also be
more efficient.
llvm-svn: 257091
The fix for PR23999 made us mark loads of null as producing the constant
undef which upsets the lattice. Instead, keep the load as "undefined".
This fixes PR26044.
llvm-svn: 257087
Previously we only supported putting the FI into memory operand offset
fields if there was nothing there already. Now combine them.
Differential Revision: http://reviews.llvm.org/D15941
llvm-svn: 257084
The MC assembler doesn't like using the empty string as a private label
prefix because then it treats all labels as private. This commit reverts
back to the default prefix, which is .L, which is common in ELF targets
and consistent with the LLVM name mangler.
llvm-svn: 257083
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.
Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.
Some tests are updated, note that the fsub.ll test explicitly checks that
the move is elided.
With the IR generated by current Mesa, the changes are obviously relatively
minor:
7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)
Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)
Reviewers: tstellarAMD, arsenm, mareko
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15875
llvm-svn: 257074
Summary:
Somehow, I first interpreted the docs as saying space for xnack_mask is only
reserved when XNACK is enabled via SH_MEM_CONFIG. I felt uneasy about this and
went back to actually test what is happening, and it turns out that xnack_mask
is always reserved at least on Tonga and Carrizo, in the sense that flat_scr
is always fixed below the SGPRs that are used to implement xnack_mask, whether
or not they are actually used.
I confirmed this by writing a shader using inline assembly to tease out the
aliasing between flat_scratch and regular SGPRs. For example, on Tonga, where
we fix the number of SGPRs to 80, s[74:75] aliases flat_scratch (so
xnack_mask is s[76:77] and vcc is s[78:79]).
This patch changes both the calculation of the total number of SGPRs and the
various register reservations to account for this.
It ought to be possible to use the gap left by xnack_mask when the feature
isn't used, but this patch doesn't try to do that. (Note that the same applies
to vcc.)
Note that previously, even before my earlier change in r256794, the SGPRs that
alias to xnack_mask could end up being used as well when flat_scr was unused
and the total number of SGPRs happened to fall on the right alignment
(e.g. highest regular SGPR being used s29 and VCC used would lead to number
of SGPRs being 32, where s28 and s29 alias with xnack_mask). So if there
were some conflict due to such aliasing, we should have noticed that already.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15898
llvm-svn: 257073
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257064
See PR25822 for a more full summary, but we were conflating the concepts of "capture" and "escape". We were proving nocapture and using that proof to infer noescape, which is not true. Escaped-ness is a function-local property - as soon as a value is used in a call argument it escapes. Capturedness is a related but distinct property. It implies a *temporally limited* escape. Consider:
static int a;
int b;
int g(int * nocapture arg);
int f() {
a = 2; // Even though a escapes to g, it is not captured so can be treated as non-escaping here.
g(&a); // But here it must be treated as escaping.
g(&b); // Now that g(&a) has returned we know it was not captured so we can treat it as non-escaping again.
}
The original commit did not sufficiently understand this nuance and so caused PR25822 and PR26046.
r248576 included both a performance improvement (which has been backed out) and a related conformance fix (which has been kept along with its testcase).
llvm-svn: 257058
AVX1 v8i32/v4i64 shuffles are bitcasted to v8f32/v4f64, this patch peeks through bitcasts to check for a load node to allow broadcasts to occur.
Follow up to D15310
llvm-svn: 257055
Change Triple::get32BitArchVariant to return arm/armeb as the 32bit
variant of aarch64/aarch64_be and do the same change for the oppoiste
direction in Triple::get64BitArchVariant.
Differential revision: http://reviews.llvm.org/D15529
llvm-svn: 257048
Follow up to D15378, added INSERTPS to the list of decodable target shuffles and enabled XFormVExtractWithShuffleIntoLoad to handle target shuffles with SentinelZero and tested this with INSERTPS.
llvm-svn: 257046
Darwin TLS accesses most closely resemble ELF's general-dynamic situation,
since they have to be able to handle all possible situations. The descriptors
and so on are obviously slightly different though.
llvm-svn: 257039
Serialize will perform a hardware serialization operation, and is
acting as a memory barrier. Therefore it must have the hasSideEffects
flag set so it will be treated as a global memory object.
Reviewed by Ulrich Weigand
llvm-svn: 257036