Summary:
I currently have to specify --build=mips-linux-gnu or --build=mipsel-linux-gnu
to configure in order to successfully recurse a 32-bit build of the compiler on
my mips64-linux-gnu and mips64el-linux-gnu targets. This is a bug and will be
fixed but in the meantime it will be useful to have a way to work around this.
Reviewers: tstellarAMD
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6522
llvm-svn: 223369
r223113 added support for ARM modified immediate assembly syntax. That patch
has broken support for immediate expressions, as in:
add r0, #(4 * 4)
It wasn't caught because we don't have any tests for this feature. This patch
fixes this regression and adds test cases.
llvm-svn: 223366
The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269) for x86-64, a sequence of an extract, movs and shifts gives better performance. However, for 32-bit x86, the previous sequence still seems better.
Differential Revision: http://reviews.llvm.org/D6501
llvm-svn: 223360
Replaced some logic that checked if a build_vector node is doing a splat of a
non-undef value with a call to method BuildVectorSDNode::getSplatValue().
No functional change intended.
llvm-svn: 223354
--disable-timestamps was added to the configure command way back in r142647 but
the command that echos this command to the log was not updated at the time.
llvm-svn: 223351
According to a previous FIXME comment we now not only look at MBB
successors, but also handle code sinking past them:
x = computation
if () {} else {}
use x
The instruction could be sunk over the whole diamond for the
if/then/else (or loop, etc), allowing it to be sunk into other blocks
after that.
Modified test added in r204522, due to one spill less present.
Minor fixes in comments.
Patch provided by Jonas Paulsson. Reviewed by Hal Finkel.
llvm-svn: 223350
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:
OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:
fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))
Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.
Differential Revision: http://reviews.llvm.org/D6407
llvm-svn: 223349
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.
Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 223348
Based on review comments from Richard Smith, restrict this optimization from
applying to globals that might resolve lazily to other dynamically-loaded
modules, and also from dynamic allocas (which might be transformed into malloc
calls). In short, take extra care that the compared-to pointer is really
simultaneously live with the memory allocation.
llvm-svn: 223347
Summary: Add rpath load command support in Mach-O object and update llvm-objdump to use it.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6512
llvm-svn: 223343
Commit on
- This patch fixes the bug described in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-May/062343.html
The fix allocates an extra slot just below the GPRs and stores the base pointer
there. This is done only for functions containing llvm.eh.sjlj.setjmp that also
need a base pointer. Because code containing llvm.eh.sjlj.setjmp saves all of
the callee-save GPRs in the prologue, the offset to the extra slot can be
computed before prologue generation runs.
Impact at run-time on affected functions is::
- One extra store in the prologue, The store saves the base pointer.
- One extra load after a llvm.eh.sjlj.setjmp. The load restores the base pointer.
Because the extra slot is just above a gap between frame-pointer-relative and
base-pointer-relative chunks of memory, there is no impact on other offset
calculations other than ensuring there is room for the extra slot.
http://reviews.llvm.org/D6388
Patch by Arch Robison <arch.robison@intel.com>
llvm-svn: 223329
We had mistakenly believed that GCC's 'cc' referred to the entire
condition-code register (cr0 through cr7) -- and implemented this in r205630 to
fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM
to clobber too much with legacy code with inline asm using the 'cc' clobber.
Fixes PR21451.
llvm-svn: 223328
Use the MCAsmInfo instead of the DataLayout, and allow
specifying a custom prefix for labels specifically. HSAIL
requires that labels begin with @, but global symbols with &.
llvm-svn: 223323
This is simply a grab bag of unrelated checks:
- A statepoint call can't be marked readonly or readnone
- We don't currently support inline asm or varadic target functions. Both could be supported, but don't currently work.
- I forgot to check that the number of call arguments actually matched the wrapped callee in my previous change. Included here.
llvm-svn: 223322
On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is
a register containing the address. As a result, this register cannot be r0, and
we need to enforce this register subclass constraint to prevent miscompiling
the code (we'd get this constraint for free with the usual instruction
definitions, but that scheme has no knowledge of how we end up printing inline
asm memory operands, and so here we need to do it 'by hand'). We can accomplish
this within the current address-mode selection framework by introducing an
explicit COPY_TO_REGCLASS node.
Fixes PR21443.
llvm-svn: 223318
Prior to this commit, physical registers defined implicitly were considered free
right after their definition, i.e.. like dead definitions. Therefore, their uses
had to immediately follow their definitions, otherwise the related register may
be reused to allocate a virtual register.
This commit fixes this assumption by keeping implicit definitions alive until
they are actually used. The downside is that if the implicit definition was dead
(and not marked at such), we block an otherwise available register. This is
however conservatively correct and makes the fast register allocator much more
robust in particular regarding the scheduling of the instructions.
Fixes PR21700.
llvm-svn: 223317
The non-opaque part can be structurally uniqued. To keep this to just
a hash lookup, we don't try to unique cyclic types.
Also change the type mapping algorithm to be optimistic about a type
not being recursive and only create a new type when proven to be wrong.
This is not as strong as trying to speculate that we can keep the source
type, but is simpler (no speculation to revert) and more powerfull
than what we had before (we don't copy non-recursive types at least).
I initially wrote this to try to replace the name based type merging.
It is not strong enough to replace it, but is is a useful addition.
With this patch the number of named struct types is a clang lto bootstrap goes
from 49674 to 15986.
llvm-svn: 223278
Add checks that the types in a gc.statepoint sequence match the wrapper callee and that relocating a pointer doesn't change it's type.
llvm-svn: 223275
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);
rdar://19049359
Differential Revision: http://reviews.llvm.org/D6496
llvm-svn: 223270
This complicates a few algorithms due to not having random access, but
not by a huge degree I don't think (open to debate/design
discussion/etc).
llvm-svn: 223261
The recently added documentation for statepoints claimed that we checked the parameters of the various intrinsics for validity. This patch adds the code to actually do so. I also removed a couple of redundant checks for conditions which are checked elsewhere in the Verifier and simplified the logic using the helper functions from Statepoint.h.
llvm-svn: 223259
This change makes MemorySanitizer instrumentation a bit more strict
about instructions that have no origin id assigned to them.
This would have caught the bug that was fixed in r222918.
This is re-commit of r222997, reverted in r223211, with 3 more
missing origins added.
llvm-svn: 223236
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
llvm-svn: 223224
Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed
numbers, and it is important that we print them as such. To make sure that
happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it
does all intermediate checks on a signed-extended int64_t value, and then
creates the resulting target constant using MVT::i64. This will ensure that all
negative values are printed as negative values (mirroring what is done in other
backends to achieve the same sign-extension effect).
This came up in the context of inline assembly like this:
"add%I2 %0,%0,%2", ..., "Ir"(-1ll)
where we used to print:
addi 3,3,4294967295
and gcc would print:
addi 3,3,-1
and gas accepts both forms, but our builtin assembler (correctly) does not. Now
we print -1 like gcc does.
While here, I replaced a bunch of custom integer checks with isInt<16> and
friends from MathExtras.h.
Thanks to Paul Hargrove for the bug report.
llvm-svn: 223220
LLVM understands a -enable-sign-dependent-rounding-fp-math codegen option. When
the user has specified this option, the Tag_ABI_FP_rounding attribute should be
emitted with value 1. This option currently does not appear to disable
transformations and optimizations that assume default floating point rounding
behavior, AFAICT, but the intention should be recorded in the build attributes,
regardless of what the compiler actually does with the intention.
Change-Id: If838578df3dc652b6f2796b8d152545674bcb30e
llvm-svn: 223218
When lazy reading a module, the types used in a function will not be visible to
a TypeFinder until the body is read.
This patch fixes that by asking the module for its identified struct types.
If a materializer is present, the module asks it. If not, it uses a TypeFinder.
This fixes pr21374.
I will be the first to say that this is ugly, but it was the best I could find.
Some of the options I looked at:
* Asking the LLVMContext. This could be made to work for gold, but not currently
for ld64. ld64 will load multiple modules into a single context before merging
them. This causes us to see types from future merges. Unfortunately,
MappedTypes is not just a cache when it comes to opaque types. Once the
mapping has been made, we have to remember it for as long as the key may
be used. This would mean moving MappedTypes to the Linker class and having
to drop the Linker::LinkModules static methods, which are visible from C.
* Adding an option to ignore function bodies in the TypeFinder. This would
fix the PR by picking the worst result. It would work, but unfortunately
we are currently quite dependent on the upfront type merging. I will
try to reduce our dependency, but it is not clear that we will be able
to get rid of it for now.
The only clean solution I could think of is making the Module own the types.
This would have other advantages, but it is a much bigger change. I will
propose it, but it is nice to have this fixed while that is discussed.
With the gold plugin, this patch takes the number of types in the LTO clang
binary from 52817 to 49669.
llvm-svn: 223215
Remove an unnecessary `MDNode::replaceAllUsesWith()`. In the preceding
line, `TheLoop->setLoopID()` visits all backedges and sets the new loop
ID. This sufficiently updates the loop metadata.
Metadata RAUW is going away as part of PR21532.
llvm-svn: 223210
Select i1 logical ops directly to 64-bit SALU instructions.
Vector i1 values are always really in SGPRs, with each
bit for each item in the wave. This saves about 4 instructions
when and/or/xoring any condition, and also helps write conditions
that need to be passed in vcc.
This should work correctly now that the SGPR live range
fixing pass works. More work is needed to eliminate the VReg_1
pseudo regclass and possibly the entire SILowerI1Copies pass.
llvm-svn: 223206
The loop is over the operands of an instruction, and checks the
register with the sub reg index of the dest register. This probably
meant to be checking the sub reg index of the same operand.
llvm-svn: 223205
m0 is treated as a virtual register class with a single register
rather than the physical register it really is. This was updating
the live range of the used virtual copy of m0 from the first ds_read
instruction, and leaving the unused copy unchanged. This resulted in a
"Live segment doesn't end at a valid instruction" verifier error because
the erased instructions. Update the live range of the second copy (which
should be dead).
No test since I'm not sure how to trigger this with SIFoldOperands
enabled.
llvm-svn: 223203
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case. We need to use LoopInfo to
correctly determine which back-edges are loops.
llvm-svn: 223199
We just needed to remove the assertion in
AMDGPURegisterInfo::getFrameRegister(), which is called when
initializing the parser for inline assembly.
llvm-svn: 223197
Patch by Ben Gamari!
This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute. There are a two primary usecases
that these attributes aim to serve,
1. Function prologue sigils
2. Function hot-patching: Enable the user to insert `nop` operations
at the beginning of the function which can later be safely replaced
with a call to some instrumentation facility
3. Runtime metadata: Allow a compiler to insert data for use by the
runtime during execution. GHC is one example of a compiler that
needs this functionality for its tables-next-to-code functionality.
Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.
Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.
The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.
The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.
References
----------
This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).
[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html
Test Plan: testsuite
Differential Revision: http://reviews.llvm.org/D6454
llvm-svn: 223189
The X86AsmParser intel handling was refactored in r216481, making it
try each different memory operand size to see which one matches.
Operand sizes larger than 80 ("[xyz]mmword ptr") were forgotten, which
led to an "invalid operand" error for code such as:
movdqa [rax], xmm0
llvm-svn: 223187
We need to use the custom expansion of readcyclecounter on all 32-bit targets
(even those with 64-bit registers). This should fix the ppc64 buildbot.
llvm-svn: 223182
A global variable without an explicit alignment specified should be assumed to
be ABI-aligned according to its type, like on other platforms. This allows us
to use better memory operations when accessing it.
rdar://18533701
llvm-svn: 223180
This frequently leads to cases like:
ldr xD, [xN, :lo12:var]
add xA, xN, :lo12:var
ldr xD, [xA, #8]
where the ADD would have been needed anyway, and the two distinct addressing
modes can prevent the formation of an ldp. Because of how we handle ADRP
(aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also
results in duplicated ADRP instructions (one on its own to cover the ldr, and
one combined with the add).
llvm-svn: 223172
Such loops shouldn't be vectorized due to the loops form.
After applying loop-rotate (+simplifycfg) the tests again start to check
what they are intended to check.
llvm-svn: 223170
4i32 shuffles for single insertions into zero vectors lowers to X86vzmovl which was using (v)blendps - causing domain switch stalls. This patch fixes this by using (v)pblendw instead.
The updated tests on test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.
Differential Revision: http://reviews.llvm.org/D6458
llvm-svn: 223165
--xunit-xml-output saves test results to disk in JUnit's xml format. This will allow Jenkins to report the details of a lit run.
Based on a patch by David Chisnall.
llvm-svn: 223163
We've long supported readcyclecounter on PPC64, but it is easier there (the
read of the 64-bit time-base register can be accomplished via a single
instruction). This now provides an implementation for PPC32 as well. On PPC32,
the time-base register is still 64 bits, but can only be read 32 bits at a time
via two separate SPRs. The ISA manual explains how to do this properly (it
involves re-reading the upper bits and looping if the counter has wrapped while
being read).
This requires PPC to implement a custom integer splitting legalization for the
READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then
gets turned into a pseudo-instruction, which is then expanded to the necessary
sequence (which has three SPR reads, the comparison and the branch).
Thanks to Paul Hargrove for pointing out to me that this was still unimplemented.
llvm-svn: 223161
Reduce the number of nops emitted for stackmap shadows on AArch64 by counting
non-stackmap instructions up to the next branch target towards the requested
shadow.
<rdar://problem/14959522>
llvm-svn: 223156
Summary:
Like N32/N64, they must be passed in the upper bits of the register.
The new code could be merged with the existing if-statements but I've
refrained from doing this since it will make porting the O32 implementation
to tablegen harder later.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6463
llvm-svn: 223148
Previously .cpu directive in ARM assembler didnt switch to the new CPU and
therefore acted as a nop. This implemented real action for .cpu and eg.
allows to assembler FreeBSD kernel with -integrated-as.
llvm-svn: 223147
This is the fourth and final patch in the statepoint series. It contains the documentation for the statepoint intrinsics and their usage.
There's definitely still room to improve the documentation here, but I wanted to get this landed so it was available for others. There will likely be a series of small cleanup changes over the next few weeks as we work to clarify and revise the documentation. If you have comments or questions, please feel free to discuss them either in this commit thread, the original review thread, or on llvmdev. Comments are more than welcome.
Reviewed by: atrick, ributzka
Differential Revision: http://reviews.llvm.org/D5683
llvm-svn: 223143
Presumably it was added to the CMake system when MAXPATHLEN was still
used by code built for Windows. Currently only lib/Support/Path.inc uses
MAXPATHLEN, and it should be available on all Unices.
llvm-svn: 223139
This is the third patch in a small series. It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085). The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.
With this change, gc.statepoints should be functionally complete. The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.
I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated. The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.
During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics. Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints. Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack. The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.
In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator. In principal, we shouldn't need to eagerly spill at all. The register allocator should do any spilling required and the statepoint should simply record that fact. Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.
Reviewed by: atrick, ributzka
llvm-svn: 223137
Follow up from r222926. Also handle multiple destinations from merged
cases on multiple and subsequent phi instructions.
rdar://problem/19106978
llvm-svn: 223135
Go through implicit defs of CSMI and MI, and clear the kill flags on
their uses in all the instructions between CSMI and MI.
We might have made some of the kill flags redundant, consider:
subs ... %NZCV<imp-def> <- CSMI
csinc ... %NZCV<imp-use,kill> <- this kill flag isn't valid anymore
subs ... %NZCV<imp-def> <- MI, to be eliminated
csinc ... %NZCV<imp-use,kill>
Since we eliminated MI, and reused a register imp-def'd by CSMI
(here %NZCV), that register, if it was killed before MI, should have
that kill flag removed, because it's lifetime was extended.
Also, add an exhaustive testcase for the motivating example.
Reviewed by: Juergen Ributzka <juergen@apple.com>
llvm-svn: 223133
The blocking code originated in ARM, which is more aggressive about casting
types to a canonical representative before doing anything else, so I missed out
most vector HFAs and broke the ABI. This should fix it.
llvm-svn: 223126
This operating system type represents the AMD HSA runtime,
and will be required by the R600 backend in order to generate
correct code for this runtime.
llvm-svn: 223124
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader and avoid crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.
llvm-svn: 223119
Removing an unused function which is causing one of the build bots to fail.
This was introduced in the commit r223113. A proper cleanup of the so_imm
tblgen defintion (made redundant by the mod_imm definition) needs to happen
soon.
llvm-svn: 223115
Certain ARM instructions accept 32-bit immediate operands encoded as a 8-bit
integer value (0-255) and a 4-bit rotation (0-30, even). Current ARM assembly
syntax support in LLVM allows the decoded (32-bit) immediate to be specified
as a single immediate operand for such instructions:
mov r0, #4278190080
The ARMARM defines an extended assembly syntax allowing the encoding to be made
more explicit, as in:
mov r0, #255, #8 ; (same 32-bit value as above)
The behaviour of the two instructions can be different w.r.t flags, which is
documented under "Modified immediate constants" in ARMARM. This patch enables
support for this extended syntax at the MC layer.
llvm-svn: 223113
The default ARM floating-point mode does not support IEEE 754 mode exactly. Of
relevance to this patch is that input denormals are flushed to zero. The way in
which they're flushed to zero depends on the architecture,
* For VFPv2, it is implementation defined as to whether the sign of zero is
preserved.
* For VFPv3 and above, the sign of zero is always preserved when a denormal
is flushed to zero.
When FP support has been disabled, the strategy taken by this patch is to
assume the software support will mirror the behaviour of the hardware support
for the target *if it existed*. That is, for architectures which can only have
VFPv2, it is assumed the software will flush to positive zero. For later
architectures it is assumed the software will flush to zero preserving sign.
Change-Id: Icc5928633ba222a4ba3ca8c0df44a440445865fd
llvm-svn: 223110
In both the Unix and Windows variants, std::getenv was called and the
result passed directly to a function accepting a StringRef. This isn't
OK because it might return a null pointer and that causes the StringRef
constructor to assert (and generally produces crash-prone code if
asserts are disabled). Fix this by independently testing the result as
non-null prior to splitting things.
This in turn uncovered another bug in the Unix variant where it would
infinitely recurse if PATH="", or after this fix if PATH isn't set.
There is no need to recurse at all. Slightly re-arrange the code to make
it clear that we can just fixup the Paths argument based on the
environment if we find anything.
I don't know of a particularly useful way to test these routines in
LLVM. I'll commit a test to Clang that ensures that its driver correctly
handles various settings of PATH. However, I have no idea how to
correctly write a Windows test for the PATHEXT change. Any Windows
developers who could provide such a test, please have at. =D
Many thanks to Nick Lewycky and others for helping debug this. =/ It was
quite nasty for us to track down.
llvm-svn: 223099
System memory allocation functions, which are identified at the IR level by the
noalias attribute on the return value, must return a pointer into a memory region
disjoint from any other memory accessible to the caller. We can use this
property to simplify pointer comparisons between allocated memory and local
stack addresses and the addresses of global variables. Neither the stack nor
global variables can overlap with the region used by the memory allocator.
Fixes PR21556.
llvm-svn: 223093
This is the second patch in a small series. This patch contains the MachineInstruction and x86-64 backend pieces required to lower Statepoints. It does not include the code to actually generate the STATEPOINT machine instruction and as a result, the entire patch is currently dead code. I will be submitting the SelectionDAG parts within the next 24-48 hours. Since those pieces are by far the most complicated, I wanted to minimize the size of that patch. That patch will include the tests which exercise the functionality in this patch. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
The STATEPOINT psuedo node is generated after all gc values are explicitly spilled to stack slots. The purpose of this node is to wrap an actual call instruction while recording the spill locations of the meta arguments used for garbage collection and other purposes. The STATEPOINT is modeled as modifing all of those locations to prevent backend optimizations from forwarding the value from before the STATEPOINT to after the STATEPOINT. (Doing so would break relocation semantics for collectors which wish to relocate roots.)
The implementation of STATEPOINT is closely modeled on PATCHPOINT. Eventually, much of the code in this patch will be removed. The long term plan is to merge the functionality provided by statepoints and patchpoints. Merging their implementations in the backend is likely to be a good starting point.
Reviewed by: atrick, ributzka
llvm-svn: 223085
The statepoint intrinsics are intended to enable precise root tracking through the compiler as to support garbage collectors of all types. The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them. There are no side tables created, no extra metadata, and no inhibited optimizations.
A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation. It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch series) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated. The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis. The explicit relocation operation, the fact the statepoint is assumed to clobber all memory, and the optimizers standard semantics ensure that the relocations flow through IR optimizations correctly.
This is the first patch in a small series. This patch contains only the IR parts; the documentation and backend support will be following separately. The entire series can be seen as one combined whole in http://reviews.llvm.org/D5683.
Reviewed by: atrick, ributzka
llvm-svn: 223078
Summary:
".weak" symbols cannot be consumed by ptxas (PR21685). This patch makes the
weak directive in MCAsmPrinter customizable, and disables emitting ".weak"
symbols for NVPTX.
Test Plan: weak-linkage.ll
Reviewers: jholewinski
Reviewed By: jholewinski
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D6455
llvm-svn: 223077
r208210 introduced an optimization that improves the vector select
codegen by doing the setcc on vectors directly.
This is a problem they the setcc operands are i1s, because the
optimization would create vectors of i1, which aren't legal.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6308
llvm-svn: 223075
r213378 improved f16 bitcasts, so that they go directly through subregs,
instead of through the stack. That code now causes an assertion failure
for bitcasts from other 16-bits types (most importantly v2i8).
Correct that by doing the custom lowering for i16 bitcasts only when the
input is an f16.
Part of PR21549.
Differential Revision: http://reviews.llvm.org/D6307
llvm-svn: 223074
The MachineVerifier used to check that there was always exactly one
unconditional branch to a non-landingpad (normal) successor.
If that normal successor to an invoke BB is unreachable, it seems
reasonable to only have one successor, the landing pad.
On targets other than AArch64 (and on AArch64 with a different testcase),
the branch folder turns the branch to the landing pad into a fallthrough.
The MachineVerifier, which relies on AnalyzeBranch, is unable to check
the condition, and doesn't complain. However, it does in this specific
testcase, where the branch to the landing pad remained.
Make the MachineVerifier accept it.
llvm-svn: 223059
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.
For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.
On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).
llvm-svn: 223050
This can significantly reduce the size of the switch, allowing for more
efficient lowering.
I also worked with the idea of exploiting unreachable defaults by
omitting the range check for jump tables, but always ended up with a
non-neglible binary size increase. It might be worth looking into some more.
llvm-svn: 223049
The explicit set of destination types is not fully redundant when lazy loading
since the TypeFinder will not find types used only in function bodies.
This keeps the logic to drop the name of mapped types since it still helps
with avoiding further renaming.
llvm-svn: 223043
- Fix missing SALU format bits
- Remove unused isSALUInstr
- Add isVALU
- Switch isDS to use a bit like the others
- Move SIInstrInfo::is* functions to header
- Reorder so they are approximately sorted by type (SALU, VALU, memory)
llvm-svn: 223038
This change makes MemorySanitizer instrumentation a bit more strict
about instructions that have no origin id assigned to them.
This would have caught the bug that was fixed in r222918.
No functional change.
llvm-svn: 222997
Summary:
PowerPC DWARF unwind info defined CFA as SP + offset even in a function
where the stack had been dynamically realigned. This clearly doesn't
work because the offset from SP to CFA is not a constant. Fix it by
defining CFA as BP instead.
This was causing the AddressSanitizer null_deref test to fail 50% of
the time, depending on whether SP happened to be 32-byte aligned on
entry to a particular function or not.
Reviewers: willschm, uweigand, hfinkel
Reviewed By: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6410
llvm-svn: 222996
Add checkDecodedInstruction for post-decode checking of instructions, to catch
the corner cases like HVC that don't fit into the general pattern. Needed to
check for an invalid condition field in instruction encoding despite HVC not
taking a predicate.
Patch by Matthew Wahab.
Change-Id: I48e28de981d7a9e43569594da3c45fb478b4f795
llvm-svn: 222992
This commit fixes a bug in stack protector pass where edge weights were not set
when new basic blocks were added to lists of successor basic blocks.
Differential Revision: http://reviews.llvm.org/D5766
llvm-svn: 222987
Instead of keeping an explicit set, just drop the names of types we choose
to map to some other type.
This has the advantage that the name of the unused will not cause the context
to rename types on module read.
llvm-svn: 222986
Add assembler support for the fixed-point cache-inhibited load/store
instructions. These are hypervisor-level only, so don't get too excited ;)
Fixes PR21650.
llvm-svn: 222976
Upon further review I think the MultiClass is being copied into the map instead of being moved due to the copy constructor on the nested Record type. This ultimately got exposed when the vector in DefPrototype vector was changed to hold unique_ptrs in another commit. This caused gcc 4.7 to fail due to the use of the copy constructor on unique_ptr with the error pointing back to one of the insert calls from this commit. Not sure why clang was able to build.
This reverts commit 710cdf729f84b428bf41aa8d32dbdb35fff79fde.
llvm-svn: 222971
The previous patch had effect, but missed this one. It seems MSVC
gets ADL-confused by the calls where the first argument is a function call?
llvm-svn: 222968
It was failing with this kind of error:
C:\b\build\slave\CrWinClang\build\src\third_party\llvm\lib\TableGen\TGParser.cpp(1243) : error C2668: 'llvm::make_unique' : ambiguous call to overloaded function
C:\b\build\slave\CrWinClang\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(408): could be 'std::unique_ptr<llvm::Record,std::default_delete<_Ty>> llvm::make_unique<llvm::Record,std::string,llvm::SMLoc&,llvm::RecordKeeper&,bool>(std::string &&,llvm::SMLoc &,llvm::RecordKeeper &,bool &&)'
with
[
_Ty=llvm::Record
]
C:\b\depot_tools\win_toolchain\vs2013_files\win8sdk\bin\..\..\VC\include\memory(1637): or 'std::unique_ptr<llvm::Record,std::default_delete<_Ty>> std::make_unique<llvm::Record,std::string,llvm::SMLoc&,llvm::RecordKeeper&,bool>(std::string &&,llvm::SMLoc &,llvm::RecordKeeper &,bool &&)' [found using argument-dependent lookup]
with
[
_Ty=llvm::Record
]
while trying to match the argument list '(std::string, llvm::SMLoc, llvm::RecordKeeper, bool)'
llvm-svn: 222967
Order matters for this container, it seems (using a forward_list and
replacing the original push_backs with emplace_fronts caused test
failures). I didn't look too deeply into why.
(& in retrospect, I might go back & change some of the forward_lists I
introduced to deques anyway - since most don't require removal, deque is
a more memory-friendly data structure (moderate locality while not
invalidating pointers))
llvm-svn: 222950
Seems libstdc++ on some buildbots is lacking std::map::emplace, which is
weird... reverting while I look into it.
This reverts commit r222937.
llvm-svn: 222939
Pointers and references to map elements are never invalidated (except on
removal, which isn't used here) so there's no need for the indirection
unless there's polymorphism at work.
A little const correctness had to be fixed, since the indirection
allowed some benign const violations.
llvm-svn: 222937
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
Since the elements were not polymorphic, the unique_ptr was only used to
avoid pointer invalidation on container resizes - might as well skip the
indirection and use a container with suitable invalidation semantics.
llvm-svn: 222931
We may be in a situation where the icmps might not be near each other in
a tree of or instructions. Try to dig out related compare instructions
and see if they combine.
N.B. This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR. We may have to resort
to something more sophisticated if this is a real problem.
llvm-svn: 222928
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.
Differential Revision: http://reviews.llvm.org/D6414
rdar://problem/18943047
llvm-svn: 222927
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB, as to match the number of source
branches.
Differential Revision: http://reviews.llvm.org/D6415
rdar://problem/19040894
llvm-svn: 222926
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.
Replace the existing supposed small memcpy test with an actual test of a small memcpy.
The previous test wasn't using FileCheck either.
This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6360
llvm-svn: 222925
The original patch would fail when:
* A dst opaque type (%A) is matched with a src type (%A).
* A src opaque (%E) type is then speculatively matched with %A and the
speculation fails afterward.
* When rolling back the speculation we would cancel the source %A to dest
%A mapping.
The fix is to keep an explicit list of which resolutions are speculative.
Original message:
Fix overly aggressive type merging.
If we find out that two types are *not* isomorphic, we learn nothing about
opaque sub types in both the source and destination.
llvm-svn: 222923
MSan does not assign origin for instrumentation temps (i.e. the ones that do
not come from the application code), but "select" instrumentation erroneously
tried to use one of those.
https://code.google.com/p/memory-sanitizer/issues/detail?id=78
llvm-svn: 222918
Add more tests to make sure the encoding/decoding of build attributes works
correctly for all permissible values of build attributes. For cases where there
are an infinite number of such values, a representative subset has been settled
for.
Change-Id: I2643c9624c211b2d56405306e16eec2d487bc5d6
llvm-svn: 222917
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.
This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
llvm-svn: 222903
I also added a test.
Original message:
Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222897
This reverts commit r222760.
It changed our behaviour on PIC so we don't match gas anymore. It also
included lots of unnecessary changes to tests.
If those changes are desirable, there should be an independent discussion
as they are out of scope for that patch.
I will recommit the other bits.
llvm-svn: 222896
This reverts commit r222727, which causes LTO bootstrap failures.
Last passing @ r222698:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/532/
First failing @ r222843:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/
Internal bootstraps pointed at a much narrower range: r222725 is
passing, and r222731 is failing.
LTO crashes while handling libclang.dylib:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/consoleFull#-158682280549ba4694-19c4-4d7e-bec5-911270d8a58c
GEP is not of right type for indices!
%InfoObj.i.i = getelementptr inbounds %"class.llvm::OnDiskIterableChainedHashTable"* %.lcssa, i64 0, i32 0, i32 4, !dbg !123627
%"class.clang::serialization::reader::ASTIdentifierLookupTrait" = type { %"class.clang::ASTReader.31859"*, %"class.clang::serialization::ModuleFile.31870"*, %"class.clang::IdentifierInfo"* }LLVM ERROR: Broken function found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Looks like the new algorithm doesn't merge types aggressively enough.
llvm-svn: 222895
Fixed missing dominance check.
Original commit message:
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
Jump threading will then eliminate the second if(cond).
llvm-svn: 222891
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.
* It's less work.
* Some vendors may legitimately have case-sensitive checks for these
attributes which would fail on LLVM generated object files.
* There could be locale issues with uppercasing.
The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see
http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133
This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.
Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
\endcode
Jump threading will then eliminate the second if(cond).
llvm-svn: 222872
This restores our ability to optimize:
(X & C) ? X & ~C : X into X & ~C
(X & C) ? X : X & ~C into X
(X & C) ? X | C : X into X
(X & C) ? X : X | C into X | C
llvm-svn: 222868
All symbols have to be stored in the global symbol to enable
cross-rtdyld-instance linking, so the local symbol table content is
redundant.
llvm-svn: 222867
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.
A test has been added to ensure that we don't regress again.
I'll work on a rewrite of what the optimization was trying to do later.
llvm-svn: 222856