Also add an MIBundleBuilder constructor that takes an existing bundle.
Together these functions make it possible to add instructions to
existing bundles.
llvm-svn: 170063
structures to and from YAML using traits. The first client will
be the test suite of lld. The documentation will show up at:
http://llvm.org/docs/YamlIO.html
llvm-svn: 170019
PowerPC target. This is the last of the four models, so we now have
full TLS support.
This is mostly a straightforward extension of the general dynamic model.
I had to use an additional Chain operand to tie ADDIS_DTPREL_HA to the
register copy following ADDI_TLSLD_L; otherwise everything above the
ADDIS_DTPREL_HA appeared dead and was removed.
As before, there are new test cases to test the assembly generation, and
the relocations output during integrated assembly. The expected code
gen sequence can be read in test/CodeGen/PowerPC/tls-ld.ll.
There are a couple of things I think can be done more efficiently in the
overall TLS code, so there will likely be a clean-up patch forthcoming;
but for now I want to be sure the functionality is in place.
Bill
llvm-svn: 170003
been used in the first place. It simply was passed to the function and to the
recursive invocations. Simply drop the parameter and update the callers for the
new signature.
Patch by Saleem Abdulrasool!
llvm-svn: 169988
When ASan replaces <alloca instruction> with
<offset into a common large alloca>, it should also patch
llvm.dbg.declare calls and replace debug info descriptors to mark
that we've replaced alloca with a value that stores an address
of the user variable, not the user variable itself.
See PR11818 for more context.
llvm-svn: 169984
Add R_ARM_NONE and R_ARM_PREL31 relocation types
to MCExpr. Both of them will be used while
generating .ARM.extab and .ARM.exidx sections.
llvm-svn: 169965
mention the inline memcpy / memset expansion code is a mess?
This patch split the ZeroOrLdSrc argument into two: IsMemset and ZeroMemset.
The first indicates whether it is expanding a memset or a memcpy / memmove.
The later is whether the memset is a memset of zero. It's totally possible
(likely even) that targets may want to do different things for memcpy and
memset of zero.
llvm-svn: 169959
Also added more comments to explain why it is generally ok to return true.
- Rename getOptimalMemOpType argument IsZeroVal to ZeroOrLdSrc. It's meant to
be true for loaded source (memcpy) or zero constants (memset). The poor name
choice is probably some kind of legacy issue.
llvm-svn: 169954
fsub X, +0 ==> X
fsub X, -0 ==> X, when we know X is not -0
fsub +/-0.0, (fsub -0.0, X) ==> X
fsub nsz +/-0.0, (fsub +/-0.0, X) ==> X
fsub nnan ninf X, X ==> 0.0
fadd nsz X, 0 ==> X
fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
where nnan and ninf have to occur at least once somewhere in this expression
fmul X, 1.0 ==> X
llvm-svn: 169940
m_ConstantFP - match and bind a float constant
m_SpecificConstantFP - match a specific floating point value or vector of floats of that value
m_FPOne - match a floating point 1.0 or vector of 1.0s
m_NegZero - match -0.0
m_AnyZero - match 0 or -0.0
llvm-svn: 169939
ScalarTargetTransformInfo::getIntImmCost() instead. "Legal" is a poorly defined
term for something like integer immediate materialization. It is always possible
to materialize an integer immediate. Whether to use it for memcpy expansion is
more a "cost" conceern.
llvm-svn: 169929
Given a thread-local symbol x with global-dynamic access, the generated
code to obtain x's address is:
Instruction Relocation Symbol
addis ra,r2,x@got@tlsgd@ha R_PPC64_GOT_TLSGD16_HA x
addi r3,ra,x@got@tlsgd@l R_PPC64_GOT_TLSGD16_L x
bl __tls_get_addr(x@tlsgd) R_PPC64_TLSGD x
R_PPC64_REL24 __tls_get_addr
nop
<use address in r3>
The implementation borrows from the medium code model work for introducing
special forms of ADDIS and ADDI into the DAG representation. This is made
slightly more complicated by having to introduce a call to the external
function __tls_get_addr. Using the full call machinery is overkill and,
more importantly, makes it difficult to add a special relocation. So I've
introduced another opcode GET_TLS_ADDR to represent the function call, and
surrounded it with register copies to set up the parameter and return value.
Most of the code is pretty straightforward. I ran into one peculiarity
when I introduced a new PPC opcode BL8_NOP_ELF_TLSGD, which is just like
BL8_NOP_ELF except that it takes another parameter to represent the symbol
("x" above) that requires a relocation on the call. Something in the
TblGen machinery causes BL8_NOP_ELF and BL8_NOP_ELF_TLSGD to be treated
identically during the emit phase, so this second operand was never
visited to generate relocations. This is the reason for the slightly
messy workaround in PPCMCCodeEmitter.cpp:getDirectBrEncoding().
Two new tests are included to demonstrate correct external assembly and
correct generation of relocations using the integrated assembler.
Comments welcome!
Thanks,
Bill
llvm-svn: 169910
instead of the instruction. I've left a forwarding wrapper for the
instruction so users with the instruction don't need to create
a GEPOperator themselves.
This lets us remove the copy of this code in instsimplify.
I've looked at most of the other copies of similar code, and this is the
only one I've found that is actually exactly the same. The one in
InlineCost is very close, but it requires re-mapping non-constant
indices through the cost analysis value simplification map. I could add
direct support for this to the generic routine, but it seems overly
specific.
llvm-svn: 169853
the GEP instruction class.
This is part of the continued refactoring and cleaning of the
infrastructure used by SROA. This particular operation is also done in
a few other places which I'll try to refactor to share this
implementation.
llvm-svn: 169852
Accordingly, add helper funtions getSimpleValueType (in parallel to
getValueType) in SDValue, SDNode, and TargetLowering.
This is the first, in a series of patches.
llvm-svn: 169837
This shouldn't affect codegen for -O0 compiles as tail call markers are not
emitted in unoptimized compiles. Testing with the external/internal nightly
test suite reveals no change in compile time performance. Testing with -O1,
-O2 and -O3 with fast-isel enabled did not cause any compile-time or
execution-time failures. All tests were performed on my x86 machine.
I'll monitor our arm testers to ensure no regressions occur there.
In an upcoming clang patch I will be marking the objc_autoreleaseReturnValue
and objc_retainAutoreleaseReturnValue as tail calls unconditionally. While
it's theoretically true that this is just an optimization, it's an
optimization that we very much want to happen even at -O0, or else ARC
applications become substantially harder to debug.
Part of rdar://12553082
llvm-svn: 169796
1. Teach it to use overlapping unaligned load / store to copy / set the trailing
bytes. e.g. On 86, use two pairs of movups / movaps for 17 - 31 byte copies.
2. Use f64 for memcpy / memset on targets where i64 is not legal but f64 is. e.g.
x86 and ARM.
3. When memcpy from a constant string, do *not* replace the load with a constant
if it's not possible to materialize an integer immediate with a single
instruction (required a new target hook: TLI.isIntImmLegal()).
4. Use unaligned load / stores more aggressively if target hooks indicates they
are "fast".
5. Update ARM target hooks to use unaligned load / stores. e.g. vld1.8 / vst1.8.
Also increase the threshold to something reasonable (8 for memset, 4 pairs
for memcpy).
This significantly improves Dhrystone, up to 50% on ARM iOS devices.
rdar://12760078
llvm-svn: 169791
InitSections is called before the MCContext is initialized it could cause
duplicate temporary symbols to be emitted later (after context initialization
resets the temporary label counter).
llvm-svn: 169785
The linker will call `lto_codegen_add_must_preserve_symbol' on all globals that
should be kept around. The linker will pretend that a dylib is being created.
<rdar://problem/12528059>
llvm-svn: 169770
The `-mno-red-zone' flag wasn't being propagated to the functions that code
coverage generates. This allowed some of them to use the red zone when that
wasn't allowed.
<rdar://problem/12843084>
llvm-svn: 169754
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
tracking isn't needed.
3) It doesn't support non-instruction pointer values.
The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.
Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by others passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.
The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.
Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.
llvm-svn: 169728
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.
Reviewed by: Nadav
llvm-svn: 169711
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
This function sets the `_exportDynamic' ivar. When that's set, we export all
symbols (e.g. we don't run the internalize pass). This is equivalent to the
`--export-dynamic' linker flag in GNU land:
--export-dynamic
When creating a dynamically linked executable, add all symbols to the dynamic
symbol table. The dynamic symbol table is the set of symbols which are visible
from dynamic objects at run time. If you do not use this option, the dynamic
symbol table will normally contain only those symbols which are referenced by
some dynamic object mentioned in the link. If you use dlopen to load a dynamic
object which needs to refer back to the symbols defined by the program, rather
than some other dynamic object, then you will probably need to use this option
when linking the program itself.
The Darwin linker will support this via the `-export_dynamic' flag. We should
modify clang to support this via the `-rdynamic' flag.
llvm-svn: 169656
SmallString. This makes it possible to use the length-erased SmallVectorImpl
in the interface without imposing buffer size. Thus, the size of MCInstFragment
is back down since a preallocated 8-byte contents buffer is enough.
It would be generally a good idea to rid all the fragments of SmallString as
contents, because a vector just makes more sense.
llvm-svn: 169644
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
This is still a work in progress. The purpose is to make bundling and
unbundling operations explicit, and to catch errors where bundles are
broken or created inadvertently.
The old IsInsideBundle flag is replaced by two MI flags: BundledPred
which has the same meaning as IsInsideBundle, and BundledSucc which is
set on instructions that are bundled with a successor. Having two flags
provdes redundancy to detect when a bundle is inadvertently torn by a
splice() or insert(), and it makes it possible to write bundle iterators
that don't need to peek at adjacent instructions.
The new flags can't be manipulated directly (once setIsInsideBundle is
gone). Instead there are MI functions to make and break bundle bonds.
The setIsInsideBundle function will be removed in a future commit. It
should be replaced by bundleWithPred().
llvm-svn: 169583
original change description:
change MCContext to work on the doInitialization/doFinalization model
reviewed by Evan Cheng <evan.cheng@apple.com>
llvm-svn: 169553
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
llvm-svn: 169536
This is an alternative to the ImmutableMapRef interface where a factory
should still be canonicalizing by default, but in certain cases an
improvement can be made by delaying the canonicalization.
llvm-svn: 169532
Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.
llvm-svn: 169484
This is more consistent with other vectors in this code. In addition, I ran some
tests compiling a large program and >96% of fragments have 4 or less fixups, so
SmallVector<4> is a good optimization.
llvm-svn: 169433
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
llvm-svn: 169425
Change member types of RuntimeFunction and UnwindInfo from uint64_t to
uint32_t:
These members represent addresses. According to MSDN, they are image
relative, that is, they are 32-bit offsets from the starting address
of the image that contains the function table entry.
See MSDN for more information:
RUNTIME_FUNCTION: http://msdn.microsoft.com/en-us/library/ft9x1kdx.aspx
UNWIND_INFO: http://msdn.microsoft.com/en-us/library/ddssxxy8.aspx
Make Win64.h platform-neutral:
The standard types unit8_t, uint16_t and uint32_t are replaced with
their counterparts from Endian.h. Accessor functions are introduced to
replace bit fields.
Patch by João Matos and Kai Nacke.
llvm-svn: 169414
A MachineInstr can only ever be constructed by CreateMachineInstr() and
CloneMachineInstr(), and those factories don't use the removed
constructors.
llvm-svn: 169395
This is for the lldb team so most of but not all of the values are
to be printed as hex with this option. Some small values like the
scale in an X86 address were requested to printed in decimal
without the leading 0x.
There may be some tweaks need to places that may still be in
decimal that they want in hex. Specially for arm. I made my best
guess. Any tweaks from here should be simple.
I also did the best I know now with help from the C++ gurus
creating the cleanest formatImm() utility function and containing
the changes. But if someone has a better idea to make something
cleaner I'm all ears and game for changing the implementation.
rdar://8109283
llvm-svn: 169393
- fixed ordering of calls to doFinalization to be the reverse of the pass run order due to potential dependencies
- fixed machine module info to operate in the doInitialization/doFinalization model, also fixes some FIXMEs
reviewed by Evan Cheng <evan.cheng@apple.com>
llvm-svn: 169391
At build-time register pressure was always computed in terms of
register units. But the compile-time API was expressed in terms of
register classes because it was intended for virtual registers (and
physical register units weren't yet used anywhere in codegen).
Now that the codegen uses physreg units consistently, prepare for
tracking register pressure also in terms of live units, not live
registers.
llvm-svn: 169360
The count attribute is more accurate with regards to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbound array by setting the count to -1 instead of the lower bound to
1 and upper bound to 0.
llvm-svn: 169312
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.
A Clang patch will follow shortly.
llvm-svn: 169291
on 64-bit PowerPC ELF.
The patch includes code to handle external assembly and MC output with the
integrated assembler. It intentionally does not support the "old" JIT.
For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:
Code sequence Relocation Symbol
ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x
add 9,9,x@tls R_PPC64_TLS x
The register 9 is arbitrary here. The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x. It will replace x@tls with the thread-pointer
register (13).
The two test cases verify correct assembly output and relocation output
as just described.
PowerPC-specific selection node variants are added for the two
instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.
The rest of the processing is straightforward.
llvm-svn: 169281
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
llvm-svn: 169218
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This will cause miscompile if the address is treated as aligned and add is
replaced with or in DAGCombine.
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197
Now that there can be multiple hint registers from targets, it doesn't
make sense to have a function that returns 'the' preferred register.
llvm-svn: 169190
Targets can provide multiple hints now, so getRegAllocPref() doesn't
make sense any longer because it only returns one preferred register.
Replace it with getSimpleHint() in the remaining heuristics. This
function only
llvm-svn: 169188
Virtual registers with a known preferred register are prioritized by
RAGreedy. This function makes the condition explicit without depending
on getRegAllocPref().
llvm-svn: 169179
The TargetRegisterInfo::getRegAllocationHints() function is going to
replace the existing mechanisms for providing target-dependent hints to
the register allocator: ResolveRegAllocHint() and
getRawAllocationOrder().
The new hook is more flexible because it allows the target to provide
multiple preferred candidate registers for each virtual register, and it
is easier to use because targets are not required to return a reference
to a constant array like getRawAllocationOrder().
An optional VirtRegMap argument can be used to provide target-dependent
hints that depend on the provisional assignments of other virtual
registers.
llvm-svn: 169154
AKA: Recompile *ALL* the source code!
This one went much better. No manual edits here. I spot-checked for
silliness and grep-checked for really broken edits and everything seemed
good. It all still compiles. Yell if you see something that looks goofy.
llvm-svn: 169133
Sooooo many of these had incorrect or strange main module includes.
I have manually inspected all of these, and fixed the main module
include to be the nearest plausible thing I could find. If you own or
care about any of these source files, I encourage you to take some time
and check that these edits were sensible. I can't have broken anything
(I strictly added headers, and reordered them, never removed), but they
may not be the headers you'd really like to identify as containing the
API being implemented.
Many forward declarations and missing includes were added to a header
files to allow them to parse cleanly when included first. The main
module rule does in fact have its merits. =]
llvm-svn: 169131
The original patch removed a bunch of code that the SjLjEHPrepare pass placed
into the entry block if all of the landing pads were removed during the
CodeGenPrepare class. The more natural way of doing things is to run the CGP
*before* we run the SjLjEHPrepare pass.
Make it so!
llvm-svn: 169044
Rationale:
1) This was the name in the comment block. ;]
2) It matches Clang's __has_feature naming convention.
3) It matches other compiler-feature-test conventions.
Sorry for the noise. =]
I've also switch the comment block to use a \brief tag and not duplicate
the name.
llvm-svn: 168996
references from whether it supports an R-value reference *this. No
version of GCC today supports the latter, which breaks GCC C++11
compiles of LLVM and Clang now.
Also add doxygen comments clarifying what's going on here, and update
the usage in Optional. I'll update the usages in Clang next.
llvm-svn: 168993
For example, don't allow empty strings to be passed to getInt.
Move asserts inside parseSpecifier. (One day we may want to pass parse
error messages to the user - from LLParser - instead of using asserts,
but keep the code simple until then. There have been an attempt to do
this. See r142288, which got reverted, and r142605.)
llvm-svn: 168991
depends on the IR infrastructure, there is no sense in it being off in
Support land.
This is in preparation to start working to expand InstVisitor into more
special-purpose visitors that are still generic and can be re-used
across different passes. The expansion will go into the Analylis tree
though as nothing in VMCore needs it.
llvm-svn: 168972
This expands to '&', and is intended to be used when an /optional/ rvalue
override is available.
Before:
void foo() const { ... }
After:
void foo() const LLVM_LVALUE_FUNCTION { ... }
void foo() && { ... }
This is used to allow moving the contents of an Optional.
llvm-svn: 168963
This revision attempts to recognize following population-count pattern:
while(a) { c++; ... ; a &= a - 1; ... },
where <c> and <a>could be used multiple times in the loop body.
TODO: On X8664 and ARM, __buildin_ctpop() are not expanded to a efficent
instruction sequence, which need to be improved in the following commits.
Reviewed by Nadav, really appreciate!
llvm-svn: 168931
MachOObjectFile owns a MachOObj, but never frees it. Both MachOObjectFile
and MachOObj want to own the MemoryBuffer, though, so we have to be careful
and give them each one of their own.
Thanks to Greg Clayton, Eric Christopher and Michael Spencer for helping
figure out what's going wrong here.
rdar://12561773
llvm-svn: 168923
No functional change, just moved header files.
Targets can inject custom passes between register allocation and
rewriting. This makes it possible to tweak the register allocation
before rewriting, using the full global interference checking available
from LiveRegMatrix.
llvm-svn: 168806
appropriate unit tests. This change in itself is not expected to
affect any functionality at this point, but it will serve as a
stepping stone to improve FileCheck's variable matching capabilities.
Luckily, our regex implementation already supports backreferences,
although a bit of hacking is required to enable it. It supports both
Basic Regular Expressions (BREs) and Extended Regular Expressions
(EREs), without supporting backrefs for EREs, following POSIX strictly
in this respect. And EREs is what we actually use (rightly). This is
contrary to many implementations (including the default on Linux) of
POSIX regexes, that do allow backrefs in EREs.
Adding backref support to our EREs is a very simple change in the
regcomp parsing code. I fail to think of significant cases where it
would clash with existing things, and can bring more versatility to
the regexes we write. There's always the danger of a backref in a
specially crafted regex causing exponential matching times, but since
we mainly use them for testing purposes I don't think it's a big
problem. [it can also be placed behind a flag specific to FileCheck,
if needed].
For more details, see:
* http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-November/055840.html
* http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/156878.html
llvm-svn: 168802
This is a simple, cheap infrastructure for analyzing the shape of a
DAG. It recognizes uniform DAGs that take the shape of bottom-up
subtrees, such as the included matrix multiplication example. This is
useful for heuristics that balance register pressure with ILP. Two
canonical expressions of the heuristic are implemented in scheduling
modes: -misched-ilpmin and -misched-ilpmax.
llvm-svn: 168773