The `-mno-red-zone' flag wasn't being propagated to the functions that code
coverage generates. This allowed some of them to use the red zone when that
wasn't allowed.
<rdar://problem/12843084>
llvm-svn: 169754
the assembler. This is useful in order to know how the numbers add up,
since in particular the Align fragments account for a non-trivial
portion of the emitted fragments (especially on -O0 which sets
relax-all).
llvm-svn: 169747
misched used GetUnderlyingObject in order to break false load/store
dependencies, and the -enable-aa-sched-mi feature similarly relied on
GetUnderlyingObject in order to ensure it is safe to use the aliasing analysis.
Unfortunately, GetUnderlyingObject does not recurse through phi nodes, and so
(especially due to LSR) all of these mechanisms failed for
induction-variable-dependent loads and stores inside loops.
This change replaces uses of GetUnderlyingObject with GetUnderlyingObjects
(which will recurse through phi and select instructions) in misched.
Andy reviewed, tested and simplified this patch; Thanks!
llvm-svn: 169744
Summary:
Not all chips targeted by x86_64 have this feature, but a dramatically
increasing number do. Specifying a chip-specific tuning parameter will
continue to turn the feature on or off as appropriate for that
particular chip, but the generic flag should try to achieve the best
performance on the most widely available hardware. Today, the number of
chips with fast UA access dwarfs those without in the x86-64 space.
Note that this also brings LLVM's code generation for this '-march' flag
more in line with that of modern GCCs.
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D195
llvm-svn: 169740
Intel chips.
The model number rules were determined by inspecting Intel's
documentation for their newer chip model numbers. My understanding is
that all of the newer Intel chips have fast unaligned memory access, but
if anyone is concerned about a particular chip, just shout.
No tests updated; it's not clear we have dedicated tests for the chips'
various features, but if anyone would like tests (or can point me at
some existing ones), I'm happy to oblige.
llvm-svn: 169730
This visitor provides infrastructure for recursively traversing the
use-graph of a pointer-producing instruction like an alloca or a malloc.
It maintains a worklist of uses to visit, so it can handle very deep
recursions. It automatically looks through instructions which simply
translate one pointer to another (bitcasts and GEPs). It tracks the
offset relative to the original pointer as long as that offset remains
constant and exposes it during the visit as an APInt offset. Finally, it
performs conservative escape analysis.
However, currently it has some limitations that should be addressed
going forward:
1) It doesn't handle vectors of pointers.
2) It doesn't provide a cheaper visitor when the constant offset
tracking isn't needed.
3) It doesn't support non-instruction pointer values.
The current functionality is exactly what is required to implement the
SROA pointer-use visitors in terms of this one, rather than in terms of
their own ad-hoc base visitor, which was always very poorly specified.
SROA has been converted to use this, and the code there deleted which
this utility now provides.
Technically speaking, using this new visitor allows SROA to handle a few
more cases than it previously did. It is now more aggressive in ignoring
chains of instructions which look like they would defeat SROA, but in
fact do not because they never result in a read or write of memory.
While this is "neat", it shouldn't be interesting for real programs as
any such chains should have been removed by others passes long before we
get to SROA. As a consequence, I've not added any tests for these
features -- it shouldn't be part of SROA's contract to perform such
heroics.
The goal is to extend the functionality of this visitor going forward,
and re-use it from passes like ASan that can benefit from doing
a detailed walk of the uses of a pointer.
Thanks to Ben Kramer for the code review rounds and lots of help
reviewing and debugging this patch.
llvm-svn: 169728
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.
In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.
The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.
llvm-svn: 169719
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.
Reviewed by: Nadav
llvm-svn: 169711
This will more closely match the behavior of the new PtrUseVisitor that
I am adding. Hopefully this will not change the actual behavior in any
way, but by making the processing order more similar help in debugging.
llvm-svn: 169697
The limit seems to break newer pythons (see PR13598) so just drop it for now.
Eventually lit should learn to set limits for its children instead of a global
limit in the makefile.
If some PPC bots fail after this change: That's a good thing, they actually run
clang tests now.
llvm-svn: 169695
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.
This reverts the commits:
r169671: Fix a logic error.
r169604: Move the popcnt tests to an X86 subdirectory.
r168931: Initial commit adding the pass.
llvm-svn: 169683
This function sets the `_exportDynamic' ivar. When that's set, we export all
symbols (e.g. we don't run the internalize pass). This is equivalent to the
`--export-dynamic' linker flag in GNU land:
--export-dynamic
When creating a dynamically linked executable, add all symbols to the dynamic
symbol table. The dynamic symbol table is the set of symbols which are visible
from dynamic objects at run time. If you do not use this option, the dynamic
symbol table will normally contain only those symbols which are referenced by
some dynamic object mentioned in the link. If you use dlopen to load a dynamic
object which needs to refer back to the symbols defined by the program, rather
than some other dynamic object, then you will probably need to use this option
when linking the program itself.
The Darwin linker will support this via the `-export_dynamic' flag. We should
modify clang to support this via the `-rdynamic' flag.
llvm-svn: 169656
SmallString. This makes it possible to use the length-erased SmallVectorImpl
in the interface without imposing buffer size. Thus, the size of MCInstFragment
is back down since a preallocated 8-byte contents buffer is enough.
It would be generally a good idea to rid all the fragments of SmallString as
contents, because a vector just makes more sense.
llvm-svn: 169644
Before this patch, when you objdump an LLVM-compiled file, objdump tried to
decode data-in-code sections as if they were code. This patch adds the missing
Mapping Symbols, as defined by "ELF for the ARM Architecture" (ARM IHI 0044D).
Patch based on work by Greg Fitzgerald.
llvm-svn: 169609
Buildbots for some hosts may choose to build only their own backend in order to
maximise testing-turnaround time. Move the test into a prefixed directory so
lit's standard "backend specific" suppression can be done.
llvm-svn: 169604
NOTE: If you have any patches in the works that modify LangRef, you will
need to rewrite the changes to LangRef.html to their equivalents in
LangRef.rst. If you need assistance feel free to contact me.
Since LangRef is mission-critical for the project and "normative", I
have taken extra care to ensure that no content was lost or altered in
the conversion. The content was converted with a tool called `pandoc`,
so there is no chance for a human error like accidentally forgetting a
sentence or whatever. After the initial conversion by `pandoc`, only
changes to the markup were done.
This is just the most literal conversion of the HTML document as
possible. It might be worth exploring some way to chop up this massive
document into separate pages, e.g. something like
`docs/LangRef/Instructions.rst`, `docs/LangRef/Intrinsics.rst`, etc.
with `docs/LangRef.rst` being an "intro/navigation page" of sorts. On
the other hand, that loses the ability to {Ctrl,Cmd}-F for a given term
right from your browser.
IMO, I think our stylesheet needs some work because I find it hard to
tell what level of nesting some of the headings are at (e.g. "is this a
new section or is it a subsection?"). The issue is present on other
pages, but the sheer size and deep section structure of LangRef really
brings this issue out. If there are any web designers out there in the
community it would be awesome if you tried to come up with something
nicer.
llvm-svn: 169596
MSan uses a TLS slot to pass shadow for function arguments and return values.
This makes all instrumented functions not readonly, and at the same time
requires that all callees of an instrumented function that may be
MSan-instrumented do not have readonly attribute (otherwise some of the
instrumentation may be optimized out).
llvm-svn: 169591
This is still a work in progress. The purpose is to make bundling and
unbundling operations explicit, and to catch errors where bundles are
broken or created inadvertently.
The old IsInsideBundle flag is replaced by two MI flags: BundledPred
which has the same meaning as IsInsideBundle, and BundledSucc which is
set on instructions that are bundled with a successor. Having two flags
provdes redundancy to detect when a bundle is inadvertently torn by a
splice() or insert(), and it makes it possible to write bundle iterators
that don't need to peek at adjacent instructions.
The new flags can't be manipulated directly (once setIsInsideBundle is
gone). Instead there are MI functions to make and break bundle bonds.
The setIsInsideBundle function will be removed in a future commit. It
should be replaced by bundleWithPred().
llvm-svn: 169583
by virtue of inbounds GEPs that preclude a null pointer.
This is a very common pattern in the code generated by std::vector and
other standard library routines which use allocators that test for null
pervasively. This is one step closer to teaching Clang+LLVM to be able
to produce an empty function for:
void f() {
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(4);
}
Which is related to getting them to completely fold SmallVector
push_back sequences into constants when inlining and other optimizations
make that a possibility.
llvm-svn: 169573
- Use SOURCES instead of Source. See Makefile.rules and MakefileGuide.html.
- Don't assume the current directory. $(wildcard *.cc) doesn't match anything on corresponding build directory.
llvm-svn: 169568
1) don't delete gtest-all.cc (which is used to gather all gtest source
files in a single file)
2) make including LLVMSupport headers optional (on by default).
Sanitizer tools may want to use their own versions of googletest
compiled with specific flags, instead of the common googletest
library used for all other LLVM/Clang unittests.
llvm-svn: 169559
original change description:
change MCContext to work on the doInitialization/doFinalization model
reviewed by Evan Cheng <evan.cheng@apple.com>
llvm-svn: 169553
understand target implementation of any_extend / extload, just generate
zero_extend in place of any_extend for liveouts when the target knows the
zero_extend will be implicit (e.g. ARM ldrb / ldrh) or folded (e.g. x86 movz).
rdar://12771555
llvm-svn: 169536
This is an alternative to the ImmutableMapRef interface where a factory
should still be canonicalizing by default, but in certain cases an
improvement can be made by delaying the canonicalization.
llvm-svn: 169532
Instead of unconditionally storing origin with every application store,
only do this when the shadow of the stored value is != 0.
This change also delays instrumentation of stores until after the walk over
function's instructions, because adding new basic blocks confuses InstVisitor.
We only keep 1 origin value per 4 bytes of application memory. This change
fixes the bug when a store of a single clean byte wiped the origin for the
whole 4-byte area.
Since stores of uninitialized values are relatively uncommon, this change
improves performance of track-origins mode by 5% median and by up to 47% on
specs.
llvm-svn: 169490
Some languages, e.g. Ada and Pascal, allow you to specify that the array bounds
are different from the default (1 in these cases). If we have a lower bound
that's non-default, then we emit the lower bound. We also calculate the correct
upper bound in those cases.
llvm-svn: 169484
RUN: a
RUN: b || true
as "a && (b || true)" in Tcl mode, and as "(a && b) || true" in sh mode.
Everyone seems to (quite reasonably) write tests assuming the Tcl behavior,
so use that in sh mode too.
llvm-svn: 169441
This is more consistent with other vectors in this code. In addition, I ran some
tests compiling a large program and >96% of fragments have 4 or less fixups, so
SmallVector<4> is a good optimization.
llvm-svn: 169433
This is much simpler to reason about, more efficient, and
fixes some corner cases involving implicit super-register defs.
Fixed rdar://12797931.
llvm-svn: 169425
The encoding of NOP in ARMAsmBackend.cpp is missing a trailing zero, which
causes the emission of a coprocessor instruction rather than "mov r0, r0"
as indicated in the comment. The test also checks for the wrong encoding.
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121203/157919.html
llvm-svn: 169420
The new command line option -unwind-info dumps the Win64 EH unwind
data to the console. This is a nice feature if you need to debug
generated EH data (e.g. from LLVM). Includes a test case.
Initial patch by João Matos, extensions and rework by Kai Nacke.
llvm-svn: 169415
Change member types of RuntimeFunction and UnwindInfo from uint64_t to
uint32_t:
These members represent addresses. According to MSDN, they are image
relative, that is, they are 32-bit offsets from the starting address
of the image that contains the function table entry.
See MSDN for more information:
RUNTIME_FUNCTION: http://msdn.microsoft.com/en-us/library/ft9x1kdx.aspx
UNWIND_INFO: http://msdn.microsoft.com/en-us/library/ddssxxy8.aspx
Make Win64.h platform-neutral:
The standard types unit8_t, uint16_t and uint32_t are replaced with
their counterparts from Endian.h. Accessor functions are introduced to
replace bit fields.
Patch by João Matos and Kai Nacke.
llvm-svn: 169414
For OS X builds, we generate one version of config.h but then build for
multiple architectures. This means that the LLVM_HOSTTRIPLE setting may have
the wrong architecture. Adjust it dynamically to match the current
architecture. <rdar://problem/12715470>
llvm-svn: 169405
A MachineInstr can only ever be constructed by CreateMachineInstr() and
CloneMachineInstr(), and those factories don't use the removed
constructors.
llvm-svn: 169395
This is for the lldb team so most of but not all of the values are
to be printed as hex with this option. Some small values like the
scale in an X86 address were requested to printed in decimal
without the leading 0x.
There may be some tweaks need to places that may still be in
decimal that they want in hex. Specially for arm. I made my best
guess. Any tweaks from here should be simple.
I also did the best I know now with help from the C++ gurus
creating the cleanest formatImm() utility function and containing
the changes. But if someone has a better idea to make something
cleaner I'm all ears and game for changing the implementation.
rdar://8109283
llvm-svn: 169393
- fixed ordering of calls to doFinalization to be the reverse of the pass run order due to potential dependencies
- fixed machine module info to operate in the doInitialization/doFinalization model, also fixes some FIXMEs
reviewed by Evan Cheng <evan.cheng@apple.com>
llvm-svn: 169391
At build-time register pressure was always computed in terms of
register units. But the compile-time API was expressed in terms of
register classes because it was intended for virtual registers (and
physical register units weren't yet used anywhere in codegen).
Now that the codegen uses physreg units consistently, prepare for
tracking register pressure also in terms of live units, not live
registers.
llvm-svn: 169360
Sorry for the massive commit, but I just wanted to knock this one down
and it is really straightforward.
There are still a couple trivial (i.e. not related to the content)
things left to fix:
- Use of raw HTML links where :doc:`...` and :ref:`...` could be used
instead. If you are a newbie and want to help fix this it would make
for some good bite-sized patches; more experienced developers should
be focusing on adding new content (to this tutorial or elsewhere, but
please _do not_ waste your time on formatting when there is such dire
need for documentation (see docs/SphinxQuickstartTemplate.rst to get
started writing)).
- Highlighting of the kaleidoscope code blocks (currently left as bare
`::`). I will be working on writing a custom Pygments highlighter for
this, mostly as training for maintaining the `llvm` code-block's lexer
in-tree. I want to do this because I am extremely unhappy with how it
just "gives up" on the slightest deviation from the expected syntax
and leaves the whole code-block un-highlighted.
More generally I am looking at writing some Sphinx extensions and
keeping them in-tree as well, to support common use cases that
currently have no good solution (like "monospace text inside a link").
llvm-svn: 169343
This change attempts to simplify (X^Y) -> X or Y in the user's context if we know that
only bits from X or Y are demanded.
A minimized case is provided bellow. This change will simplify "t>>16" into "var1 >>16".
=============================================================
unsigned foo (unsigned val1, unsigned val2) {
unsigned t = val1 ^ 1234;
return (t >> 16) | t; // NOTE: t is used more than once.
}
=============================================================
Note that if the "t" were used only once, the expression would be finally optimized as well.
However, with with this change, the optimization will take place earlier.
Reviewed by Nadav, Thanks a lot!
llvm-svn: 169317
The count attribute is more accurate with regards to the size of an array. It
also obviates the upper bound attribute in the subrange. We can also better
handle an unbound array by setting the count to -1 instead of the lower bound to
1 and upper bound to 0.
llvm-svn: 169312
This reapplies the fix for PR13303 now with more justification. Based on my
execution of the GDB 7.5 test suite this results in:
expected passes: 16101 -> 20890 (+30%)
unexpected failures: 4826 -> 637 (-77%)
There are 23 checks that used to pass and now fail. They are all in
gdb.reverse. Investigating a few looks like they were accidentally passing
due to extra breakpoints being set by this bug. They're generally due to the
difference in end location between gcc and clang, the test suite is trying to
set breakpoints on the closing '}' that clang doesn't associate with any
instructions.
llvm-svn: 169304
textually as NativeClient. Also added a link to the native client project for
readers unfamiliar with it.
A Clang patch will follow shortly.
llvm-svn: 169291
on 64-bit PowerPC ELF.
The patch includes code to handle external assembly and MC output with the
integrated assembler. It intentionally does not support the "old" JIT.
For the initial-exec TLS model, the ABI requires the following to calculate
the address of external thread-local variable x:
Code sequence Relocation Symbol
ld 9,x@got@tprel(2) R_PPC64_GOT_TPREL16_DS x
add 9,9,x@tls R_PPC64_TLS x
The register 9 is arbitrary here. The linker will replace x@got@tprel
with the offset relative to the thread pointer to the generated GOT
entry for symbol x. It will replace x@tls with the thread-pointer
register (13).
The two test cases verify correct assembly output and relocation output
as just described.
PowerPC-specific selection node variants are added for the two
instructions above: LD_GOT_TPREL and ADD_TLS. These are inserted
when an initial-exec global variable is encountered by
PPCTargetLowering::LowerGlobalTLSAddress(), and later lowered to
machine instructions LDgotTPREL and ADD8TLS. LDgotTPREL is a pseudo
that uses the same LDrs support added for medium code model's LDtocL,
with a different relocation type.
The rest of the processing is straightforward.
llvm-svn: 169281
Again, tools are trickier to pick the main module header for than
library source files. I've started to follow the pattern of using
LLVMContext.h when it is included as a stub for program source files.
llvm-svn: 169252
missed in the first pass because the script didn't yet handle include
guards.
Note that the script is now able to handle all of these headers without
manual edits. =]
llvm-svn: 169224
1) Teach it to handle files with #include on the first line -- these do
actually exist in LLVM.
2) Support llvm-c and clang-c include projects.
3) Nuke some stail imports.
4) Switch to using os.path to split the file extension off.
5) Remove debugging leftovers.
6) Add docstring (a really puny one) for the sort function.
I'm continuing te avoid stripping the whitespace on the RHS to preserve
whatever newline characters happen to be in the original file.
llvm-svn: 169222
The count field is necessary because there isn't a difference between the 'lo'
and 'hi' attributes for a one-element array and a zero-element array. When the
count is '0', we know that this is a zero-element array. When it's >=1, then
it's a normal constant sized array. When it's -1, then the array is unbounded.
llvm-svn: 169218
Added the code that actually performs the if-conversion during vectorization.
We can now vectorize this code:
for (int i=0; i<n; ++i) {
unsigned k = 0;
if (a[i] > b[i]) <------ IF inside the loop.
k = k * 5 + 3;
a[i] = k; <---- K is a phi node that becomes vector-select.
}
llvm-svn: 169217
the alignment is clamped to TargetFrameLowering.getStackAlignment if the target
does not support stack realignment or the option "realign-stack" is off.
This will cause miscompile if the address is treated as aligned and add is
replaced with or in DAGCombine.
Added a bool StackRealignable to TargetFrameLowering to check whether stack
realignment is implemented for the target. Also added a bool RealignOption
to MachineFrameInfo to check whether the option "realign-stack" is on.
rdar://12713765
llvm-svn: 169197