According to the AAPCS, when a CPRC is allocated to the stack, all other
VFP registers should be marked as unavailable.
I have also modified the rules for allocating non-CPRCs to the stack, to make
it more explicit that all GPRs must be made unavailable. I cannot think of a
case where the old version would produce incorrect answers, so there is no test
for this.
llvm-svn: 200970
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
llvm-svn: 200957
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
llvm-svn: 200929
build but spectacularly changed behavior of the C++98 build. =]
This shows my one problem with not having unittests -- basic API
expectations aren't well exercised by the integration tests because they
*happen* to not come up, even though they might later. I'll probably add
a basic unittest to complement the integration testing later, but
I wanted to revive the bots.
llvm-svn: 200905
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.
However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:
- Be lazy about scanning function definitions. The existing pass eagerly
scans the entire module to build the initial graph. This new pass is
significantly more lazy, and I plan to push this even further to
maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
indirect call from functions whose address is taken. This node creates
a huge choke-point which would preclude good parallelization across
the fanout of the SCC graph when we got to the point of looking at
such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
rather than value handles and tracking call instructions. This will
require explicit update calls instead of some updates working
transparently, but should end up being significantly more efficient.
The explicit update calls ended up being needed in many cases for the
existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
for an SCC which is stable across updates. This is essential for the
new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
an SCC friendly order. This is a much simpler graph structure and
should be more memory dense. It does limit the ways in which it is
appropriate to use this analysis. I wish I had a better name than
"call graph". I've commented extensively this aspect.
This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.
Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.
A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.
llvm-svn: 200903
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
I think this was just over-eagerness on my part. The analysis results
need to often be non-const because they need to (in some cases at least)
be updated by the transformation pass in order to remain correct. It
also makes lazy analyses (a common case) needlessly annoying to write in
order to make their entire state mutable.
llvm-svn: 200881
Summary:
The check performed in the comparator is invalid, as some STL
implementations enforce strict weak ordering by calling the comparator with the
same value. This check was also in a wrong place: the assertion would only fire
when -help was used. The new check is performed each time the category is
registered (we are not going to have thousands of them, so it's fine to do it in
O(N^2)).
Reviewers: jordan_rose
Reviewed By: jordan_rose
CC: cfe-commits, alexmc
Differential Revision: http://llvm-reviews.chandlerc.com/D2699
llvm-svn: 200853
ISSUE:
On Ubuntu 12.04 LTS, arc4random is provided by libbsd.so, which is a
transitive dependency of libedit. If a system had libedit on it that
was implemented in terms of libbsd.so, then the arc4random test,
previously implemented as a linker test, would succeed with -ledit.
However, on Ubuntu this would also require a #include <bsd/stdlib.h>.
This caused a build breakage on configure-based Ubuntu 12.04 with
libedit installed.
FIX:
This fix changes configure to test for arc4random by searching for it
in the standard header files. On Ubuntu 12.04, this test now properly
fails to find arc4random as it is not defined in the default header
locations. It also tweaks the #define names to match the output of the
header check command, which is slightly different than the linker
function check #defines.
I tested the following scenarios:
(1) Ubuntu 12.04 without the libedit package [did not find arc4random,
as expected]
(2) Ubuntu 12.04 with libedit package [properly did not find
arc4random, as expected]
(3) Ubuntu 12.04 with most recent libedit, custom built, and not
dependent on libbsd.so [properly did not find arc4random, as
expected].
(4) FreeBSD 10.0B1 [properly found arc4random, as expected]
llvm-svn: 200819
This patch fixes the ldr-pseudo implementation to work when used in
inline assembly. The fix is to move arm assembler constant pools
from the ARMAsmParser class to the ARMTargetStreamer class.
Previously we kept the assembler generated constant pools in the
ARMAsmParser object. This does not work for inline assembly because
a new parser object is created for each blob of inline assembly.
This patch moves the constant pools to the ARMTargetStreamer class
so that the constant pool will remain alive for the entire code
generation process.
An ARMTargetStreamer class is now required for the arm backend.
There was no existing implementation for MachO, only Asm and ELF.
Instead of creating an empty MachO subclass, we decided to make the
ARMTargetStreamer a non-abstract class and provide default
(llvm_unreachable) implementations for the non constant-pool related
methods.
Differential Revision: http://llvm-reviews.chandlerc.com/D2638
llvm-svn: 200777
There was an extremely confusing proliferation of LLVM intrinsics to implement
the vacge & vacgt instructions. This combines them all into two polymorphic
intrinsics, shared across both backends.
llvm-svn: 200768
Until now, when a path in a gcno file included a directory, we would
emit our .gcov file in that directory, whereas gcov always emits the
file in the current directory. In doing so, this implements gcov's
strange name-mangling -p flag, which is needed to avoid clobbering
files when two with the same name exist in different directories.
The path mangling is a bit ugly and only handles unix-like paths, but
it's simple, and it doesn't make any guesses as to how it should
behave outside of what gcov documents. If we decide this should be
cross platform later, we can consider the compatibility implications
then.
llvm-svn: 200754
Some of the SHA instructions take a scalar i32 as one argument (largely because
they work on 160-bit hash fragments). This wasn't reflected in the IR
previously, with ARM and AArch64 choosing different types (<4 x i32> and <1 x
i32> respectively) which was ugly.
This makes all the affected intrinsics take a uniform "i32", allowing them to
become non-polymorphic at the same time.
llvm-svn: 200706
iteration. This alows the majority of operations to be performed without
encoding a specific small size. It follows the model of
SmallVectorImpl<T>.
llvm-svn: 200688
'SmallPtrSetImplBase'. This more closely matches the organization of
SmallVector and should allow introducing a SmallPtrSetImpl which serves
the same purpose as SmallVectorImpl: isolating the element type from the
particular small size chosen. This in turn allows a lot of
simplification of APIs by not coding them against a specific small size
which is rarely needed.
llvm-svn: 200687
This will be needed for .octa support, but we don't want to just use the
existing AsmLexer::Integer for it and then have to litter all its users
with explicit checks for the size, and make them use the new get APIntVal()
method.
So let the lexer produce an AsmLexer::Integer as before for numbers which
are small enough — which appears to cover what was previously a nasty
special case handling of numbers which don't fit in int64_t but *do* fit
in uint64_t.
Where the number is too large even for that, produce an AsmLexer::BigNum
instead. We do nothing with these except complain about them for now,
but that will be changed shortly...
Based on a patch from PaX Team <pageexec@freemail.hu>
llvm-svn: 200613
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
llvm-svn: 200596
This library will be used by clang-query. I can imagine LLDB becoming another
client of this library, so I think LLVM is a sensible place for it to live.
It wraps libedit, and adds tab completion support.
The code is loosely based on the line editor bits in LLDB, with a few
improvements:
- Polymorphism for retrieving the list of tab completions, based on
the concept pattern from the new pass manager.
- Tab completion doesn't corrupt terminal output if the input covers
multiple lines. Unfortunately this can only be done in a truly horrible
way, as far as I can tell. But since the alternative is to implement our
own line editor (which I don't think LLVM should be in the business of
doing, at least for now) I think it may be acceptable.
- Includes a fallback for the case where the user doesn't have libedit
installed.
Note that this uses C stdio, mainly because libedit also uses C stdio.
Differential Revision: http://llvm-reviews.chandlerc.com/D2200
llvm-svn: 200595
This will be used by the line editor library to derive a default path to
the history file.
Differential Revision: http://llvm-reviews.chandlerc.com/D2199
llvm-svn: 200594
To remove this one simply move the end of file logic from the asm printer to
the target mc streamer.
This removes the last call to hasRawTextSupport from lib/Target.
llvm-svn: 200590
MSVC always places the 'this' parameter for a method first. The
implicit 'sret' pointer for methods always comes second. We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in
opposite order.
Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.
Fixes PR15768 with the corresponding change in Clang.
Reviewers: ributzka, majnemer
Differential Revision: http://llvm-reviews.chandlerc.com/D2663
llvm-svn: 200561
COFF has only one symbol table.
MachO has a LC_DYSYMTAB, but that is not a symbol table, just extra info about
the one symbol table (LC_SYMTAB).
IR (coming soon) also has only one table.
llvm-svn: 200488
The .object_arch directive indicates an alternative architecture to be specified
in the object file. The directive does *not* effect the enabled feature bits
for the object file generation. This is particularly useful when the code
performs runtime detection and would like to indicate a lower architecture as
the requirements than the actual instructions used.
llvm-svn: 200451
.movsp is an ARM unwinding directive that indicates to the unwinder that a
register contains an offset from the current stack pointer. If the offset is
unspecified, it defaults to zero.
llvm-svn: 200449
This enhances the ARMAsmParser to handle .tlsdescseq directives. This is a
slightly special relocation. We must be able to generate them, but not consume
them in assembly. The relocation is meant to assist the linker in generating a
TLS descriptor sequence. The ELF target streamer is enhanced to append
additional fixups into the current segment and that is used to emit the new
R_ARM_TLS_DESCSEQ relocations.
llvm-svn: 200448
Add support for tlsdesc relocations which are part of the ABI, marked as
experimental. These relocations permit the linker to perform TLS reference
optimizations.
llvm-svn: 200447
This adds support for TLS CALL relocations. TLS CALL relocations are used to
indicate to the linker to generate appropriate entries to resolve TLS references
via an appropriate function invocation (e.g. __tls_get_addr(PLT)).
In order to accomodate the linker relaxation of the TLS access model for the
references (GD/LD -> IE, IE -> LE), the relocation addend must be incomplete.
This requires that the partial inplace value is also incomplete (i.e. 0). We
simply avoid the offset value calculation at the time of the fixup adjustment in
the ARM assembler backend.
llvm-svn: 200446
None of the object file formats reported error on iterator increment. In
retrospect, that is not too surprising: no object format stores symbols or
sections in a linked list or other structure that requires chasing pointers.
As a consequence, all error checking can be done on begin() and end().
This reduces the text segment of bin/llvm-readobj in my machine from 521233 to
518526 bytes.
llvm-svn: 200442
This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)
An existing testing case test/CodeGen/Thumb2/v8_IT_5.ll is updated,
since we now correctly update the edge weights, the cold block
is placed at the end of the function and we jump to the cold block.
llvm-svn: 200428
This can still be overridden by explicitly setting a value requirement on the
alias option, but by default it should be the same.
PR18649
llvm-svn: 200407
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.
llvm-svn: 200367
I assume that the name is file_type because it is the name of a c++11 type that
we will use once we convert, but at least our current implementation can look
like llvm code.
Thanks to David Blakie for the push.
llvm-svn: 200354
Needed to fix PR18303 to correctly re-encode the instruction if it
is relaxed.
We keep a copy of the MCSubtargetInfo to make sure that we are not
effected by future changes to the subtarget info coming from the
assembler (e.g. when parsing .code 16 directived).
llvm-svn: 200347
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.
The patch implements this with an utility function to drop all metadata not
in a white list.
llvm-svn: 200322
LCSSA from it caused a crasher with the LoopUnroll pass.
This crasher is really nasty. We destroy LCSSA form in a suprising way.
When unrolling a loop into an outer loop, we not only need to restore
LCSSA form for the outer loop, but for all children of the outer loop.
This is somewhat obvious in retrospect, but hey!
While this seems pretty heavy-handed, it's not that bad. Fundamentally,
we only do this when we unroll a loop, which is already a heavyweight
operation. We're unrolling all of these hypothetical inner loops as
well, so their size and complexity is already on the critical path. This
is just adding another pass over them to re-canonicalize.
I have a test case from PR18616 that is great for reproducing this, but
pretty useless to check in as it relies on many 10s of nested empty
loops that get unrolled and deleted in just the right order. =/ What's
worse is that investigating this has exposed another source of failure
that is likely to be even harder to test. I'll try to come up with test
cases for these fixes, but I want to get the fixes into the tree first
as they're causing crashes in the wild.
llvm-svn: 200273
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
llvm-svn: 200271
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.
llvm-svn: 200269
This commit allows LLVM MC to process .cfi_startproc directives when
they are followed by an additional `simple' identifier. This signals to
elide the emission of target specific CFI instructions that would
normally occur initially.
This fixes PR16587.
Differential Revision: http://llvm-reviews.chandlerc.com/D2624
llvm-svn: 200227
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.
llvm-svn: 200213
Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary.
Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown.
llvm-svn: 200203
There are a couple of interesting things here that we want to check over
(particularly the expecting asserts in StringRef) and get right for general use
in ADT so hold back on this one. For clang we have a workable templated
solution to use in the meanwhile.
This reverts commit r200187.
llvm-svn: 200194
StringRef is a low-level data wrapper that shouldn't know about language
strings like 'true' and 'false' whereas StringExtras is just the place for
higher-level utilities.
llvm-svn: 200188
(1) Add llvm_expect(), an asserting macro that can be evaluated as a constexpr
expression as well as a runtime assert or compiler hint in release builds. This
technique can be used to construct functions that are both unevaluated and
compiled depending on usage.
(2) Update StringRef using llvm_expect() to preserve runtime assertions while
extending the same checks to static asserts in C++11 builds that support the
feature.
(3) Introduce ConstStringRef, a strong subclass of StringRef that references
compile-time constant strings. It's convertible to, but not from, ordinary
StringRef and thus can be used to add compile-time safety to various interfaces
in LLVM and clang that only accept fixed inputs such as diagnostic format
strings that tend to get misused.
llvm-svn: 200187
These were:
* noreorder handling on the target object streamer and asm parser.
* setting the initial flag bits based on the enabled features.
* setting the elf header flag for micromips
It is *really* depressing I am the one doing this instead of someone at
mips actually taking the time to understand the infrastructure.
llvm-svn: 200138
This has a few advantages:
* Only targets that use a MCTargetStreamer have to worry about it.
* There is never a MCTargetStreamer without a MCStreamer, so we can use a
reference.
* A MCTargetStreamer can talk to the MCStreamer in its constructor.
llvm-svn: 200129
That bit is not documented in the PE/COFF spec published by Microsoft, so we
don't know the official name of it. I named this bit
IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VIRTUAL_ADDRESS because the bit is
reported as "high entropy virtual address" by dumpbin.exe,
llvm-svn: 200121
PE32+ supports 64 bit address space, but the file format remains 32 bit.
So its file format is pretty similar to PE32 (32 bit executable). The
differences compared to PE32 are (1) the lack of "BaseOfData" field and
(2) some of its data members are 64 bit.
In this patch, I added a new member function to get a PE32+ Header object to
COFFObjectFile class and made llvm-readobj to use it.
llvm-svn: 200117
the loops in a function, and teach LICM to work in the presance of
LCSSA.
Previously, LCSSA was a loop pass. That made passes requiring it also be
loop passes and unable to depend on function analysis passes easily. It
also caused outer loops to have a different "canonical" form from inner
loops during analysis. Instead, we go into LCSSA form and preserve it
through the loop pass manager run.
Note that this has the same problem as LoopSimplify that prevents
enabling its verification -- loop passes which run at the end of the loop
pass manager and don't preserve these are valid, but the subsequent loop
pass runs of outer loops that do preserve this pass trigger too much
verification and fail because the inner loop no longer verifies.
The other problem this exposed is that LICM was completely unable to
handle LCSSA form. It didn't preserve it and it actually would give up
on moving instructions in many cases when they were used by an LCSSA phi
node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
hoist and sink around them. This may actually let LICM fire
significantly more because we put everything into LCSSA form to rotate
the loop before running LICM. =/ Now LICM should handle that fine and
preserve it correctly. The down side is that LICM has to require LCSSA
in order to preserve it. This is just a fact of life for LCSSA. It's
entirely possible we should completely remove LCSSA from the optimizer.
The test updates are essentially accomodating LCSSA phi nodes in the
output of LICM, and the fact that we now completely sink every
instruction in ashr-crash below the loop bodies prior to unrolling.
With this change, LCSSA is computed only three times in the pass
pipeline. One of them could be removed (and potentially a SCEV run and
a separate LoopPassManager entirely!) if we had a LoopPass variant of
InstCombine that ran InstCombine on the loop body but refused to combine
away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
being in the same loop pass manager is rotate, LICM, and unswitch.
There is one thing that I *really* don't like -- preserving LCSSA in
LICM is quite expensive. We end up having to re-run LCSSA twice for some
loops after LICM runs because LICM can undo LCSSA both in the current
loop and the parent loop. I don't really see good solutions to this
other than to completely move away from LCSSA and using tools like
SSAUpdater instead.
llvm-svn: 200067
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
This change does not affect anything because everybody seems to be using
Object/COFF.h instead. But the definition is not for PE32 but for PE32+,
so fix it anyway.
llvm-svn: 200038