As part of the cleanup done to enable the disassembler, the PPC instructions
now have a valid Size description field. This can now be used to replace some
custom logic in a few places to compute instruction sizes.
Patch by David Wiberg!
llvm-svn: 200623
unrolling heuristic per default
Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.
llvm-svn: 200621
This didn't work for any integer vectors, and didn't
work with some sizes of float vectors. This should now
work with all sizes of float and i32 vectors.
llvm-svn: 200619
This is a minimal implementation which accepts only constants rather than
full expressions, but that should be perfectly sufficient for all known
users for now.
Patch from PaX Team <pageexec@freemail.hu>
llvm-svn: 200614
This will be needed for .octa support, but we don't want to just use the
existing AsmLexer::Integer for it and then have to litter all its users
with explicit checks for the size, and make them use the new get APIntVal()
method.
So let the lexer produce an AsmLexer::Integer as before for numbers which
are small enough — which appears to cover what was previously a nasty
special case handling of numbers which don't fit in int64_t but *do* fit
in uint64_t.
Where the number is too large even for that, produce an AsmLexer::BigNum
instead. We do nothing with these except complain about them for now,
but that will be changed shortly...
Based on a patch from PaX Team <pageexec@freemail.hu>
llvm-svn: 200613
LCSSA when we promote to SSA registers inside of LICM.
Currently, this is actually necessary. The promotion logic in LICM uses
SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
Teaching it to do so would be a very significant undertaking. It may be
worthwhile and I've left a FIXME about this in the code as well as
starting a thread on llvmdev to try to figure out the right long-term
solution.
For now, the PR needs to be fixed. Short of using the promition
SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
I don't see a cleaner or cheaper way of achieving this. Fortunately,
LCSSA is relatively lazy and sparse -- it should only update
instructions which need it. We can also skip the recursive variant when
we don't promote to SSA values.
llvm-svn: 200612
cost so that they don't impact the vector bonus. Fundamentally, counting
unsimplified instructions is just *wrong*; it will continue to introduce
instability as things which do not generate code bizarrely impact
inlining. For example, sufficiently nested inlined functions could turn
off the vector bonus with lifetime markers just like the debug
intrinsics do. =/
This is a short-term tactical fix. Long term, I think we need to remove
the vector bonus entirely. That's a separate patch and discussion
though.
The patch to fix this provided by Dario Domizioli. I've added some
comments about the planned direction and used a heavily pruned form of
debug info intrinsics for the test case. While this debug info doesn't
work or "do" anything useful, it lets us easily test all manner of
interference easily, and I suspect this will not be the last time we
want to craft a pattern where debug info interferes with the inliner in
a problematic way.
llvm-svn: 200609
Per the GAS documentation, .fill should permit pattern widths that
aren't a power of two. While I was in the neighborhood, I added some
sanity checking. This change was motivated by a use of this construct
in the Linux Kernel.
llvm-svn: 200606
This reverts commit r200576. It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
llvm-svn: 200602
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
llvm-svn: 200601
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
llvm-svn: 200596
This library will be used by clang-query. I can imagine LLDB becoming another
client of this library, so I think LLVM is a sensible place for it to live.
It wraps libedit, and adds tab completion support.
The code is loosely based on the line editor bits in LLDB, with a few
improvements:
- Polymorphism for retrieving the list of tab completions, based on
the concept pattern from the new pass manager.
- Tab completion doesn't corrupt terminal output if the input covers
multiple lines. Unfortunately this can only be done in a truly horrible
way, as far as I can tell. But since the alternative is to implement our
own line editor (which I don't think LLVM should be in the business of
doing, at least for now) I think it may be acceptable.
- Includes a fallback for the case where the user doesn't have libedit
installed.
Note that this uses C stdio, mainly because libedit also uses C stdio.
Differential Revision: http://llvm-reviews.chandlerc.com/D2200
llvm-svn: 200595
This will be used by the line editor library to derive a default path to
the history file.
Differential Revision: http://llvm-reviews.chandlerc.com/D2199
llvm-svn: 200594
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block. Also add an
assertion in x86 fastisel.
llvm-svn: 200593
To remove this one simply move the end of file logic from the asm printer to
the target mc streamer.
This removes the last call to hasRawTextSupport from lib/Target.
llvm-svn: 200590
There is nothing wrong with printing the disassembly section when printing
text. An hypothetical assembler would then produce a .o just like our
direct object emission produces.
llvm-svn: 200583
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).
<rdar://problem/15611947>
llvm-svn: 200577
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.
Patch by Raul Silvera! (Much delayed due to my not running dcommit)
llvm-svn: 200576
algorithm. Sink the 'A' + Attribute hash into each form so we don't
have to check valid forms before deciding whether or not we're going
to hash which will let the default be to return without doing anything.
llvm-svn: 200571
This ensures DWARF consumers don't confuse these references for
definitions. I'd argue it might be nice to improve debuggers so we don't
need this, but it's just one field in an abbreviation anyway - so it
doesn't seem worth the fight.
llvm-svn: 200569
MSVC always places the 'this' parameter for a method first. The
implicit 'sret' pointer for methods always comes second. We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in
opposite order.
Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.
Fixes PR15768 with the corresponding change in Clang.
Reviewers: ributzka, majnemer
Differential Revision: http://llvm-reviews.chandlerc.com/D2663
llvm-svn: 200561
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.
I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.
llvm-svn: 200530
The entry block of a function starts with all the static allocas. The change
in r195513 splits the block before those allocas, which has the effect of
turning them into dynamic allocas. That breaks all sorts of things. Change to
split after the initial allocas, and also add a comment explaining why the
block is split.
llvm-svn: 200515
when the input is a concat_vectors and the insert replaces one of the
concat halves:
Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)
This can be seen with the following IR:
define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
%1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)
The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.
Using AVX, without the patch this generates two vinsertf128 instructions:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0
With the patch this is optimized into:
vinsertf128 $1, %xmm1, %ymm2, %ymm0
Patch by Robert Lougher.
llvm-svn: 200506
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".
Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.
llvm-svn: 200502
These should end up (in ELF) as R_X86_64_32S relocs, not R_X86_64_32.
Kill the horrid and incomplete special case and FIXME in
EncodeInstruction() and set things up so it can infer the signedness
from the ImmType just like it can the size and whether it's PC-relative.
llvm-svn: 200495
COFF has only one symbol table.
MachO has a LC_DYSYMTAB, but that is not a symbol table, just extra info about
the one symbol table (LC_SYMTAB).
IR (coming soon) also has only one table.
llvm-svn: 200488
Modern compilers (Clang 3.4, GCC 4.8) warn on variadic macros being
introduced in C99, which produces a huge number of useless diagnostics
since this macro is unused in the whole project.
llvm-svn: 200479
utohexstr provides a temporary string, making it unsafe to use with the Twine
interface which will not copy the string. Switch to using std::string.
llvm-svn: 200457
exp2 is not available on Windows. Fortunately, we are calculating powers of 2
with expontents within the range of [4,12]. Simply use an equivalent bitshift
operation to repair compilation with MSVC which does not provide this standard
function.
llvm-svn: 200454
The SWAP instruction only exists in a 32-bit variant, but the 64-bit
atomic swap can be implemented in terms of CASX, like the other atomic
rmw primitives.
llvm-svn: 200453
The .object_arch directive indicates an alternative architecture to be specified
in the object file. The directive does *not* effect the enabled feature bits
for the object file generation. This is particularly useful when the code
performs runtime detection and would like to indicate a lower architecture as
the requirements than the actual instructions used.
llvm-svn: 200451
Enhance the ARM specific parsing support in llvm-readobj to support attributes.
This allows for simpler tests to validate encoding of the build attributes as
specified in the ARM ELF specification.
llvm-svn: 200450
.movsp is an ARM unwinding directive that indicates to the unwinder that a
register contains an offset from the current stack pointer. If the offset is
unspecified, it defaults to zero.
llvm-svn: 200449
This enhances the ARMAsmParser to handle .tlsdescseq directives. This is a
slightly special relocation. We must be able to generate them, but not consume
them in assembly. The relocation is meant to assist the linker in generating a
TLS descriptor sequence. The ELF target streamer is enhanced to append
additional fixups into the current segment and that is used to emit the new
R_ARM_TLS_DESCSEQ relocations.
llvm-svn: 200448
Add support for tlsdesc relocations which are part of the ABI, marked as
experimental. These relocations permit the linker to perform TLS reference
optimizations.
llvm-svn: 200447
This adds support for TLS CALL relocations. TLS CALL relocations are used to
indicate to the linker to generate appropriate entries to resolve TLS references
via an appropriate function invocation (e.g. __tls_get_addr(PLT)).
In order to accomodate the linker relaxation of the TLS access model for the
references (GD/LD -> IE, IE -> LE), the relocation addend must be incomplete.
This requires that the partial inplace value is also incomplete (i.e. 0). We
simply avoid the offset value calculation at the time of the fixup adjustment in
the ARM assembler backend.
llvm-svn: 200446
None of the object file formats reported error on iterator increment. In
retrospect, that is not too surprising: no object format stores symbols or
sections in a linked list or other structure that requires chasing pointers.
As a consequence, all error checking can be done on begin() and end().
This reduces the text segment of bin/llvm-readobj in my machine from 521233 to
518526 bytes.
llvm-svn: 200442
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
llvm-svn: 200431
This commit only handles IfConvertTriangle. To update edge weights
of a successor, one interface is added to MachineBasicBlock:
/// Set successor weight of a given iterator.
setSuccWeight(succ_iterator I, uint32_t weight)
An existing testing case test/CodeGen/Thumb2/v8_IT_5.ll is updated,
since we now correctly update the edge weights, the cold block
is placed at the end of the function and we jump to the cold block.
llvm-svn: 200428
module since there's no range guarantee that we could make given
output order. This also fixes up the testcases that have multiple
CUs to have the correct range offset.
llvm-svn: 200422
This is a bit imperfect, as these options don't show up in the help as
is and single dash variants are accepted, which differs from gcov.
Unfortunately, this seems to be as good as it gets with the cl::opt
machinery, so it'll do as an incremental step.
llvm-svn: 200419
This Properly capitalizes and clarifies the help output from
llvm-cov. It also puts the llvm-only / non-gcov-compatible options in
their own category.
llvm-svn: 200418
Currently, llvm-cov isn't command-line compatible with gcov, which
accepts a source file name as its first parameter and infers the gcno
and gcda file names from that. This change keeps our -gcda and -gcno
options available for convenience in overriding this behaviour, but
adds the required parameter and inference behaviour as a compatible
default.
llvm-svn: 200417
The linux kernel makes uses of a GAS `feature' which substitutes nothing
for macro arguments which aren't specified.
Proper support for these kind of macro arguments necessitated a cleanup of
differences between `GAS' and `Darwin' dialect macro processing.
Differential Revision: http://llvm-reviews.chandlerc.com/D2634
llvm-svn: 200409
This can still be overridden by explicitly setting a value requirement on the
alias option, but by default it should be the same.
PR18649
llvm-svn: 200407
preserve loop simplify of enclosing loops.
The problem here starts with LoopRotation which ends up cloning code out
of the latch into the new preheader it is buidling. This can create
a new edge from the preheader into the exit block of the loop which
breaks LoopSimplify form. The code tries to fix this by splitting the
critical edge between the latch and the exit block to get a new exit
block that only the latch dominates. This sadly isn't sufficient.
The exit block may be an exit block for multiple nested loops. When we
clone an edge from the latch of the inner loop to the new preheader
being built in the outer loop, we create an exiting edge from the outer
loop to this exit block. Despite breaking the LoopSimplify form for the
inner loop, this is fine for the outer loop. However, when we split the
edge from the inner loop to the exit block, we create a new block which
is in neither the inner nor outer loop as the new exit block. This is
a predecessor to the old exit block, and so the split itself takes the
outer loop out of LoopSimplify form. We need to split every edge
entering the exit block from inside a loop nested more deeply than the
exit block in order to preserve all of the loop simplify constraints.
Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we tried a very brute force to update LoopSimplify
form by re-computing it for all exit blocks. We don't need to do this,
and doing this much will sometimes but not always overlap with the
LoopRotate bug fix. Instead, the code needs to specifically handle the
cases which can start to violate LoopSimplify -- they aren't that
common. We need to see if the destination of the split edge was a loop
exit block in simplified form for the loop of the source of the edge.
For this to be true, all the predecessors need to be in the exact same
loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the deges back into
this loop to recover it. The old mechanism of doing this was
conservatively correct because at least *one* of the exiting blocks it
rewrote was the DestBB and so the DestBB's predecessors were fixed. But
this is a much more targeted way of doing it. Making it targeted is
important, because ballooning the set of edges touched prevents
LoopRotate from being able to split edges *it* needs to split to
preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find the other edges in need of splitting but not
others.
Many, *many* thanks for help from Nick reducing these test cases
mightily. And helping lots with the analysis here as this one was quite
tricky to track down.
llvm-svn: 200393
After all hard work to implement the EHABI and with the test-suite
passing, it's time to turn it on by default and allow users to
disable it as a work-around while we fix the eventual bugs that show
up.
This commit also remove the -arm-enable-ehabi-descriptors, since we
want the tables to be printed every time the EHABI is turned on
for non-Darwin ARM targets.
Although MCJIT EHABI is not working yet (needs linking with the right
libraries), this commit also fixes some relocations on MCJIT regarding
the EH tables/lib calls, and update some tests to avoid using EH tables
when none are needed.
The EH tests in the test-suite that were previously disabled on ARM
now pass with these changes, so a follow-up commit on the test-suite
will re-enable them.
llvm-svn: 200388
This commit seeks to do two things:
- Run the surfeit of tests under the Darwin dialect. This ends up
affecting tests which assumed that spaces could deliminate arguments.
- The GAS dialect tests should limit their surface area to things that
could plausibly work under GAS. For example, Darwin style arguments
have no business being in such a test.
llvm-svn: 200383
Otherwise, assembler (gas) fails to assemble them with error message "operation
combines symbols in different segments". This is because MC computes
pc_rel entries with subtract expression between labels from different sections.
llvm-svn: 200373
because of the inside-out run of LoopSimplify in the LoopPassManager and
the fact that LoopSimplify couldn't be "preserved" across two
independent LoopPassManagers.
Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
node because it thought it was rewriting (via SCEV) the incoming value
to a loop invariant value. While it may well be invariant for the
current loop, it may be rewritten in terms of an enclosing loop's
values. This in and of itself is fine, as the LCSSA PHI node in the
enclosing loop for the inner loop value we're rewriting will have its
own LCSSA PHI node if used outside of the enclosing loop. With me so
far?
Well, the current loop and the enclosing loop may share an exiting
block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, its not valid to RAUW through the LCSSA PHI node.
Expected crazy test included.
llvm-svn: 200372
When estimating register pressure, don't count the induction variable mulitple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").
llvm-svn: 200371
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.
llvm-svn: 200367
When the scalar compare is between floating point and operands are
vector, we custom lower SELECT_CC to use NEON SIMD compare for
generating less instructions.
llvm-svn: 200365
The default value of this attribute is PTHREAD_PROCESS_PRIVATE, so
there's no point in calling pthread_mutexattr_setpshared() to set
that.
See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getpshared.html
This removes some ifdefs that tend to need to be extended for other
platforms (e.g. for NaCl).
Note that this call was in the first implementation of Mutex, added in
r22403, so it doesn't appear to have been added in response to a
performance problem.
Differential Revision: http://llvm-reviews.chandlerc.com/D2633
llvm-svn: 200360
I assume that the name is file_type because it is the name of a c++11 type that
we will use once we convert, but at least our current implementation can look
like llvm code.
Thanks to David Blakie for the push.
llvm-svn: 200354