The validation algorithm used an incremental approach, building each
iteration's data structures temporarily, validating them, then
adding them to a global set.
This does not scale well to having multiple sets of Root nodes, as the
set of instructions used in each iteration is the union over all
the root nodes. Therefore, refactor the logic to create a single, simple
container to which later logic then refers. This makes it simpler
control-flow wise to make the creation of the container more complex with
the addition of multiple root sets.
llvm-svn: 227499
In http://reviews.llvm.org/D6911, we allowed GVN to propagate FP equalities
to allow some simple value range optimizations. But that introduced a bug
when comparing to -0.0 or 0.0: these compare equal even though they are not
bitwise identical.
This patch disallows propagating zero constants in equality comparisons.
Fixes: http://llvm.org/bugs/show_bug.cgi?id=22376
Differential Revision: http://reviews.llvm.org/D7257
llvm-svn: 227491
Add tests for the various combines. This should
always be at least cycle neutral on all subtargets for f64,
and faster on some. For f32 we should prefer selecting
v_mad_f32 over v_fma_f32.
llvm-svn: 227484
The use of the DbgLoc in FastISel is probably something we should fix.
It's prone to leaking the wrong location into instructions - we should
have a clear chain of custody from the debug location of an IR
Instruction to that of a MachineInstr to avoid such leakage.
llvm-svn: 227481
Any code creating an MCSectionELF knows ELF and already provides the flags.
SectionKind is an abstraction used by common code that uses a plain
MCSection.
Use the flags to compute the SectionKind. This removes a lot of
guessing and boilerplate from the MCSectionELF construction.
llvm-svn: 227476
For large stack offsets the compiler generates multiple immediate mode
sub/add instructions in the prologue/epilogue. This patch makes the
compiler place the final amount to be added/subtracted into a register,
which is then added/substracted with a single operation.
Differential Revision: http://reviews.llvm.org/D7226
llvm-svn: 227458
Patch by Nemanja Ivanovic.
As was uncovered by the failing test case (when run on non-PPC
platforms), the feature set when compiling with -march=ppc64le was not
being picked up. This change ensures that if the -mcpu option is not
specified, the correct feature set is picked up regardless of whether
we are on PPC or not.
llvm-svn: 227455
reroll() was slightly monolithic and a pain to modify. Refactor
a bunch of its state from local variables to member variables
of a helper class, and do some trivial simplification while we're
there.
llvm-svn: 227439
ELF has support for sections that can be split into fixed size or
null terminated entities.
Since these sections can be split by the linker, it is not necessary
to split them in codegen.
This reduces the combined .o size in a llvm+clang build from
202,394,570 to 173,819,098 bytes.
The time for linking clang with gold (on a VM, on a laptop) goes
from 2.250089985 to 1.383001792 seconds.
The flip side is the size of rodata in clang goes from 10,926,785
to 10,929,345 bytes.
The increase seems to be because of http://sourceware.org/bugzilla/show_bug.cgi?id=17902.
llvm-svn: 227431
This has the nice secondary effect of allowing LLVM to continue to build
for targets without __thread or thread_local support to continue to work
so long as they build without support for backtraces.
llvm-svn: 227423
entirely when threads are not enabled. This should allow anyone who
needs to bootstrap or cope with a host loader without TLS support to
limp along without threading support.
There is still some bug in the PPC TLS stuff that is not worked around.
I'm getting access to a machine to reproduce and debug this further.
There is some chance that I'll have to add a terrible workaround for
PPC.
There is also some problem with iOS, but I have no ability to really
evaluate what the issue is there. I'm leaving it to folks maintaining
that platform to suggest a path forward -- personally I don't see any
useful path forward that supports threading in LLVM but does so without
support for *very basic* TLS. Note that we don't need more than some
pointers, and we don't need constructors, destructors, or any of the
other fanciness which remains widely unimplemented.
llvm-svn: 227411
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.
Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors. Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception. Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D7216
llvm-svn: 227405
Patch by: Igor Laevsky <igor@azulsystems.com>
"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."
Differential Revision: http://reviews.llvm.org/D7157
llvm-svn: 227390
Summary: Add test targets and the lit-style runner.
Test Plan: Run the tests on bot.
Reviewers: samsonov
Reviewed By: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7217
llvm-svn: 227389
Sadly, this precludes optimizing it down to initial-exec or local-exec
when statically linking, and in general makes the code slower on PPC 64,
but there's nothing else for it until we can arrange to produce the
correct bits for the linker.
Lots of thanks to Ulirch for tracking this down and Bill for working on
the long-term fix to LLVM so that we can relegate this to old host
clang versions.
I'll be watching the PPC build bots to make sure this effectively
revives them.
llvm-svn: 227352
This is a refactoring to restructure the single user of performCustomLowering as a specific lowering pass and remove the custom lowering hook entirely.
Before this change, the LowerIntrinsics pass (note to self: rename!) was essentially acting as a pass manager, but without being structured in terms of passes. Instead, it proxied calls to a set of GCStrategies internally. This adds a lot of conceptual complexity (i.e. GCStrategies are stateful!) for very little benefit. Since there's been interest in keeping the ShadowStackGC working, I extracting it's custom lowering pass into a dedicated pass and just added that to the pass order. It will only run for functions which opt-in to that gc.
I wasn't able to find an easy way to preserve the runtime registration of custom lowering functionality. Given that no user of this exists that I'm aware of, I made the choice to just remove that. If someone really cares, we can look at restoring it via dynamic pass registration in the future.
Note that despite the large diff, none of the lowering code actual changes. I added the framing needed to make it a pass and rename the class, but that's it.
Differential Revision: http://reviews.llvm.org/D7218
llvm-svn: 227351
Summary:
The primary goal of this patch is to remove the need for MarkOptionsChanged(). That goal is accomplished by having addOption and removeOption properly sort the options.
This patch puts the new add and remove functionality on a CommandLineParser class that is a placeholder. Some of the functionality in this class will need to be merged into the OptionRegistry, and other bits can hopefully be in a better abstraction.
This patch also removes the RegisteredOptionList global, and the need for cl::Option objects to be linked list nodes.
The changes in CommandLineTest.cpp are required because these changes shift when we validate that options are not duplicated. Before this change duplicate options were only found during certain cl API calls (like cl::ParseCommandLine). With this change duplicate options are found during option construction.
Reviewers: dexonsmith, chandlerc, pete
Reviewed By: pete
Subscribers: pete, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D7132
llvm-svn: 227345
Summary:
MetadataAsValue uses a canonical format that strips the MDNode if it
contains only a single constant value. This triggers an assertion when
trying to cast the value to a MDNode.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7165
llvm-svn: 227319
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.
Differential Revision: http://reviews.llvm.org/D7196
llvm-svn: 227308
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.
llvm-svn: 227307
By Asaf Badouh and Elena Demikhovsky
Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.
llvm-svn: 227303
tracing code.
Managed static was just insane overhead for this. We took memory fences
and external function calls in every path that pushed a pretty stack
frame. This includes a multitude of layers setting up and tearing down
passes, the parser in Clang, everywhere. For the regression test suite
or low-overhead JITs, this was contributing to really significant
overhead.
Even the LLVM ThreadLocal is really overkill here because it uses
pthread_{set,get}_specific logic, and has careful code to both allocate
and delete the thread local data. We don't actually want any of that,
and this code in particular has problems coping with deallocation. What
we want is a single TLS pointer that is valid to use during global
construction and during global destruction, any time we want. That is
exactly what every host compiler and OS we use has implemented for
a long time, and what was standardized in C++11. Even though not all of
our host compilers support the thread_local keyword, we can directly use
the platform-specific keywords to get the minimal functionality needed.
Provided this limited trial survives the build bots, I will move this to
Compiler.h so it is more widely available as a light weight if limited
alternative to the ThreadLocal class. Many thanks to David Majnemer for
helping me think through the implications across platforms and craft the
MSVC-compatible syntax.
The end result is *substantially* faster. When running llc in a tight
loop over a small IR file targeting the aarch64 backend, this improves
its performance by over 10% for me. It also seems likely to fix the
remaining regressions seen by JIT users with threading enabled.
This may actually have more impact on real-world compile times due to
the use of the pretty stack tracing utility throughout the rest of Clang
or LLVM, but I've not collected any detailed measurements.
llvm-svn: 227300
querying of the pass registry.
The pass manager relies on the static registry of PassInfo objects to
perform all manner of its functionality. I don't understand why it does
much of this. My very vague understanding is that this registry is
touched both during static initialization *and* while each pass is being
constructed. As a consequence it is hard to make accessing it not
require a acquiring some lock. This lock ends up in the hot path of
setting up, tearing down, and invaliditing analyses in the legacy pass
manager.
On most systems you can observe this as a non-trivial % of the time
spent in 'ninja check-llvm'. However, I haven't really seen it be more
than 1% in extreme cases of compiling more real-world software,
including LTO.
Unfortunately, some of the GPU JITs are seeing this taking essentially
all of their time because they have very small IR running through
a small pass pipeline very many times (at least, this is the vague
understanding I have of it).
This patch tries to minimize the cost of looking up PassInfo objects by
leveraging the fact that the objects themselves are immutable and they
are allocated separately on the heap and so don't have their address
change. It also requires a change I made the last time I tried to debug
this problem which removed the ability to de-register a pass from the
registry. This patch creates a single access path to these objects
inside the PMTopLevelManager which memoizes the result of querying the
registry. This is somewhat gross as I don't really know if
PMTopLevelManager is the *right* place to put it, and I dislike using
a mutable member to memoize things, but it seems to work.
For long-lived pass managers this should completely eliminate
the cost of acquiring locks to look into the pass registry once the
memoized cache is warm. For 'ninja check' I measured about 1.5%
reduction in CPU time and in total time on a machine with 32 hardware
threads. For normal compilation, I don't know how much this will help,
sadly. We will still pay the cost while we populate the memoized cache.
I don't think it will hurt though, and for LTO or compiles with many
small functions it should still be a win. However, for tight loops
around a pass manager with many passes and small modules, this will help
tremendously. On the AArch64 backend I saw nearly 50% reductions in time
to complete 2000 cycles of spinning up and tearing down the pipeline.
Measurements from Owen of an actual long-lived pass manager show more
along the lines of 10% improvements.
Differential Revision: http://reviews.llvm.org/D7213
llvm-svn: 227299
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
- the predicate is olt or uge
- the first operand is provably not less than zero
- the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..
http://reviews.llvm.org/D6972
llvm-svn: 227298
abomination.
For starters, this API is incredibly slow. In order to lookup the name
of a pass it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.
To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have be preserved. This means we do all the work only to say
"nope" with no error to the user.
String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/
I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]
llvm-svn: 227294
This has wider implications than I expected when I reviewed the patch: It can
cause JIT crashes where clients have used the default value for AbortOnFailure
during symbol lookup. I'm currently investigating alternative approaches and I
hope to have this back in tree soon.
llvm-svn: 227287
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7214
llvm-svn: 227284
This commit creates infinite loop in DAG combine for in the LLVM test-suite
for aarch64 with mcpu=cylcone (just having neon may be enough to expose this).
llvm-svn: 227272
Use __clear_cache builtin instead of cacheflush() in
Unix Memory::InvalidateInstructionCache().
Differential Revision: http://reviews.llvm.org/D7198
llvm-svn: 227269
COMDATs must be identically named to the symbol. When support for COMDATs was
introduced, the symbol rewriter was not updated, resulting in rewriting failing
for symbols which were placed into COMDATs. This corrects the behaviour and
adds test cases for this.
llvm-svn: 227261
Summary:
A simple genetic in-process coverage-guided fuzz testing library.
I've used this fuzzer to test clang-format
(it found 12+ bugs, thanks djasper@ for the fixes!)
and it may also help us test other parts of LLVM.
So why not keep it in the LLVM repository?
I plan to add the cmake build rules later (in a separate patch, if that's ok)
and also add a clang-format-fuzzer target.
See README.txt for details.
Test Plan: Tests will follow separately.
Reviewers: djasper, chandlerc, rnk
Reviewed By: rnk
Subscribers: majnemer, ygribov, dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D7184
llvm-svn: 227252