-debug-pass is not specified, as the string is only used when dumping pass
information. There is a big cost of determining the name in
ReginBase<Tr>:getNameStr() if the region's entry or exit block doesn't have a
name. This is the case for the Release build, as names are not preserved by the
front-end.
RegionPass is mainly used by Polly, resulting in long compile time for one file
of a customer application with the Release build (1m24s) vs Release+Asserts
build (10s) when Polly is used. With this change, the compile time with the
Release build went down to 8s.
Patch by Sanjin Sijaric <ssijaric@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D8076
llvm-svn: 231485
Multiplication is not dependent on signedness, so just treating
all input ranges as unsigned is not incorrect. However it will cause
overly pessimistic ranges (such as full-set) when used with signed
negative values.
Teach multiply to try to interpret its inputs as both signed and
unsigned, and then to take the most specific (smallest population)
as its result.
llvm-svn: 231483
Add MachO 32-bit (i.e. arm and x86) support for replacing global GOT equivalent
symbol accesses. Unlike 64-bit targets, there's no GOTPCREL relocation, and
access through a non_lazy_symbol_pointers section is used instead.
-- before
_extgotequiv:
.long _extfoo
_delta:
.long _extgotequiv-_delta
-- after
_delta:
.long L_extfoo$non_lazy_ptr-_delta
.section __IMPORT,__pointers,non_lazy_symbol_pointers
L_extfoo$non_lazy_ptr:
.indirect_symbol _extfoo
.long 0
llvm-svn: 231475
Follow up r230264 and add ARM64 support for replacing global GOT
equivalent symbol accesses by references to the GOT entry for the final
symbol instead, example:
-- before
.globl _foo
_foo:
.long 42
.globl _gotequivalent
_gotequivalent:
.quad _foo
.globl _delta
_delta:
.long _gotequivalent-_delta
-- after
.globl _foo
_foo:
.long 42
.globl _delta
Ltmp3:
.long _foo@GOT-Ltmp3
llvm-svn: 231474
Summary:
None of the .set directives can be used before the .module directives. The .set mips0/pop/push were not triggering this constraint.
Also added testing for all the other implemented directives which are supposed to trigger this constraint.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7140
llvm-svn: 231465
Specifically this:
* Prevents an "unused" warning in non-assert builds.
* In that error case return with out removing a child loop instead of
looping forever.
llvm-svn: 231459
This pass interchanges loops to provide a more cache-friendly memory access.
For e.g. given a loop like -
for(int i=0;i<N;i++)
for(int j=0;j<N;j++)
A[j][i] = A[j][i]+B[j][i];
is interchanged to -
for(int j=0;j<N;j++)
for(int i=0;i<N;i++)
A[j][i] = A[j][i]+B[j][i];
This pass is currently disabled by default.
To give a brief introduction it consists of 3 stages-
LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.
LoopInterchangeProfitability: A very basic heuristic has been added to check for profitibility. This will evolve over time.
LoopInterchangeTransform : Which does the actual transform.
LNT Performance tests shows improvement in Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver becnmarks.
TODO:
1) Add support for reductions and lcssa phi.
2) Improve profitability model.
3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.
4) Improve compile time regression found in llvm lnt due to this pass.
5) Fix issues in Dependency Analysis module.
A special thanks to Hal for reviewing this code.
Review: http://reviews.llvm.org/D7499
llvm-svn: 231458
We supported forming IMGREL relocations from ConstantExprs involving
__ImageBase if the minuend was a GlobalVariable. Extend this
functionality to all GlobalObjects.
llvm-svn: 231456
We extend an underlying file before mmap'ing it, but it's not needed
on Windows. Extending file is slow on Windows, so we should avoid doing that.
The difference gets larger as the size of an output file gets larger.
It shove off 2 seconds out of 25 seconds when linking chrome.dll with LLD,
for example.
llvm-svn: 231452
We would set the body of a struct type (therefore making it non-opaque)
but were forgetting to move it to the non-opaque set.
Fixes pr22807.
llvm-svn: 231442
These refactored computations check whether or not we are at a stage
of the sequence where we can perform a match. This patch moves the
computation out of the main dataflow and into
{BottomUp,TopDown}PtrState.
llvm-svn: 231439
This initialization occurs when we see a new retain or release. Before
we performed the actual initialization inline in the dataflow. That is
just messy.
llvm-svn: 231438
This will enable the main ObjCARCOpts dataflow to work with higher
level concepts such as "can this ptr state be modified by this ref
count" and not need to understand the nitty gritty details of how that
is determined. This makes the dataflow cleaner.
llvm-svn: 231437
Fixes llgo following Duncan's changes to debug info in r231082. llgo needs
to replace composite types, which are no longer represented using MDTuple.
llvm-svn: 231416
At this point, we should have decent coverage of the involved code. I've got a few more test cases to cleanup and submit, but what's here is already reasonable.
I've got a collection of liveness tests which will be posted for review along with a decent liveness algorithm in the next few days. Once those are in, the code in this file should be well tested and I can start renaming things without risk of serious breakage.
llvm-svn: 231414
This patch reduces code size for all AVX targets and increases speed for some chips.
SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and
only in the packed float/double flavors.
AVX subsequently made the instruction useful by adding a 4-register operand form.
So we just need to paper over the lack of scalar forms of this instruction, complicate
the code to choose float or double forms, and use blendv on scalars since all FP is in
xmm registers anyway.
This gives us an approximately 50% speed up for a blendv microbenchmark sequence
on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter
No new test cases with this patch because:
1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.
http://llvm.org/bugs/show_bug.cgi?id=22483
Differential Revision: http://reviews.llvm.org/D8063
llvm-svn: 231408
This is what all the targets check for and is consistent with the
initialized value of MissingFeatures, which is sometimes assinged
to ErrorInfo.
llvm-svn: 231397
This commit enables forming vector extloads for ARM.
It only does so for legal types, and when we can't fold the extension
in a wide/long form of the user instruction.
Enabling it for larger types isn't as good an idea on ARM as it is on
X86, because:
- we pretend that extloads are legal, but end up generating vld+vmov
- we have instructions like vld {dN, dM}, which can't be generated
when we "manually expand" extloads to vld+vmov.
For legal types, the combine doesn't fire that often: in the
integration tests only in a big endian testcase, where it removes a
pointless AND.
Related to rdar://19723053
Differential Revision: http://reviews.llvm.org/D7423
llvm-svn: 231396
This will be followed by a change on the clang side to update
the only user of this function with the new version.
Differential Revision: http://reviews.llvm.org/D8074
Reviewed By: Reid Kleckner
llvm-svn: 231392
The first element of STACKFRAME64 is a struct and Clang wants us to put
braces around it's initialization. Instead, drop the zero. The result
should be the same.
llvm-svn: 231387
llvm::sys::PrintBacktrace(FILE*) is supposed to print a backtrace
of the current thread given the current PC. This function was
unimplemented on Windows, and instead the only time we could
print a backtrace was as the result of an exception through
LLVMUnhandledExceptionFilter.
This patch implements backtracing of self by using
RtlCaptureContext to get a CONTEXT for the current thread, and
moving the printing and StackWalk64 code to a common method that
printing own stack trace and printing stack trace of an exception
can use.
Differential Revision: http://reviews.llvm.org/D8068
Reviewed by: Reid Kleckner
llvm-svn: 231382
Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.
Differential Revision: http://reviews.llvm.org/D7939
llvm-svn: 231380
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.
llvm-svn: 231371
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
llvm-svn: 231366
Build time (user time) for building llvm+clang+lldb in release mode:
- default allocator: 9086 seconds
- with PBQP: 9126 seconds
- with PBQP + local bit matrix cache: 9097 seconds
llvm-svn: 231360
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.
Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053
llvm-svn: 231359
This should help with the AVX512 masked gather changes Elena is working on. This patch is derived from some of the changes Elena made to tablegen, but modified by me to support arbitrary number of results.
llvm-svn: 231357
These came from my own experience and may not apply equally to all use cases. Any alternate perspective anyone has should be used to refine these.
As always, grammar and spelling adjustments are more than welcome. Please just directly commit a fix if you see something problematic.
llvm-svn: 231352
already been added and the inconsistency made choosing names and
changing code more annoying. Plus, wow are they better for this code!
llvm-svn: 231347
result reasonable.
This code predated clang-format and so there was a reasonable amount of
crufty formatting that had accumulated. This should ensure that neither
myself nor others end up with formatting-only changes sneaking into
other fixes.
llvm-svn: 231341
just arbitrarily interleaving unrelated control flows once they get
moved "out-of-line" (both outside of natural CFG ordering and with
diamonds that cannot be fully laid out by chaining fallthrough edges).
This easy solution doesn't work in practice, and it isn't just a small
bug. It looks like a very different strategy will be required. I'm
working on that now, and it'll again go behind some flag so that
everyone can experiment and make sure it is working well for them.
llvm-svn: 231332
Improve test robustness in preparation of coming commits:
- Avoid undefs which may get propagated too much.
- Remove several pointless add 0, instructions
llvm-svn: 231307
Summary:
rL225282 introduced an ad-hoc way to promote some additions to nuw or
nsw. Since then SCEV has become smarter in directly proving no-wrap;
and using the canonical "ext(A op B) == ext(A) op ext(B)" method of
proving no-wrap is just as powerful now. Rip out the existing
complexity in favor of getting SCEV to do all the heaving lifting
internally.
This change does not add any unit tests because it is supposed to be a
non-functional change. Tests added in rL225282 and rL226075 are valid
tests for this change.
Reviewers: atrick, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7981
llvm-svn: 231306
Summary:
Teach SCEV to prove no overflow for an add recurrence by proving
something about the range of another add recurrence a loop-invariant
distance away from it.
Reviewers: atrick, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7980
llvm-svn: 231305
This commit adds code to emit DIE trees that have been pruned from the
parts that haven't been marked as kept in the previous pass.
It works by 'cloning' the input DIE tree (as read by libDebugInfoDwarf)
into a tree of DIE objects. Cloning the DIEs means essentially cloning
their attributes. The code in this commit does only handle scalar and
block attributes (scalar because they are trivial, blocks because they
can't be easily replaced by a scalr placeholder), all the other ones
are replaced by placeholder zero values and will be handled in
further commits.
The added tests mostly check that the DIE tree has the correct layout and
also verify that a few chosen scalar and block attributes correctly make
their way into the output.
llvm-svn: 231300
To be used/tested by llvm-dsymutil. (llvm-dsymutil does a 'static' link,
no need for relocations for most things, so it'll just emit raw integers
for most attributes)
llvm-svn: 231298
Here's a rough/first draft - it at least hits the actual textual IR
examples and some of the phrasing. It's probably worth a full pass over,
but I'm not sure how much these docs should reflect the strange
intermediate state we're in anyway.
Totally open to lots of review/feedback/suggestions.
llvm-svn: 231294
When calling lock() after all passes are registered, the PassRegistry doesn't need a mutex anymore to look up passes.
This speeds up multithreaded llvm execution by ~5% (tested with 4 threads).
In an asserts build of llvm this has an even bigger impact.
Note that it's not required to use the lock function.
llvm-svn: 231276
Reverted in r231254 due to a self-hosting crash of Clang (see Clang
PR22793). Workaround the crash by using {} instead of = default to
define a dtor.
llvm-svn: 231274
The issue was that we were always printing the remarks. Fix that and add a test
showing that it prints nothing if -pass-remarks is not given.
Original message:
Correctly handle -pass-remarks in the gold plugin.
llvm-svn: 231273
Summary:
DataLayout keeps the string used for its creation.
As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().
Get rid of DataLayoutPass: the DataLayout is in the Module
The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.
Make DataLayout Non-Optional in the Module
Module->getDataLayout() will never returns nullptr anymore.
Reviewers: echristo
Subscribers: resistor, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D7992
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
This reverts r231200 and r231204. The second one added an explicit move
ctor for MSVC.
This change broke the clang-cl self-host due to weirdness in MSVC's
implementation of std::map::insert. Somehow we lost our rvalue ref-ness
when going through variadic placement new:
template <class _Objty, class... _Types>
void construct(_Objty *_Ptr,
_Types &&... _Args) { // construct _Objty(_Types...) at _Ptr
::new ((void *)_Ptr) _Objty(_STD forward<_Types>(_Args)...);
}
For some reason, Clang decided to call the deleted std::pair copy
constructor at this point. Needs further investigation, once I can
build.
llvm-svn: 231269
Summary:
In PNaCl, most atomic instructions have their own @llvm.nacl.atomic.* function, each one, with a few exceptions, represents a consistent behaviour across all NaCl-supported targets. Unfortunately, the atomic RMW operations nand, [u]min, and [u]max aren't directly represented by any such @llvm.nacl.atomic.* function. This patch refines shouldExpandAtomicRMWInIR in TargetLowering so that a future `Le32TargetLowering` class can selectively inform the caller how the target desires the atomic RMW instruction to be expanded (ie via load-linked/store-conditional for ARM/AArch64, via cmpxchg for X86/others?, or not at all for Mips) if at all.
This does not represent a behavioural change and as such no tests were added.
Patch by: Richard Diamond.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, aemerson, t.p.northover, llvm-commits
Differential Revision: http://reviews.llvm.org/D7713
llvm-svn: 231250
This "itinerary class map" in PPCSchedule.td is incomplete and
redundant with the actual code. As it provides no value, we've
decided to remove it.
No functional change.
llvm-svn: 231246
The target-independent selection algorithm in FastISel already knows how
to select a SINT_TO_FP if the target is SSE but not AVX.
On targets that have SSE but not AVX, the tablegen'd 'fastEmit' functions
for ISD::SINT_TO_FP know how to select instruction X86::CVTSI2SSrr
(for an i32 to f32 conversion) and X86::CVTSI2SDrr (for an i32 to f64
conversion).
This patch simplifies the logic in method X86SelectSIToFP knowing that
the code would not be reachable if the subtarget doesn't have AVX.
No functional change intended.
llvm-svn: 231243
Do not instrument direct accesses to stack variables that can be
proven to be inbounds, e.g. accesses to fields of structs on stack.
But it eliminates 33% of instrumentation on webrtc/modules_unittests
(number of memory accesses goes down from 290152 to 193998) and
reduces binary size by 15% (from 74M to 64M) and improved compilation time by 6-12%.
The optimization is guarded by asan-opt-stack flag that is off by default.
http://reviews.llvm.org/D7583
llvm-svn: 231241
Summary:
Use more reasonable names for these pseudo-instructions.
As there's only one definition tied to any one of these classes, I named them with abbreviated versions of their respective class' name.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7831
llvm-svn: 231240
Summary:
Move the "Filler" parameter to the end of the parameter list as it is,
conceptually, the only output parameter of that function.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7726
llvm-svn: 231239
a flag for now.
First off, thanks to Daniel Jasper for really pointing out the issue
here. It's been here forever (at least, I think it was there when
I first wrote this code) without getting really noticed or fixed.
The key problem is what happens when two reasonably common patterns
happen at the same time: we outline multiple cold regions of code, and
those regions in turn have diamonds or other CFGs for which we can't
just topologically lay them out. Consider some C code that looks like:
if (a1()) { if (b1()) c1(); else d1(); f1(); }
if (a2()) { if (b2()) c2(); else d2(); f2(); }
done();
Now consider the case where a1() and a2() are unlikely to be true. In
that case, we might lay out the first part of the function like:
a1, a2, done;
And then we will be out of successors in which to build the chain. We go
to find the best block to continue the chain with, which is perfectly
reasonable here, and find "b1" let's say. Laying out successors gets us
to:
a1, a2, done; b1, c1;
At this point, we will refuse to lay out the successor to c1 (f1)
because there are still un-placed predecessors of f1 and we want to try
to preserve the CFG structure. So we go get the next best block, d1.
... wait for it ...
Except that the next best block *isn't* d1. It is b2! d1 is waaay down
inside these conditionals. It is much less important than b2. Except
that this is exactly what we didn't want. If we keep going we get the
entire set of the rest of the CFG *interleaved*!!!
a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
So we clearly need a better strategy here. =] My current favorite
strategy is to actually try to place the block whose predecessor is
closest. This very simply ensures that we unwind these kinds of CFGs the
way that is natural and fitting, and should minimize the number of cache
lines instructions are spread across.
It also happens to be *dead simple*. It's like the datastructure was
specifically set up for this use case or something. We only push blocks
onto the work list when the last predecessor for them is placed into the
chain. So the back of the worklist *is* the nearest next block.
Unfortunately, a change like this is going to cause *soooo* many
benchmarks to swing wildly. So for now I'm adding this under a flag so
that we and others can validate that this is fixing the problems
described, that it seems possible to enable, and hopefully that it fixes
more of our problems long term.
llvm-svn: 231238
This commit fixes a bug introduced in r230956 where we were creating
CMovFP_{T,F} nodes with multiple return value types (one for each operand).
With this change the return value type of the new node is the same as the
value type of the True/False operands of the original node.
llvm-svn: 231237
In a CFG with the edges A->B->C and A->C, B is an optional branch.
LLVM's default behavior is to lay the blocks out naturally, i.e. A, B,
C, in order to improve code locality and fallthroughs. However, if a
function contains many of those optional branches only a few of which
are taken, this leads to a lot of unnecessary icache misses. Moving B
out of line can work around this.
Review: http://reviews.llvm.org/D7719
llvm-svn: 231230
As is described at http://llvm.org/bugs/show_bug.cgi?id=22408, the GNU linkers
ld.bfd and ld.gold currently only support a subset of the whole range of AArch64
ELF TLS relocations. Furthermore, they assume that some of the code sequences to
access thread-local variables are produced in a very specific sequence.
When the sequence is not as the linker expects, it can silently mis-relaxe/mis-optimize
the instructions.
Even if that wouldn't be the case, it's good to produce the exact sequence,
as that ensures that linkers can perform optimizing relaxations.
This patch:
* implements support for 16MiB TLS area size instead of 4GiB TLS area size. Ideally clang
would grow an -mtls-size option to allow support for both, but that's not part of this patch.
* by default doesn't produce local dynamic access patterns, as even modern ld.bfd and ld.gold
linkers do not support the associated relocations. An option (-aarch64-elf-ldtls-generation)
is added to enable generation of local dynamic code sequence, but is off by default.
* makes sure that the exact expected code sequence for local dynamic and general dynamic
accesses is produced, by making use of a new pseudo instruction. The patch also removes
two (AArch64ISD::TLSDESC_BLR, AArch64ISD::TLSDESC_CALL) pre-existing AArch64-specific pseudo
SDNode instructions that are superseded by the new one (TLSDESC_CALLSEQ).
llvm-svn: 231227
Asserting that the source and destination iterators are from the same
region is unnecessary - there's no reason to disallow reassignment from
any regions, so long as they aren't compared.
llvm-svn: 231224
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector.
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.
Differential Revision: http://reviews.llvm.org/D8040
llvm-svn: 231219
Since OptionValue (& its base classes) have user-declared dtors, use of
the implicit copy ctor/assignment operator is deprecated in C++11.
Provide them explicitly (defaulted) to avoid depending on this
deprecated feature.
llvm-svn: 231218
These objects are never polymorphically owned, so there's no need for
virtual dtors - just make the dtor protected in the base classes, and
make the derived classes final.
llvm-svn: 231217
This will now display enum definitions both at the global
scope as well as nested inside of classes. Additionally,
it will no longer display enums at the global scope if the
enum is nested. Instead, it will omit the definition of
the enum globally and instead emit it in the corresponding
class definition.
llvm-svn: 231215
(They are called emitDwarfDIE and emitDwarfAbbrevs in their new home)
llvm-dsymutil wants to reuse that code, but it doesn't have a DwarfUnit or
a DwarfDebug object to call those. It has access to an AsmPrinter though.
Having emitDIE in the AsmPrinter also removes the DwarfFile dependency
on DwarfDebug, and thus the patch drops that field.
Differential Revision: http://reviews.llvm.org/D8024
llvm-svn: 231210
GCC 4.7's libstdc++ doesn't have std::map::emplace, but it does have
std::unordered_map::emplace, and the use case here doesn't appear to
need ordering. The container has been changed in a separate/precursor
patch, and now this patch should hopefully build cleanly even with
GCC 4.7.
& then I realized the order of the container did matter, so extra
handling of ordering was added in r231189.
Original commit message:
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.
Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).
llvm-svn: 231192
The order of this container was needed at one point - so, at that point
create a temporary array of pointers, sort those, then iterate them.
This keeps lookup efficient (& the lesser issue, of allowing the use of
emplace... ), object identity preserved, and ordered iteration in the
one place that requires it.
While this has no functional change, I realize it does mean allocating
an extra data structure and performing a sort - so if this looks suspect
to anyone regarding perf characteristics, I'm all ears.
llvm-svn: 231189
There is a known bug where the register coalescer fails to merge
subranges when multiple ranges end up in the "overflow" bit 32 of the
lanemasks. A proper fix for this is complicated so for now this is a
workaround which lets the register coalescer drop the subregister
liveness information (we just loose some precision by that) and
continue.
llvm-svn: 231186
Apparently something does care about ordering of LiveIntervals... so
revert all that stuff (r231175, r231176, r231177) & take some time to
re-evaluate.
llvm-svn: 231184
RewriteStatepointsForGC pass emits an alloca for each GC pointer which will be relocated. It then inserts stores after def and all relocations, and inserts loads before each use as well. In the end, mem2reg is used to update IR with relocations in SSA form.
However, there is a problem with inserting stores for values defined by invoke instructions. The code didn't expect a def was a terminator instruction, and inserting instructions after these terminators resulted in malformed IR.
This patch fixes this problem by handling invoke instructions as a special case. If the def is an invoke instruction, the store will be inserted at the beginning of the normal destination block. Since return value from invoke instruction does not dominate the unwind destination block, no action is needed there.
Patch by: Chen Li
Differential Revision: http://reviews.llvm.org/D7923
llvm-svn: 231183
The intrinsic is no longer generated by the front-end. Remove the intrinsic and
auto-upgrade it to a vector shuffle.
Reviewed by Nadav
This is related to rdar://problem/18742778.
llvm-svn: 231182
GCC 4.7's libstdc++ doesn't have std::map::emplace, but it does have
std::unordered_map::emplace, and the use case here doesn't appear to
need ordering. The container has been changed in a separate/precursor
patch, and now this patch should hopefully build cleanly even with
GCC 4.7.
Original commit message:
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.
Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).
llvm-svn: 231176
This use case doesn't appear to benefit from ordering, and
std::unordered_map has the advantage that it supports emplace (the
LiveInterval values really shouldn't be copyable or movable & they won't
be in a near-future patch).
llvm-svn: 231175
Summary:
This makes it more obvious that the enum definition and the
"StandardName" array is in sync. Mechanically refactored w/ a
python script.
Test Plan: still compiles
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7845
llvm-svn: 231172
This makes LiveRange non-copyable, and LiveInterval is already
non-movable (due to the explicit dtor), so now it's non-copyable and
non-movable.
Fix the one case where we were relying on the (deprecated in C++11)
implicit copy ctor of LiveInterval (which happened to work because the
ctor created an object with a null segmentSet, so double-deleting the
null pointer was fine).
llvm-svn: 231168
Introduce -mllvm -sanitizer-coverage-8bit-counters=1
which adds imprecise thread-unfriendly 8-bit coverage counters.
The run-time library maps these 8-bit counters to 8-bit bitsets in the same way
AFL (http://lcamtuf.coredump.cx/afl/technical_details.txt) does:
counter values are divided into 8 ranges and based on the counter
value one of the bits in the bitset is set.
The AFL ranges are used here: 1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+.
These counters provide a search heuristic for single-threaded
coverage-guided fuzzers, we do not expect them to be useful for other purposes.
Depending on the value of -fsanitize-coverage=[123] flag,
these counters will be added to the function entry blocks (=1),
every basic block (=2), or every edge (=3).
Use these counters as an optional search heuristic in the Fuzzer library.
Add a test where this heuristic is critical.
llvm-svn: 231166
Ultimately, we'll need to leave something behind to indicate which
alloca will hold the exception, but we can figure that out when it comes
time to emit the __CxxFrameHandler3 catch handler table.
llvm-svn: 231164
Selection conditions may be vectors or scalars. Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.
This fixes PR22773.
llvm-svn: 231156
From:
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
total = total + i / 2;
}
}
This is the kernel loop:
.LBB0_2: # %for.body
=>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".
The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
Predecessors according to CFG: BB#1 BB#2
%vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
%vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
%vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
%vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
%vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
%vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
%vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
%vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
%vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.
So check for a reversed copy chain and if we encounter one then we can commute the add instruction so we can avoid a copy.
Patch by Wei Mi.
http://reviews.llvm.org/D7806
llvm-svn: 231148
Summary:
This does not conceptually belongs here. Instead provide a shortcut
getModule() that provides access to the DataLayout.
Reviewers: chandlerc, echristo
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8027
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231147
I removed the copy ctor, thinking that'd be the end of it - these
iterators should be perfectly assignable even from disjoint ranges (as
any iterator would be) - exkcept that the member was const.
Unconstify it.
llvm-svn: 231146
The assertion was just checking a class invariant that's pretty easy to
verify by inspection (no mutating operations, and the two non-copy ctors
already ensure the state is maintained) so remove the explicit copy ctor
in favor of the default, thus allowing the use of the default copy
assignment operator without hitting the C++11 deprecation here.
llvm-svn: 231143
There's no reason to disallow assigning an iterator from one range to an
iterator that previously iterated over a disjoint range. This then
follows the Rule of Zero, allowing implicit copy construction to be used
without hitting the case that's deprecated in C++11.
llvm-svn: 231142
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231138
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.
This reverts commit r231135.
llvm-svn: 231136
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231135
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231134
This type could be made copyable (= default a protected copy ctor in the
base class, and preferably make the derived class final to avoid risks
of providing a slicing copy operation to further derived classes) but it
seemed easier to avoid that complexity for a dump function that I assume
(by symmetry with ResourcePriorityQueue's dump, which was actively
buggy) not often used.
llvm-svn: 231133
Previously we had only Linux using DTPOFF for these; all X86 ELF
targets should. Fixes a side issue mentioned in PR21077.
Differential Revision: http://reviews.llvm.org/D8011
llvm-svn: 231130
This patch was landed in r231035 and reverted because it was buggy.
This is fixed version of the same change.
Summary:
This patch is an attempt at making `DenseMapIterator`s "fail-fast".
Fail-fast iterators that have been invalidated due to insertion into
the host `DenseMap` deterministically trip an assert (in debug mode)
on access, instead of non-deterministically hitting memory corruption
issues.
Reviewers: dexonsmith, dberlin, ruiu, chandlerc
Reviewed By: chandlerc
Subscribers: yaron.keren, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7931
llvm-svn: 231125
frame register before checking if there is a DWARF register number for it.
Thanks to H.J. Lu for diagnosing this and providing the testcase!
llvm-svn: 231121
Without this, use of this copy ctor is deprecated in C++11 due to the
presence of a user-declared dtor.
Marking the class final is just a little extra security that there are
no further derived classes that may then end up using the intermediate
base class's copy assignment operator and cause slicing to occur.
I didn't bother marking the other (non-test) base class final, since it
has reference members so it won't have any implicit assignment operators
anyway. Open to ideas on that, though.
We probably want a warning about use of a slicing assignment operator,
then I wouldn't worry so much about marking the class as final.
llvm-svn: 231114
The cause of the issue is the interaction of two factors:
1) When generating a DW_TAG_imported_declaration DIE which imports another
imported declaration, the code in AsmPrinter/DwarfCompileUnit.cpp
asserts that the second imported declaration must already have a DIE.
2) There is a non-determinism in the order in which imported declarations
within the same scope are processed.
Because of the non-determinism (2), it is possible that an imported
declaration is processed before another one it depends on, breaking the
assumption in (1).
The source of the non-determinism is that the imported declaration
DIDescriptors are sorted by scope in DwarfDebug::beginModule(); however that
sort is not a stable_sort, therefore the order of the declarations within
the same scope is not preserved. The attached patch changes the std::sort to
a std::stable_sort and it fixes the problem.
Test omitted due to it being non-deterministic and depending on the
implementation of std::sort.
llvm-svn: 231100
I tried making these private & friended to the BitVector, but that
didn't work - there's one use of BitVector::reference in Clang that
actually copies it into a local variable & uses it from there, rather
than just using the result of op[] in a temporary expression.
Whether or not this is desired is debatable (we could just fix that one
use in Clang) & it's not clear which way the C++ standard falls on this
for std::bitset's reference type (it has the same bug at least in
libstdc++, but Clang's -Wdeprecated doesn't flag it, because it's in a
standard header)
While it was only BitVector::reference's copy ctor that was referenced
by user code, I made SmallBitVector::reference's copy ctor public too,
for consistency.
llvm-svn: 231099
Ultimately, __CxxFrameHandler3 needs us to put a stack offset in a
table, and it will take responsibility for copying the exception object
into that slot. Modelling the exception object as an SSA value returned
by begincatch isn't going to work in general, so make it use an output
parameter.
Reviewers: andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D7920
llvm-svn: 231086
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464. I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned. Let me know if I'm wrong :).
The code changes are fairly mechanical:
- Bumped the "Debug Info Version".
- `DIBuilder` now creates the appropriate subclass of `MDNode`.
- Subclasses of DIDescriptor now expect to hold their "MD"
counterparts (e.g., `DIBasicType` expects `MDBasicType`).
- Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
for printing comments.
- Big update to LangRef to describe the nodes in the new hierarchy.
Feel free to make it better.
Testcase changes are enormous. There's an accompanying clang commit on
its way.
If you have out-of-tree debug info testcases, I just broke your build.
- `upgrade-specialized-nodes.sh` is attached to PR22564. I used it to
update all the IR testcases.
- Unfortunately I failed to find way to script the updates to CHECK
lines, so I updated all of these by hand. This was fairly painful,
since the old CHECKs are difficult to reason about. That's one of
the benefits of the new hierarchy.
This work isn't quite finished, BTW. The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro). Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers. I
also expect to make a few schema changes now that it's easier to reason
about everything.
llvm-svn: 231082
Add the final bits of API that `DIBuilder` needs before the new nodes
can be moved into place.
- Add `MDType::clone()` and `MDType::setFlags()` to support
`DIBuilder::createTypeWithFlags()`.
- Add `MDBasicType::get()` overload that just requires a tag and a
name, as a convenience for `DIBuilder::createUnspecifiedType()`.
- Add `MDLocalVariable::withInline()` and
`MDLocalVariable::withoutInline()` to support
`llvm::createInlinedVariable()` and
`llvm::cleanseInlinedVariable()`.
(Somehow these got lost inside the "move into place" patch I'm about to
commit -- better to commit separately!)
llvm-svn: 231079
This prevents the behavior observed in llvm.org/PR22369. I am not sure
whether I am reading the code correctly, but the early exit based on
isLiveOutPastPHIs() seems to make the wrong assumption that
RegisterCoalescer won't be able to coalesce those copies later.
This change hides the new behavior behind -no-phi-elim-live-out-early-exit
as it currently breaks four tests:
* Assertion in:
CodeGen/Hexagon/hwloop-cleanup.ll
* Worse code in:
CodeGen/X86/coalescer-commute4.ll
CodeGen/X86/phys_subreg_coalesce-2.ll
CodeGen/X86/zlib-longest-match.ll
The root cause here seems to be that the heuristic that determines
the visitation order in RegisterCoalescer gets less lucky.
llvm-svn: 231064
This lets us avoid a few copies that are otherwise hard to get rid of.
The way this is done is, the custom-inserter looks at the following
instruction for another CMOV, and replaces both at the same time.
A previous version used a new CMOV2 opcode, but the custom inserter
is expected to be able to return a different basic block anyway, which
means it's OK - though far from ideal - to alter that block's contents.
Explicitly document that, in case it ever makes a difference.
Alternatives welcome!
Follow-up to r231045.
rdar://19767934
Closes http://reviews.llvm.org/D8019
llvm-svn: 231046
Fold and/or of setcc's to double CMOV:
(CMOV F, T, ((cc1 | cc2) != 0)) -> (CMOV (CMOV F, T, cc1), T, cc2)
(CMOV F, T, ((cc1 & cc2) != 0)) -> (CMOV (CMOV T, F, !cc1), F, !cc2)
When we can't use the CMOV instruction, it might increase branch
mispredicts. When we can, or when there is no mispredict, this
improves throughput and reduces register pressure.
These can't be catched by generic combines, because the pattern can
appear when legalizing some instructions (such as fcmp une).
rdar://19767934
http://reviews.llvm.org/D7634
llvm-svn: 231045
By loading from indexed offsets into a byte array and applying a mask, a
program can test bits from the bit set with a relatively short instruction
sequence. For example, suppose we have 15 bit sets to lay out:
A (16 bits), B (15 bits), C (14 bits), D (13 bits), E (12 bits),
F (11 bits), G (10 bits), H (9 bits), I (7 bits), J (6 bits), K (5 bits),
L (4 bits), M (3 bits), N (2 bits), O (1 bit)
These bits can be laid out in a 16-byte array like this:
Byte Offset
0123456789ABCDEF
Bit
7 HHHHHHHHHIIIIIII
6 GGGGGGGGGGJJJJJJ
5 FFFFFFFFFFFKKKKK
4 EEEEEEEEEEEELLLL
3 DDDDDDDDDDDDDMMM
2 CCCCCCCCCCCCCCNN
1 BBBBBBBBBBBBBBBO
0 AAAAAAAAAAAAAAAA
For example, to test bit X of A, we evaluate ((bits[X] & 1) != 0), or to
test bit X of I, we evaluate ((bits[9 + X] & 0x80) != 0). This can be done
in 1-2 machine instructions on x86, or 4-6 instructions on ARM.
This uses the LPT multiprocessor scheduling algorithm to lay out the bits
efficiently.
Saves ~450KB of instructions in a recent build of Chromium.
Differential Revision: http://reviews.llvm.org/D7954
llvm-svn: 231043
There's really no reason to have them have entries in the symbol table
anymore. Old versions of ld64 had some bugs in this area but those have
been fixed long ago.
llvm-svn: 231041
In the future, we should run the output of clang through instnamer to
make it easier to manually edit test cases.
No functionality change.
llvm-svn: 231037
Summary:
This patch is an attempt at making `DenseMapIterator`s "fail-fast".
Fail-fast iterators that have been invalidated due to insertion into
the host `DenseMap` deterministically trip an assert (in debug mode)
on access, instead of non-deterministically hitting memory corruption
issues.
Reviewers: dexonsmith, dberlin, ruiu, chandlerc
Reviewed By: chandlerc
Subscribers: yaron.keren, chandlerc, llvm-commits
Differential Revision: http://reviews.llvm.org/D7931
llvm-svn: 231035
TargetRegisterInfo. DebugLocEntry now holds a buffer with the raw bytes
of the pre-calculated DWARF expression.
Ought to be NFC, but it does slightly alter the output format of the
textual assembly.
This reapplies 230930 without the assertion in DebugLocEntry::finalize()
because not all Machine registers can be lowered into DWARF register
numbers and floating point constants cannot be expressed.
llvm-svn: 231023