the new pass manager.
This adds operator<< overloads for the various bits of the
LazyCallGraph, dump methods for use from the debugger, and debug logging
using them to the CGSCC pass manager.
Having this was essential for debugging the call graph update patch, and
I've extracted what I could from that patch here to minimize the delta.
llvm-svn: 273961
allow a good error message to be produced.
I added the one test case that the object file tools could produce an error
message. The other two errors can’t be triggered if the input file is passed
through sys::fs::identify_magic(). But the malformedError("bad magic number")
does get triggered by the logic in llvm-dsymutil when dealing with a normal
Mach-O file. The other "File too small ..." error would take a logic error
currently to produce and is not tested for.
llvm-svn: 273946
Summary:
Our YAML library's handling of tags isn't perfect, but it is good enough to get rid of the need for the --format argument to yaml2obj. This patch does exactly that.
Instead of requiring --format, it infers the format based on the tags found in the object file. The supported tags are:
!ELF
!COFF
!mach-o
!fat-mach-o
I have a corresponding patch that is quite large that fixes up all the in-tree test cases.
Reviewers: rafael, Bigcheese, compnerd, silvas
Subscribers: compnerd, llvm-commits
Differential Revision: http://reviews.llvm.org/D21711
llvm-svn: 273915
This is a resubmittion of 263158 change after fixing the existing problem with intrinsics mangling (see LTO and intrinsics mangling llvm-dev thread for details).
This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.
The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
Reviewed By: reames
Differential Revision: http://reviews.llvm.org/D17270
llvm-svn: 273892
Summary:
Created a pattern to match 64-bit mode (and (xor x, -1), y)
to a shorter sequence of instructions.
Before the change, the canonical form is translated to:
xihf %r3, 4294967295
xilf %r3, 4294967295
ngr %r2, %r3
After the change, the canonical form is translated to:
ngr %r3, %r2
xgr %r2, %r3
Reviewers: zhanjunl, uweigand
Subscribers: llvm-commits
Author: assem
Committing on behalf of Assem.
Differential Revision: http://reviews.llvm.org/D21693
llvm-svn: 273887
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Re-commit rL273257 - revision: http://reviews.llvm.org/D20789
llvm-svn: 273864
AVX1 can only broadcast vectors as floats/doubles, so for 256-bit vectors we insert bitcasts if we are shuffling v8i32/v4i64 types. Unfortunately the presence of these bitcasts prevents the current broadcast lowering code from peeking through cases where we have concatenated / extracted vectors to create the 256-bit vectors.
This patch allows us to peek through bitcasts as long as the number of elements doesn't change (i.e. element bitwidth is the same) so the broadcast index is not affected.
Note this bitcast peek is different from the stage later on which doesn't care about the type and is just trying to find a load node.
Differential Revision: http://reviews.llvm.org/D21660
llvm-svn: 273848
Summary:
This is a straightforward extension of what LoopUnswitch does to
branches to guards. That is, we unswitch
```
for (;;) {
...
guard(loop_invariant_cond);
...
}
```
into
```
if (loop_invariant_cond) {
for (;;) {
...
// There is no need to emit guard(true)
...
}
} else {
for (;;) {
...
guard(false);
// SimplifyCFG will clean this up by adding an
// unreachable after the guard(false)
...
}
}
```
Reviewers: majnemer
Subscribers: mcrosier, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D21725
llvm-svn: 273801
Don't cast GV expression to MCSymbolRefExpr. r272705 changed GV to binary
expressions by including offset even if the offset it 0
(we haven't hit this sooner since tested workloads don't include static offsets)
We don't really care about the type of expression, so set it directly.
Fixes: r272705
Consider section relative relocations. Since all const as data is in one boffer section relative is equivalent to abs32.
Fixes: r273166
Differential Revision: http://reviews.llvm.org/D21633
llvm-svn: 273785
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.
While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.
There are much better tools than llvm.trap: UBSan and ASan.
N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...
llvm-svn: 273778
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.
Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
- offset 0: work group ID x
- offset 4: work group ID y
- offset 8: work group ID z
- offset 16: work item ID x
- offset 20: work item ID y
- offset 24: work item ID z
Set
- amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
- amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
- amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled
Differential Revision: http://reviews.llvm.org/D20335
llvm-svn: 273769
This adds some tests for the smarter llvm-ar selection mode as well as some
additional tests as per Rafael's post commit review comments.
llvm-svn: 273768
Summary:
Offset folding only works if you are emitting relocations, and we don't
emit relocations for local address space globals.
Reviewers: arsenm, nhaustov
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21647
llvm-svn: 273765
There are two separate issues:
- LLVM doesn't consider infinite loops to be side effects: we happily
hoist/sink above/below loops whose bounds are unknown.
- The absence of the noreturn attribute is insufficient for us to know
if a function will definitely return. Relying on noreturn in the
middle-end for any property is an accident waiting to happen.
llvm-svn: 273762
This intrinsic safely loads a function pointer from a virtual table pointer
using type metadata. This intrinsic is used to implement control flow integrity
in conjunction with virtual call optimization. The virtual call optimization
pass will optimize away llvm.type.checked.load intrinsics associated with
devirtualized calls, thereby removing the type check in cases where it is
not needed to enforce the control flow integrity constraint.
This patch also introduces the capability to copy type metadata between
global variables, and teaches the virtual call optimization pass to do so.
Differential Revision: http://reviews.llvm.org/D21121
llvm-svn: 273756
In bidirectional scheduling this gives more stable results than just
comparing the "reason" fields of the top/bottom node because the reason
field may be higher depending on what other nodes are in the queue.
Differential Revision: http://reviews.llvm.org/D19401
llvm-svn: 273755
r273711 was reverted by r273743. The inliner needs to know about any
call sites in the inlined function. These were obscured if we replaced
a call to undef with an undef but kept the call around.
This fixes PR28298.
llvm-svn: 273753
COPY was lacking a scheduling class, define it to avoid regressions in
the upcoming change to the bidirectional MachineScheduler. Approved by
tstellar on IRC.
Differential Revision: http://reviews.llvm.org/D21540
llvm-svn: 273751
Summary: Set ProfileSummary in SampleProfilerLoader.
Reviewers: davidxl, eraman
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21702
llvm-svn: 273745
This fixes an embarrassing bug when emitting .debug_loc entries for 64-bit+ constants,
which were previously silently truncated to 32 bits.
<rdar://problem/26843232>
llvm-svn: 273736
The bitset metadata currently used in LLVM has a few problems:
1. It has the wrong name. The name "bitset" refers to an implementation
detail of one use of the metadata (i.e. its original use case, CFI).
This makes it harder to understand, as the name makes no sense in the
context of virtual call optimization.
2. It is represented using a global named metadata node, rather than
being directly associated with a global. This makes it harder to
manipulate the metadata when rebuilding global variables, summarise it
as part of ThinLTO and drop unused metadata when associated globals are
dropped. For this reason, CFI does not currently work correctly when
both CFI and vcall opt are enabled, as vcall opt needs to rebuild vtable
globals, and fails to associate metadata with the rebuilt globals. As I
understand it, the same problem could also affect ASan, which rebuilds
globals with a red zone.
This patch solves both of those problems in the following way:
1. Rename the metadata to "type metadata". This new name reflects how
the metadata is currently being used (i.e. to represent type information
for CFI and vtable opt). The new name is reflected in the name for the
associated intrinsic (llvm.type.test) and pass (LowerTypeTests).
2. Attach metadata directly to the globals that it pertains to, rather
than using the "llvm.bitsets" global metadata node as we are doing now.
This is done using the newly introduced capability to attach
metadata to global variables (r271348 and r271358).
See also: http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
Differential Revision: http://reviews.llvm.org/D21053
llvm-svn: 273729
This patch adds round-trip support for MachO Universal binaries to obj2yaml and yaml2obj. Universal binaries have a header and list of architecture structures, followed by a the individual object files at specified offsets.
llvm-svn: 273719
By putting all the possible commutations together, we simplify the code.
Note that this is NFCI, but I'm adding tests that actually exercise each
commutation pattern because we don't have this anywhere else.
llvm-svn: 273702
Tail merge was making the assumption that a layout successor or
predecessor was always a cfg successor/predecessor. Remove that
assumption. Changes to tests are necessary because the errant cfg edges
were preventing optimizations.
llvm-svn: 273700
Clang emits them in reverse order to conform to the ABI, which requires
left-to-right destruction. As a result, the order doesn't fall out
naturally, and we have to sort things out in the backend.
Fixes PR28213
llvm-svn: 273696
There are two remaining issues here:
1. No vbptr information
2. Need to mention indirect virtual bases
Getting indirect virtual bases is just a matter of adding an "indirect"
flag, emitting them in the frontend, and ignoring them when appropriate
for DWARF.
All virtual bases use the same artificial vbptr field, so I think the
vbptr offset will be best represented by an implicit __vbptr$ClassName
member similar to our existing __vptr$ member.
llvm-svn: 273688
The interleaved access analysis currently assumes that the inserted run-time
pointer aliasing checks ensure the absence of dependences that would prevent
its instruction reordering. However, this is not the case.
Issues can arise from how code generation is performed for interleaved groups.
For a load group, all loads in the group are essentially moved to the location
of the first load in program order, and for a store group, all stores in the
group are moved to the location of the last store. For groups having members
involved in a dependence relation with any other instruction in the loop, this
reordering can violate the dependence.
This patch teaches the interleaved access analysis how to avoid breaking such
dependences, and should fix PR27626.
An assumption of the original analysis was that the accesses had been collected
in "program order". The analysis was then simplified by visiting the accesses
bottom-up. However, this ordering was never guaranteed for anything other than
single basic block loops. Thus, this patch also enforces the desired ordering.
Reference: https://llvm.org/bugs/show_bug.cgi?id=27626
Differential Revision: http://reviews.llvm.org/D19984
llvm-svn: 273687
This adds rudimentary support for COFF ARM to the dynamic loader for the
exeuction engine. This can be used by lldb to JIT code into a COFF ARM
environment. This lays the foundation for the loader, though a few of the
relocation types are yet unhandled.
llvm-svn: 273682
Un XFAIL a few tests plus a few more I had lying around
in my tree, which seem to all work now but I don't see tests
that quite test the same things.
llvm-svn: 273655
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.
llvm-svn: 273653
Some buildbots are complaining about a .s file under test/ that was
inadvertently created by a test earlier today, and is still hanging
around. I'll undo this change in ~1hr (or whenever the bots are done
getting rid of said file).
llvm-svn: 273641
reduce the number of comparisons.
Specifically, InstCombine can turn:
(i == 5334 || i == 5335)
into:
((i | 1) == 5335)
SimplifyCFG was already able to detect the pattern:
(i == 5334 || i == 5335)
to:
((i & -2) == 5334)
This patch supersedes D21315 and resolves PR27555
(https://llvm.org/bugs/show_bug.cgi?id=27555).
Thanks to David and Chandler for the suggestions!
Author: Thomas Jablin (tjablin)
Reviewers: majnemer chandlerc halfdan cycheng
http://reviews.llvm.org/D21397
llvm-svn: 273639
The function name Module::empty() is slightly misleading in that it
only tests for the presence of functions in the module. However we
still want to emit the module summary if the module contains only
global variables or aliases. The presence of such entities can be
determined simply by checking the summary directly, as we are doing
below.
Differential Revision: http://reviews.llvm.org/D21669
llvm-svn: 273638
This patch also has a refactor that kills StratifiedAttr, and leaves us
with StratifiedAttrs, because having both was mildly redundant.
This patch makes us correctly handle stratified attributes when doing
interprocedural analysis. It also adds another attribute, AttrCaller,
which acts like AttrUnknown. We can filter out AttrCaller values when
during interprocedural analysis, since the caller should have
information about what arguments it's passing to its callee.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21645
llvm-svn: 273636
Summary:
We will start generating this in a future patch.
Reviewers: arsenm, kzhuravl, rafael, ruiu, tony-tye
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21482
llvm-svn: 273628
32-bit Mach headers don't have reserved fields. When generating the
mapping for 32-bit headers leaving off the reserved field will result in
parse errors if the field is present in the yaml.
Added a CHECK-NOT line to ensure that mach_header.yaml isn't adding a
reserved field, and a test to ensure that the parser error gets hit with
32-bit headers.
llvm-svn: 273623
Memory references were not being propagated for this folded load. This
prevented optimizations like LICM from hoisting the load.
Added test to verify that this allows LICM to proceed.
llvm-svn: 273617
When considering whether to split an instruction with a memory operand
into an explicit load and a register-based instruction, we currently
check that the resulting instruction has exactly 1 def. This prevents 2
important LICM optimizations: compares with memory operands, and double
indirect calls. All the tests and the test-suite pass without the check.
My guess as to original intent is to limit the additional register pressure
created by the new instruction, but given that we only split out a single
register, it is already limited.
The licm-dominance test now checks actual memory loads for hoisting instead of
undef, and it tests compares.
hoist-invariant-load.ll now checks for 2 hoists, the intended hoist, and a bonus
from calling a got-relative function in a loop.
llvm-svn: 273616
Summary:
This instcombine rule folds away trunc operations that have value available from a prior load or store.
This kind of code can be generated as a result of GVN widening the load or from source code as well.
Reviewers: reames, majnemer, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21246
llvm-svn: 273608
Previously, we just unified any arguments that seemed to be related to
each other. With this patch, we now respect dereference levels, etc.
which should make us substantially more accurate. Proper handling of
StratifiedAttrs will be done in a later patch.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21536
llvm-svn: 273596
X86FrameLowering::adjustForHiPEPrologue() contains a hard-coded offset
into an Erlang Runtime System-internal data structure (the PCB). As the
layout of this data structure is prone to change, this poses problems
for maintaining compatibility.
To address this problem, the compiler can produce this information as
module-level named metadata. For example (where P_NSP_LIMIT is the
offending offset):
!hipe.literals = !{ !2, !3, !4 }
!2 = !{ !"P_NSP_LIMIT", i32 152 }
!3 = !{ !"X86_LEAF_WORDS", i32 24 }
!4 = !{ !"AMD64_LEAF_WORDS", i32 24 }
Patch by Magnus Lang
Differential Revision: http://reviews.llvm.org/D20363
llvm-svn: 273593
Recommiting after correcting over-eager Debug Value transfer fixing PR28270.
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 273585
Summary:
SSAT saturates an integer, making sure that its value lies within
an interval [-k, k]. Since the constant is given to SSAT as the
number of bytes set to one, k + 1 must be a power of 2, otherwise
the optimization is not possible. Also, the select_cc must use <
and > respectively so that they define an interval.
Reviewers: mcrosier, jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D21372
llvm-svn: 273581
When simplifying a load we need to make sure that the type of the
simplified value matches the type of the instruction we're processing.
In theory, we can handle casts here as we deal with constant data, but
since it's not implemented at the moment, we at least need to bail out.
This fixes PR28262.
llvm-svn: 273562
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. In the map forms an interval covering the the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
the problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value in
the backend.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Differential Revision: http://reviews.llvm.org/D18586
llvm-svn: 273559
Summary:
The backend has no reason to behave like a driver and should generally do
as it's told (and error out if it can't) instead of trying to figure out
what the API user meant. The default ABI is still derived from the arch
component as a concession to backwards compatibility.
API-users that previously passed an explicit CPU and a triple that was
inconsistent with the CPU (e.g. mips-linux-gnu and mips64r2) may get a
different ABI to what they got before. However, it's expected that there
are no such users on the basis that CodeGen has been asserting that the
triple is consistent with the selected ABI for several releases. API-users
that were consistent or passed '' or 'generic' as the CPU will see no
difference.
Reviewers: sdardis, rafael
Subscribers: rafael, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21466
llvm-svn: 273557
Summary:
When parseAnyRegister() encounters a symbol alias, it parses integers and adds
a corresponding expression to the operand list. This is clearly wrong since the
only operands that parseAnyRegister() should be accepting are registers.
It's not clear why this code was added and there are no test cases that cover
it. I think it might be leftover from when searchSymbolAlias() was more widely
used.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21377
llvm-svn: 273555
The exit-on-error flag was necessary in order to avoid an assertion when
handling DYNAMIC_STACKALLOC nodes in SelectionDAGLegalize.
We can avoid the assertion by creating some dummy nodes. This enables us to
remove the exit-on-error flag on the first 2 run lines (SI), but on the third
run line (R600) we would run into another assertion when trying to reserve
indirect registers. This patch also replaces that assertion with an early exit
from the function.
Fixes PR27761.
Differential Revision: http://reviews.llvm.org/D20852
llvm-svn: 273550
dext and dins, along with their 'm' and 'u' variants are defined in mips64r2,
not mips64.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D21608
llvm-svn: 273549
When trying to convert a loading instruction into a FAULTING_LOAD, we
sometimes face code like this:
if %R10 is not null:
%R9<def> = MOV32ri Immediate
%R9<def, tied> = AND32rm %R9, 0x20(%R10)
else:
goto TRAP
In these cases we would like to use the AND32rm instruction as the
faulting operation by hoisting the "depedency" def-ing %R9 also above
the control flow, transforming the program into:
%R9<def> = MOV32ri Immediate
%R9<def, tied> = FAULTING_LOAD_OP(AND32rm %R9, 0x20(%R10), FailPath: TRAP)
This change teaches ImplicitNullChecks to do the above, when safe.
llvm-svn: 273501
Tweak the big-types.ll test case to catch this bug. We just need an
enumerator name that doesn't have a length that is a multiple of 4.
llvm-svn: 273477
This is a convenience iterator that allows clients to enumerate the
GlobalObjects within a Module.
Also start using it in a few places where it is obviously the right thing
to use.
Differential Revision: http://reviews.llvm.org/D21580
llvm-svn: 273470
The main sin this was committing was using terminator
instructions in the middle of the block, and then
not updating the block successors / predecessors.
Split the blocks up to avoid this and introduce new
pseudo instructions for branches taken with exec masking.
Also use a pseudo instead of emitting s_endpgm and erasing
it in the special case of a non-void return.
llvm-svn: 273467
Transform: (store ch addr (add x (add (shl y c) e)))
to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.
llvm-svn: 273466
This is similar to the computeKnownBits improvement in rL268479.
There's probably more we can do for vector logic instructions, but
this should let us see non-splat constant masking ops that can
become vector selects instead of and/andn/or sequences.
Differential Revision: http://reviews.llvm.org/D21610
llvm-svn: 273459
Recommiting after fixing over-aggressive assertion
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 273456
CodeView needs to know if a virtual method was introduced in the current
class, and base classes may not have complete type information, so we
need to thread this bit through from the frontend.
llvm-svn: 273453
It's only useful to asan-itize profiling globals while debugging llvm's
profiling instrumentation passes. Enabling asan along with instrprof or
gcov instrumentation shouldn't incur extra overhead.
This patch is in the same spirit as r264805 and r273202, which disabled
tsan instrumentation of instrprof/gcov globals.
Differential Revision: http://reviews.llvm.org/D21541
llvm-svn: 273444
This is the motivating example:
struct B { int b; };
struct A { B *b; };
int f(A *p) { return p->b->b; }
Clang emits complete types for both A and B because they are required to
be complete, but our CodeView emission would only emit forward
declarations of A and B. This was a consequence of the fact that the A*
type must reference the forward declaration of A, which doesn't
reference B at all.
We can't eagerly emit complete definitions of A and B when we request
the forward declaration's type index because of recursive types like
linked lists. If we did that, our stack usage could get out of hand, and
it would be possible to lower a type while attempting to lower a type,
and we would need to double check if our type is already present in the
TypeIndexMap after all recursive getTypeIndex calls.
Instead, defer complete type emission until after all type lowering has
completed. This ensures that all referenced complete types are emitted,
and that type lowering is not re-entrant.
llvm-svn: 273443
Summary:
Recognize RISBG opportunities where the end result is narrower than the
original input - where a truncate separates the shift/and operations.
The motivating case is some code in postgres which looks like:
srlg %r2, %r0, 11
nilh %r2, 255
Reviewers: uweigand
Author: RolandF
Differential Revision: http://reviews.llvm.org/D21452
llvm-svn: 273433
We no longer have corresponding code in autoupgrade and the vast majority of the tests were fixed long time ago. Fix the remaining few. One of the verifier test cases is marked as XFAIL because it was passing only because the signature was incorrect.
llvm-svn: 273428
This patch changes single method of llvm-readobj.
It teaches SHT_GNU_verdef dumper to print version dependencies,
also it removes few fields from output that can be dumped with other keys
and slightly refactors code.
Testcase was also modified to match the changes.
Change is required for testcases of upcoming lld patches.
Differential revision: http://reviews.llvm.org/D21552
llvm-svn: 273417
We now include namespace scope info in LF_FUNC_ID records and we emit
LF_MFUNC_ID records for member functions as we should.
Class names are now fully qualified, which is what MSVC does.
Add a little bit of scaffolding to handle ThisAdjustment when it arrives
in DISubprogram.
llvm-svn: 273358
Do not instrument pointers with address space attributes since we cannot track
them anyway. Instrumenting them results in false positives in ASan and a
compiler crash in TSan. (The compiler should not crash in any case, but that's
a different problem.)
llvm-svn: 273339
This change is motivated by an upcoming change to the metadata representation
used for CFI. The indirect function call checker needs type information for
external function declarations in order to correctly generate jump table
entries for such declarations. We currently associate such type information
with declarations using a global metadata node, but I plan [1] to move all
such metadata to global object attachments.
In bitcode, metadata attachments for function declarations appear in the
global metadata block. This seems reasonable to me because I expect metadata
attachments on declarations to be uncommon. In the long term I'd also expect
this to be the case for CFI, because we'd want to use some specialized bitcode
format for this metadata that could be read as part of the ThinLTO thin-link
phase, which would mean that it would not appear in the global metadata block.
To solve the lazy loaded metadata issue I was seeing with D20147, I use the
same bitcode representation for metadata attachments for global variables as I
do for function declarations. Since there's a use case for metadata attachments
in the global metadata block, we might as well use that representation for
global variables as well, at least until we have a mechanism for lazy loading
global variables.
In the assembly format, the metadata attachments appear after the "declare"
keyword in order to avoid a parsing ambiguity.
[1] http://lists.llvm.org/pipermail/llvm-dev/2016-June/100462.html
Differential Revision: http://reviews.llvm.org/D21052
llvm-svn: 273336
with the -macho and -universal-headers flags.
Just a follow on to r273207, I missed updating the printing of the fat magic
number when the universal file is a 64-bit universal file.
rdar://26899493
llvm-svn: 273324
Avoid unnecessary spills of such vars to local space on SASS level and
pointer space conversion.
Instead, make a local copy with appropriate addrspacecasts and let
LLVM optimize them away when possible.
This allows loading value of the argument using [symbol+offset]
instead of converting argument to general space pointer and using it
for indexing (which also implicitly converts param space pointer to
local space one on SASS level and triggers copying of argument into
local space in the process).
This reduces call overhead, uses less registers and reduces overall
SASS size by 2-4%.
Differential Review: http://reviews.llvm.org/D21421
llvm-svn: 273313
The basic structure is that once a list record goes over 64K, the last
subrecord of the list is an LF_INDEX record that refers to the next
record. Because the type record graph must be toplogically sorted, this
means we have to emit them in reverse order. We build the type record in
order of declaration, so this means that if we don't want extra copies,
we need to detect when we were about to split a record, and leave space
for a continuation subrecord that will point to the eventual split
top-level record.
Also adds dumping support for these records.
Next we should make sure that large method overload lists work properly.
llvm-svn: 273294
r273271 changed the RUN line of the regression test to use
-march=cyclone instead of -mtriple=aarch64-none-none.
This caused a change in the output syntax for the ext
instruction, causing the test to fail. Change this test
back to using -mtriple=aarch64-none-none.
llvm-svn: 273286
Summary:
Fix the computation of the offsets present in the scopetable when using the
SEH (__except_handler4).
This patch added an intrinsic to track the position of the allocation on the
stack of the EHGuard. This position is needed when producing the ScopeTable.
```
struct _EH4_SCOPETABLE {
DWORD GSCookieOffset;
DWORD GSCookieXOROffset;
DWORD EHCookieOffset;
DWORD EHCookieXOROffset;
_EH4_SCOPETABLE_RECORD ScopeRecord[1];
};
struct _EH4_SCOPETABLE_RECORD {
DWORD EnclosingLevel;
long (*FilterFunc)();
union {
void (*HandlerAddress)();
void (*FinallyFunc)();
};
};
```
The code to generate the EHCookie is added in `X86WinEHState.cpp`.
Which is adding these instructions when using SEH4.
```
Lfunc_begin0:
# BB#0: # %entry
pushl %ebp
movl %esp, %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl %ebp, %eax <<-- Loading FramePtr
movl %esp, -36(%ebp)
movl $-2, -16(%ebp)
movl $L__ehtable$use_except_handler4_ssp, %ecx
xorl ___security_cookie, %ecx
movl %ecx, -20(%ebp)
xorl ___security_cookie, %eax <<-- XOR FramePtr and Cookie
movl %eax, -40(%ebp) <<-- Storing EHGuard
leal -28(%ebp), %eax
movl $__except_handler4, -24(%ebp)
movl %fs:0, %ecx
movl %ecx, -28(%ebp)
movl %eax, %fs:0
movl $0, -16(%ebp)
calll _may_throw_or_crash
LBB1_1: # %cont
movl -28(%ebp), %eax
movl %eax, %fs:0
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
retl
```
And the corresponding offset is computed:
```
Luse_except_handler4_ssp$parent_frame_offset = -36
.p2align 2
L__ehtable$use_except_handler4_ssp:
.long -2 # GSCookieOffset
.long 0 # GSCookieXOROffset
.long -40 # EHCookieOffset <<----
.long 0 # EHCookieXOROffset
.long -2 # ToState
.long _catchall_filt # FilterFunction
.long LBB1_2 # ExceptionHandler
```
Clang is not yet producing function using SEH4, but it's a work in progress.
This patch is a step toward having a valid implementation of SEH4.
Unfortunately, it is not yet fully working. The EH registration block is not
allocated at the right offset on the stack.
Reviewers: rnk, majnemer
Subscribers: llvm-commits, chrisha
Differential Revision: http://reviews.llvm.org/D21231
llvm-svn: 273281
Summary:
We have switched to using features for all heuristics, but
the tests for these are still using -mcpu, which means we
are not directly testing the features.
This converts at least some of the existing regression tests
to use the new features.
This still leaves the following features untested:
merge-narrow-ld
predictable-select-expensive
alternate-sextload-cvt-f32-pattern
disable-latency-sched-heuristic
Reviewers: mcrosier, t.p.northover, rengolin
Subscribers: MatzeB, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D21288
llvm-svn: 273271
Summary:
canCombineSinCosLibcall() would previously combine sin+cos into sincos for
GNUX32/GNUEABI/GNUEABIHF regardless of whether UnsafeFPMath were set or not.
However, GNU would only combine them for UnsafeFPMath because sincos does not
set errno like sin and cos do. It seems likely that this is an oversight.
Reviewers: t.p.northover
Subscribers: t.p.northover, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D21431
llvm-svn: 273259
It did not handle correctly cases without GEP.
The following loop wasn't vectorized:
for (int i=0; i<len; i++)
*to++ = *from++;
I use getPtrStride() to find Stride for memory access and return 0 is the Stride is not 1 or -1.
Differential revision: http://reviews.llvm.org/D20789
llvm-svn: 273257
This reverts commit r273019.
From email I sent to list:
> I don't think this makes sense. Either the linker you're using supports
> this feature, or it doesn't. Having it enabled for llc if your linker
> doesn't support it is not fun.
>
> Further note that this also affects basically all other code using llvm
> libraries -- other than Clang, which explicitly sets it back to false by
> default, unless you set the ENABLE_X86_RELAX_RELOCATIONS cmake flag to
> true.
>
> If you want to enable the relax mode across all llvm tools in some
> circumstances, I think it should be via moving the cmake flag from clang
> down into llvm.
>
> I'm going to revert this commit, since I both think it intrinsically
> doesn't make sense to do this, and because it's breaking some of our
> tools.
llvm-svn: 273245
This patch makes us perform interprocedural analysis on functions that
don't have internal linkage. It also removes a test that should've been
deleted in an earlier commit (since other tests now cover everything
that the newly-removed test covers).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21513
llvm-svn: 273229
This patch adds function summaries, so that we don't need to recompute
various properties about function parameters/return values at each
callsite of a function. It also adds many interprocedural tests for
CFLAA.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21475#inline-182390
llvm-svn: 273219
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.
Differential Revision: http://reviews.llvm.org/D21521
llvm-svn: 273217
Fix for PR27726 - sitofp i64 to fp128 was loading the merged load i64 to a x87 register preventing legalization for conversion to fp128.
Added 32-bit tests for fp128 cast/conversions.
llvm-svn: 273210
Darwin added support in its Xcode 8.0 tools (released in the beta) for universal
files where offsets and sizes for the objects are 64-bits to allow support for
objects contained in universal files to be larger then 4gb. The change is very
straight forward. There is a new magic number that differs by one bit, much
like the 64-bit Mach-O files. Then there is a new structure that follow the
fat_header that has the same layout but with the offset and size fields using
64-bit values instead of 32-bit values.
rdar://26899493
llvm-svn: 273207
There is a known intended race here. This is a follow-up to r264805,
which disabled tsan instrumentation for updates to instrprof counters.
For more background on this please see the discussion in D18164.
llvm-svn: 273202
By moving this transform to InstSimplify from InstCombine, we sidestep the problem/question
raised by PR27869:
https://llvm.org/bugs/show_bug.cgi?id=27869
...where InstCombine turns an icmp+zext into a shift causing us to miss the fold.
Credit to David Majnemer for a draft patch of the changes to InstructionSimplify.cpp.
Differential Revision: http://reviews.llvm.org/D21512
llvm-svn: 273200
Summary: Inliner needs ACT when calling InlineFunction. Instead of nullptr, we need to pass it in from SampleProfileLoader
Reviewers: davidxl
Subscribers: eraman, vsk, danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D21205
llvm-svn: 273199
The implicit operand is added by the initial instruction construction,
so this was adding an additional vcc use. The original one
was missing the undef flag the original condition had,
so the verifier would complain.
llvm-svn: 273182
This will help sneak undefs past GVN into the DAG for
some tests.
Also add missing intrinsic for rsq_legacy, even though the node
was already selected to the instruction. Also start passing
the debug location to intrinsic errors.
llvm-svn: 273181
TargetLowering and DAGToDAG are used to combine ADDC, ADDE and UMLAL
dags into UMAAL. Selection is split into the two phases because it
is easier to match the two patterns at those different times.
Differential Revision: http://http://reviews.llvm.org/D21461
llvm-svn: 273165
This reverts commit r273160, reapplying r273132.
RecursivelyDeleteTriviallyDeadInstructions cannot be called on a
parentless Instruction.
llvm-svn: 273162
This reverts commit r273132.
Breaks multiple test under /llvm/test:Transforms (e.g.
llvm/test:Transforms/LoopIdiom/basic.ll.test) under asan.
llvm-svn: 273160
After a store has been eliminated, when making sure that the
instruction iterator points to a valid instruction, dbg intrinsics are
now ignored as a new instruction.
Patch by Henric Karlsson.
Reviewed by Daniel Berlin.
Differential Revision: http://reviews.llvm.org/D21076
llvm-svn: 273141
Removing dead instructions requires remembering which operands have
already been removed. RecursivelyDeleteTriviallyDeadInstructions has
this logic, don't partially reimplement it in LoopIdiomRecognize.
This fixes PR28196.
llvm-svn: 273132
We currently only allow exact matches of shuffle mask patterns during target shuffle combining.
This patch relaxes this to permit SM_SentinelUndef in the combined shuffle to always be accepted as well as allowing exact matching of the SM_SentinelZero value.
I've adjusted some tests that were requiring exact shuffle masks to now include undef values.
Differential Revision: http://reviews.llvm.org/D21495
llvm-svn: 273119
Passes to fix three hardware errata that appear on some LEON processor variants.
The instructions FSMULD, FMULS and FDIVS do not work as expected on some LEON processors. This change allows those instructions to be substituted for alternatives instruction sequences that are known to work.
These passes only run when selected individually, or as part of a processor defintion. They are not included in general SPARC processor compilations for non-LEON processors or for those LEON processors that do not have these hardware errata.
llvm-svn: 273108
Change the underlying offset and comparisons to use int64_t instead of
uint64_t.
Patch by River Riddle!
Differential Revision: http://reviews.llvm.org/D21499
llvm-svn: 273105
Summary:
JR is an alias of JALR with $rd=0 in the R6 ISA. Also, this fixes recursive
builds in MIPS32R6.
Reviewers: dsanders, sdardis
Subscribers: jfb, dschuff, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21370
llvm-svn: 273085
CodeGen has hooks that allow targets to emit specialized code instead
of calls to memcmp, memchr, strcpy, stpcpy, strcmp, strlen, strnlen.
When ASan/MSan/TSan/ESan is in use, this sidesteps its interceptors, resulting
in uninstrumented memory accesses. To avoid that, make these sanitizers
mark the calls as nobuiltin.
Differential Revision: http://reviews.llvm.org/D19781
llvm-svn: 273083
Don't use AllocateStack because kernel arguments have nothing
to do with the stack. The ensureMaxAlignment call was still
changing the stack alignment.
llvm-svn: 273080
The way we elide max expressions when computing trip counts is incorrect
-- it breaks cases like this:
```
static int wrapping_add(int a, int b) {
return (int)((unsigned)a + (unsigned)b);
}
void test() {
volatile int end_buf = 2147483548; // INT_MIN - 100
int end = end_buf;
unsigned counter = 0;
for (int start = wrapping_add(end, 200); start < end; start++)
counter++;
print(counter);
}
```
Note: the `NoWrap` variable that was being tested has little to do with
the values flowing into the max expression; it is a property of the
induction variable.
test/Transforms/LoopUnroll/nsw-tripcount.ll was added to solely test
functionality I'm reverting in this change, so I've deleted the test
fully.
llvm-svn: 273079
This is a functional change for LLE and LDist. The other clients (LV,
LVerLICM) already had this explicitly enabled.
The temporary boolean parameter to LAA is removed that allowed turning
off speculation of symbolic strides. This makes LAA's caching interface
LAA::getInfo only take the loop as the parameter. This makes the
interface more friendly to the new Pass Manager.
The flag -enable-mem-access-versioning is moved from LV to a LAA which
now allows turning off speculation globally.
llvm-svn: 273064
This should select to s_trap, but that requires
additonal work to setup and enable the trap handler.
For now emit s_endpgm so bugpoint stops getting stuck
on the unsupported call to abort.
Emit a warning that this will only terminate the wave and
not really trap.
llvm-svn: 273062
Darwin added support in its Xcode 8.0 tools (released in the beta) for static
library table of contents with 64-bit offsets to the archive members. The
change is very straight forward. The table of contents member is named
___.SYMDEF_64 or "___.SYMDEF_64 SORTED" and same layout is used but with
fields using 64 bit values instead of 32 bit values.
rdar://26869808
llvm-svn: 273058
An incomplete member pointer type will always have a size of zero, so we
don't need an extra flag. Credit to David Majnemer for the idea.
llvm-svn: 273057
Summary:
This seems like the least intrusive way to pass this information
through.
Fixes PR28151
Reviewers: majnemer, aprantl, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21444
llvm-svn: 273053
This is indeed a much cleaner approach (thanks to Daniel Berlin
for pointing out), and also David/Sean for review.
Differential Revision: http://reviews.llvm.org/D21454
llvm-svn: 273032
Many CPUs only have the ability to do a 4-byte cmpxchg (or ll/sc), not 1
or 2-byte. For those, you need to mask and shift the 1 or 2 byte values
appropriately to use the 4-byte instruction.
This change adds support for cmpxchg-based instruction sets (only SPARC,
in LLVM). The support can be extended for LL/SC-based PPC and MIPS in
the future, supplanting the ISel expansions those architectures
currently use.
Tests added for the IR transform and SPARCv9.
Differential Revision: http://reviews.llvm.org/D21029
llvm-svn: 273025
llvm-mc is a developer tool, as such it make sense for it to use new
features by default.
This doesn't change the user facing clang, which still defaults to non
relaxable relocations.
llvm-svn: 273014
The motivating example for this transform is similar to D20774 where bitcasts interfere
with a single cmp/select sequence, but in this case we have 2 uses of each bitcast to
produce min and max ops:
define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
%cmp = fcmp olt <4 x float> %a, %b
%bc1 = bitcast <4 x float> %a to <4 x i32>
%bc2 = bitcast <4 x float> %b to <4 x i32>
%sel1 = select <4 x i1> %cmp, <4 x i32> %bc1, <4 x i32> %bc2
%sel2 = select <4 x i1> %cmp, <4 x i32> %bc2, <4 x i32> %bc1
%bc3 = bitcast <4 x float>* %ptr1 to <4 x i32>*
store <4 x i32> %sel1, <4 x i32>* %bc3
%bc4 = bitcast <4 x float>* %ptr2 to <4 x i32>*
store <4 x i32> %sel2, <4 x i32>* %bc4
ret void
}
With this patch, we move the selects up to use the input args which allows getting rid of
all of the bitcasts:
define void @minmax_bc_store(<4 x float> %a, <4 x float> %b, <4 x float>* %ptr1, <4 x float>* %ptr2) {
%cmp = fcmp olt <4 x float> %a, %b
%sel1.v = select <4 x i1> %cmp, <4 x float> %a, <4 x float> %b
%sel2.v = select <4 x i1> %cmp, <4 x float> %b, <4 x float> %a
store <4 x float> %sel1.v, <4 x float>* %ptr1, align 16
store <4 x float> %sel2.v, <4 x float>* %ptr2, align 16
ret void
}
The asm for x86 SSE then improves from:
movaps %xmm0, %xmm2
cmpltps %xmm1, %xmm2
movaps %xmm2, %xmm3
andnps %xmm1, %xmm3
movaps %xmm2, %xmm4
andnps %xmm0, %xmm4
andps %xmm2, %xmm0
orps %xmm3, %xmm0
andps %xmm1, %xmm2
orps %xmm4, %xmm2
movaps %xmm0, (%rdi)
movaps %xmm2, (%rsi)
To:
movaps %xmm0, %xmm2
minps %xmm1, %xmm2
maxps %xmm0, %xmm1
movaps %xmm2, (%rdi)
movaps %xmm1, (%rsi)
The TODO comments show that we're limiting this transform only to vectors and only to bitcasts
because we need to improve other transforms or risk creating worse codegen.
Differential Revision: http://reviews.llvm.org/D21190
llvm-svn: 273011
Names in function id records don't include nested name specifiers or
template arguments, but names in the symbol stream include both.
For the symbol stream, instead of having Clang put the fully qualified
name in the subprogram display name, recreate it from the subprogram
scope chain. For the type stream, take the unqualified name and chop of
any template arguments.
This makes it so that CodeView DI metadata is more similar to DWARF DI
metadata.
llvm-svn: 273009
Recommiting after fixing non-atomic insert to front of SmallVector in
MCAsmLexer.h
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 273007
Reapplying patch as it was reverted when it was first
committed because of an assertion failure when the
mrrc2 intrinsic was called in ARM mode. The failure
was happening because the instruction was being built
in ARMISelDAGToDAG.cpp and the tablegen description for
mrrc2 instruction doesn't allow you to use a predicate.
The ARM architecture manuals do say that mrrc2 in ARM
mode can be predicated with AL in assembly but this has
no effect on the encoding of the instruction as the top
4 bits will always be 1111 not 1110 which is the encoding
for the condition AL.
Differential Revision: http://reviews.llvm.org/D21408
llvm-svn: 272982
This is a fix for PR27844.
When replacing uses of unsafe allocas, emit the new location
immediately after each use. Without this, the pointer stays live from
the function entry to the last use, while it's usually cheaper to
recalculate.
llvm-svn: 272969
When moving unsafe allocas to the unsafe stack, dbg.declare intrinsics are
updated to refer to the new location.
This change does the same to dbg.value intrinsics.
llvm-svn: 272968
MSVC handles enums differently from structs and classes: a forward
declaration is not emitted unconditionally. MSVC does not emit an S_UDT
record for the enum.
Differential Revision: http://reviews.llvm.org/D21442
llvm-svn: 272960
Redundant invariant loads can be CSE'ed with very little extra effort
over what early-cse already tracks, so it looks reasonable to make
early-cse handle this case.
llvm-svn: 272954
Add explicit Comment Token in Assembly Lexing for future support for
outputting explicit comments from inline assembly. As part of this,
CPPHash Directives are now explicitly distinguished from Hash line
comments in Lexer.
Line comments are recorded as EndOfStatement tokens, not Comment tokens
to simplify compatibility with current TargetParsers. This slightly
complicates comment output.
This remove all lexing tasks out of the parser, does minor cleanup
to remove extraneous newlines Asm Output, and some improvements white
space handling.
Reviewers: rtrieu, dwmw2, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20009
llvm-svn: 272953
This test checks that the string 'bar' (no quotes) doesn't exist in the
output after running opt. But opt embeds the absolute path to the
filename, and on my machine, the filename contains the string 'jlebar',
causing the test to fail.
This patch changes the test to look for the string '"bar"' instead.
llvm-svn: 272941
Daniel Berlin expressed some real concerns about the port and proposed
and alternative approach. I'll revert this for now while working on a
new patch, which I hope to put up for review shortly. Sorry for the churn.
llvm-svn: 272925
When calculating a square root using Newton-Raphson with two constants,
a naive implementation is to use five multiplications (four muls to calculate
reciprocal square root and another one to calculate the square root itself).
However, after some reassociation and CSE the same result can be obtained
with only four multiplications. Unfortunately, there's no reliable way to do
such a reassociation in the back-end. So, the patch modifies NR code itself
so that it directly builds optimal code for SQRT and doesn't rely on any
further reassociation.
Patch by Nikolai Bozhenov!
Differential Revision: http://reviews.llvm.org/D21127
llvm-svn: 272920
This is currently only performed in the Vectorizer. I will change this
as symbolic stride collection is moved to LAA.
This test will track when the actual functional change occurs.
llvm-svn: 272918
This fixes IMAGE_REL_I386_DIR32, IMAGE_REL_I386_DIR32NB,
IMAGE_REL_I386_SECREL, and IMAGE_REL_I386_REL32 relocations.
Based on patch by Jon Turney <jon.turney@dronecode.org.uk>
llvm-svn: 272911
The R_ARM_PLT32 relocation is deprecated and is not produced by MC.
This means that the code being deleted is dead from the .o point of
view and was making the .s more confusing.
llvm-svn: 272909
The -mattr options in these four tests have no effect on the output of
llvm-objdump. In the case of the two Mips tests, removing the -mattr option
left duplicate RUN lines so the duplicates have been removed.
llvm-svn: 272906
We should update results of the BranchProbabilityInfo after removing block in JumpThreading. Otherwise
we will get dangling pointer inside BranchProbabilityInfo cache.
Differential Revision: http://reviews.llvm.org/D20957
llvm-svn: 272891
Added checks to make sure the Scalarizer::transferMetadata() don't
remove valid debug locations from instructions. This is important as
the verifier pass require that e.g. inlinable callsites have a valid
debug location.
https://llvm.org/bugs/show_bug.cgi?id=27938
Patch by Karl-Johan Karlsson
Reviewers: dblaikie
Differential Revision: http://reviews.llvm.org/D20807
llvm-svn: 272884
Summary:
[ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore
lower frameindex nodes such that there is a copy to a CPU16Regs register. This
is now done consistently using a separate addressing mode that does not
permit frameindex nodes.
As part of this I've had to remove an optimization that reduced the number of
instructions needed to work around the lack of sp-relative addresses on [ls][bh]
and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as
a copy of the stack pointer and it's implementation was the root cause of many
of the register vs register class mismatches.
lw/sw can use sp-relative addresses but we ought to ensure that we use the
correct version of lw/sw internally for things like IAS. This is not currently
the case and this change does not fix this. However, this change does clean it
up sufficiently well to fix the machine verifier failures.
Also removed irrelevant functions from stchar.ll.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21062
llvm-svn: 272882
Summary:
The Mips implementation only covers the feature bits described by the ELF
e_flags so far. Mips stores additional feature bits such as MSA in the
.MIPS.abiflags section.
Also fixed a small bug this revealed where microMIPS wouldn't add the
EF_MIPS_MICROMIPS flag when using -filetype=obj.
Reviewers: echristo, rafael
Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21125
llvm-svn: 272880
(i == 5334 || i == 5335)
to:
((i & -2) == 5334)
This transformation has some incorrect side conditions. Specifically, the
transformation is only applied when the right-hand side constant (5334 in
the example) is a power of two not equal and not equal to the negated mask.
These side conditions were added in r258904 to fix PR26323. The correct side
condition is that: ((Constant & Mask) == Constant)[(5334 & -2) == 5334].
It's a little bit hard to see why these transformations are correct and what
the side conditions ought to be. Here is a CVC3 program to verify them for
64-bit values:
ONE : BITVECTOR(64) = BVZEROEXTEND(0bin1, 63);
x : BITVECTOR(64);
y : BITVECTOR(64);
z : BITVECTOR(64);
mask : BITVECTOR(64) = BVSHL(ONE, z);
QUERY( (y & ~mask = y) =>
((x & ~mask = y) <=> (x = y OR x = (y | mask)))
);
Please note that each pattern must be a dual implication (<--> or iff). One
directional implication can create spurious matches. If the implication is
only one-way, an unsatisfiable condition on the left side can imply a
satisfiable condition on the right side. Dual implication ensures that
satisfiable conditions are transformed to other satisfiable conditions and
unsatisfiable conditions are transformed to other unsatisfiable conditions.
Here is a concrete example of a unsatisfiable condition on the left
implying a satisfiable condition on the right:
mask = (1 << z)
(x & ~mask) == y --> (x == y || x == (y | mask))
Substituting y = 3, z = 0 yields:
(x & -2) == 3 --> (x == 3 || x == 2)
The version of this code before r258904 had no side-conditions and
incorrectly justified itself in comments through one-directional
implication.
Thanks to Chandler for the suggestion!
Author: Thomas Jablin (tjablin)
Reviewers: chandlerc majnemer hfinkel cycheng
http://reviews.llvm.org/D21417
llvm-svn: 272873
The backend has been around for years, it's pretty ridiculous that we can't
even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen
can't handle the complex predicates when printing so it's a bunch of nasty C++.
Oh well.
llvm-svn: 272865
It was printing out nothing in this case.
llvm-objdump tries to disassemble sections a symbol at a time. In the case of a
fully stripped Mach-O executable the only symbol remaining in the (__TEXT,__text)
section is the special linker defined symbol __mh_execute_header . This
symbol is special in that while it is N_SECT symbol in the (__TEXT,__text)
its address is before the start of the (__TEXT,__text). It’s address is the
start of the __TEXT segment which is where the mach header is statically
linked. So the code in DisassembleMachO() needs to deal with this case specially.
rdar://26778273
llvm-svn: 272837
Of course the assembly was right but because the opcode was MOVZWi it was
encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and
would go horribly wrong on a CPU.
No idea how this bug survived this long. It seems nobody is using that aspect
of patchpoints.
llvm-svn: 272831
- We lacked a short unique identifier for a statistics, so I renamed the
current "Name" field that just contained the DEBUG_TYPE name of the
current file to DebugType and added a new "Name" field that contains
the C++ identifier of the statistic variable.
- Add the -stats-json option which outputs statistics in json format.
Differential Revision: http://reviews.llvm.org/D20995
llvm-svn: 272826
Emit a S_UDT record for typedefs. We still need to do something for
class types.
Differential Revision: http://reviews.llvm.org/D21149
llvm-svn: 272813
This allows us to emit native IR in Clang (next commit).
Also, update the intrinsic tests to show that codegen already knows how to handle
the IR that Clang will soon produce.
llvm-svn: 272806
We would fail to validate the type of the tan function which would cause
downstream users of isValidProtoForLibFunc to assert.
This fixes PR28143.
llvm-svn: 272802
[DAG] Previously debug values would transfer debuginfo for the selected
start node for a replacement which allows for debug to be dropped.
Push debug value transfer to occur with node/value replacement in
SelectionDAG, remove now extraneous transfers of debug values.
This refixes PR9817 which was being incompletely checked in the
testsuite.
Reviewers: jyknight
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D21037
llvm-svn: 272792
This uses the "runImpl" approach to share code with the old PM.
Porting to the new PM meant abandoning the anonymous namespace enclosing
most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal
compared to having to pull the pass class into a header which the new PM
requires since it calls the constructor directly).
llvm-svn: 272766
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.
Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.
An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21326
llvm-svn: 272761
Summary:
... when the offset is not statically known.
Prioritize addresses relative to the stack pointer in the stackmap, but
fallback gracefully to other modes of addressing if the offset to the
stack pointer is not a known constant.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm
Differential Revision: http://reviews.llvm.org/D21259
llvm-svn: 272756
Summary:
We we have an MCConstantExpr, we can encode it directly into the instruction
instead of emitting fixups.
Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21236
Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948
llvm-svn: 272750
Summary:
We can now reference symbols directly in operands, like this:
s_mov_b32 s0, global
Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D21038
llvm-svn: 272748
r272715 broke libcxx because it did not correctly handle cases where the
last iteration of one IV is the second-to-last iteration of another.
Original commit message:
Vectorizing loops with "escaping" IVs has been disabled since r190790, due to
PR17179. This re-enables it, with support for external use of both
"post-increment" (last iteration) and "pre-increment" (second-to-last iteration)
IVs.
llvm-svn: 272742
We do not support splitting cleanuppad or catchswitches. This is
problematic for passes which assume that a loop is in loop simplify
form (the loop would have a dedicated exit block instead of sharing it).
While it isn't great that we don't support this for cleanups, we still
cannot make loop-simplify form an assertable precondition because
indirectbr will also disable these sorts of CFG cleanups.
This fixes PR28132.
llvm-svn: 272739
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM. LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.
Differential Revision: http://reviews.llvm.org/D21316
llvm-svn: 272737
Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be layouted optimally.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20991
llvm-svn: 272729
Vectorizing loops with "escaping" IVs has been disabled since r190790, due to
PR17179. This re-enables it, with support for external use of both
"post-increment" (last iteration) and "pre-increment" (second-to-last iteration)
IVs.
Differential Revision: http://reviews.llvm.org/D21048
llvm-svn: 272715
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.
This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
the normal rule that the global must have a unique address can be broken without
being observable by the program by performing comparisons against the global's
address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
its own copy of the global if it requires one, and the copy in each linkage unit
must be the same)
- It is a constant or a function (which means that the program cannot observe that
the unique-address rule has been broken by writing to the global)
Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.
See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.
Part of the fix for PR27553.
Differential Revision: http://reviews.llvm.org/D20348
llvm-svn: 272709
This change teaches llvm::isGuaranteedToTransferExecutionToSuccessor
that calls to @llvm.assume always terminate. Most other relevant
intrinsics should be covered by the "CS.onlyReadsMemory() ||
CS.onlyAccessesArgMemory()" bit but we were missing @llvm.assumes
because we state that it clobbers memory.
Added an LICM test case, but this change is not specific to LICM.
llvm-svn: 272703
For <N x i32> type mul, pmuludq will be used for targets without SSE41, which
often introduces many extra pack and unpack instructions in vectorized loop
body because pmuludq generates <N/2 x i64> type value. However when the operands
of <N x i32> mul are extended from smaller size values like i8 and i16, the type
of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. For targets with SSE41, pmulld is supported so no
shrinking is needed.
Differential Revision: http://reviews.llvm.org/D20931
llvm-svn: 272694
This reverts commit 879139e1c6577b09df52de56a6bab856a19ed185.
This was committed accidentally when I blindly typed git svn
dcommit instead of the command to generate a patch.
llvm-svn: 272693
This patch also includes some refactoring.
Prior to this patch, we tagged all CFLAA attributes as unknown. This is
suboptimal, since it meant that any Value used as an argument would be
considered to alias any other Value that existed.
Now that we have the machinery to tag sets below the set for an
arbitrary value with attributes, it's okay to be less conservative with
arguments. (Specifically, we still tag the set under an argument with
unknown).
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21262
llvm-svn: 272690
Change EmitGlobalVariable to check final assembler section is in BSS
before using .lcomm/.comm directive. This prevents globals from being
put into .bss erroneously when -data-sections is used.
This fixes PR26570.
Reviewers: echristo, rafael
Subscribers: llvm-commits, mehdi_amini
Differential Revision: http://reviews.llvm.org/D21146
llvm-svn: 272674
The feature allows for conditional assembly etc.
TODO: make those symbols read-only.
Test added.
Differential Revision: http://reviews.llvm.org/D21238
llvm-svn: 272673
Summary:
This new alias takes a comma separated list of prefixes which allows
'--check-prefix=A --check-prefix=B --check-prefix=C' to be written as
'--check-prefixes=A,B,C'.
Reviewers: probinson
Subscribers: probinson, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D21293
llvm-svn: 272670
Instead of always using addu to adjust the stack pointer when the
size out is of the range of an addiu instruction, use subu so that
a smaller constant can be generated.
This can give savings of ~3 instructions whenever a function has a
a stack frame whose size is out of range of an addiu instruction.
This change may break some naive stack unwinders.
Partially resolves PR/26291.
Thanks to David Chisnall for reporting the issue.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D21321
llvm-svn: 272666
We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.
Found by Oliver Stannard and csmith!
llvm-svn: 272665
PR27458 highlights that the MIPS backend does not have well formed
MIR for atomic operations (among other errors).
This patch adds expands and corrects the LL/SC descriptions and uses
for MIPS(64).
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D19719
llvm-svn: 272655
This patch implements the N32 case where -mno-shared is in effect. The case
where -mshared is in effect will be added later since doing that now requires
additional changes to how we handle %hi(%neg(%gp_rel(foo))) expressions to
emit the three relocations as three relocations (currently only one of the
three would be emitted) which then requires further changes to our MCFixup
handling.
While we could fix both cases together, fixing the -mno-shared case allows us
to fix the ELFCLASS bug (where N32 incorrectly uses ELFCLASS64 instead of
ELFCLASS32) in a way that allows cpsetup.s to check for a correct output instead
of another incorrect output.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: http://reviews.llvm.org/D21131
llvm-svn: 272652
We only used to add the edge from the cloned loop to PHIs that
corresponded to values defined by the loop. We need to do this for all
PHIs obviously since we need a PHI operand for each incoming edge.
This includes things like PHIs with a constant value or with values
defined before the original loop (see the testcases).
After the patch the PHIs are added to the exit block in two passes.
In the first pass we ensure there is a single-operand (LCSSA) PHI for
each value defined by the loop.
In the second pass we loop through each (single-operand) PHI and add the
value for the edge from the cloned loop. If the value is defined in the
loop we'll use the cloned instruction from the cloned loop.
Fixes PR28037
llvm-svn: 272649
Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded
instructions are marked as having no scheduling info.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D20418
llvm-svn: 272648
Summary:
The machine verifier reports 'Explicit operand marked as def' when it is
manually specified even though it agrees with the operand info.
Reviewers: sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D21065
llvm-svn: 272646
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer, when trying to expand a physical register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.
We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.
Fixes PR27765.
Differential Revision: http://reviews.llvm.org/D21061
llvm-svn: 272644
The need for all these Lookup* functions is just because of calls to
getAnalysis inside methods (i.e. not at the top level) of the
runOnFunction method. They should be straightforward to clean up when
the old PM is gone.
llvm-svn: 272615
This reverts commit r272603 and adds a fix.
Big thanks to Davide for pointing me at r216244 which gives some insight
into how to fix this VS2013 issue. VS2013 can't synthesize a move
constructor. So the fix here is to add one explicitly to the
JumpThreadingPass class.
llvm-svn: 272607
This follows the approach in r263208 (for GVN) pretty closely:
- move the bulk of the body of the function to the new PM class.
- expose a runImpl method on the new-PM class that takes the IRUnitT and
pointers/references to any analyses and use that to implement the
old-PM class.
- use a private namespace in the header for stuff that used to be file
scope
llvm-svn: 272597
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.
Patch by Oscar Blumberg!
Reviewers: sanjoy
Differential Revision: http://reviews.llvm.org/D21286
llvm-svn: 272580
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1
This makes one particular compute shader 8x faster.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D21136
llvm-svn: 272556
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.
As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).
Author: colpell
Differential Revision: http://reviews.llvm.org/D21239
llvm-svn: 272547
MRRC/MRRC2 instruction writes to two registers. The
intrinsic definition returns a single uint64_t to
represent the write, this is a compact way of
representing a write to two 32 bit registers,
the alternative might have been two return a
struct of 2 uint32_t's but this isn't as nice.
Differential Revision:
llvm-svn: 272544
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044
By changing the definition of X86ISD::CMPP to use float types, we allow it to be created
and pass legalization for an SSE1-only target where v4i32 is not legal.
The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001
and eventually makes this trigger:
http://reviews.llvm.org/D21190
Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.
Differential Revision: http://reviews.llvm.org/D21235
llvm-svn: 272511
Below are my super rough notes when porting. They can probably serve as
a basic guide for porting other passes to the new PM. As I port more
passes I'll expand and generalize this and make a proper
docs/HowToPortToNewPassManager.rst document. There is also missing
documentation for general concepts and API's in the new PM which will
require some documentation.
Once there is proper documentation in place we can put up a list of
passes that have to be ported and game-ify/crowdsource the rest of the
porting (at least of the middle end; the backend is still unclear).
I will however be taking personal responsibility for ensuring that the
LLD/ELF LTO pipeline is ported in a timely fashion. The remaining passes
to be ported are (do something like
`git grep "<the string in the bullet point below>"` to find the pass):
General Scalar:
[ ] Simplify the CFG
[ ] Jump Threading
[ ] MemCpy Optimization
[ ] Promote Memory to Register
[ ] MergedLoadStoreMotion
[ ] Lazy Value Information Analysis
General IPO:
[ ] Dead Argument Elimination
[ ] Deduce function attributes in RPO
Loop stuff / vectorization stuff:
[ ] Alignment from assumptions
[ ] Canonicalize natural loops
[ ] Delete dead loops
[ ] Loop Access Analysis
[ ] Loop Invariant Code Motion
[ ] Loop Vectorization
[ ] SLP Vectorizer
[ ] Unroll loops
Devirtualization / CFI:
[ ] Cross-DSO CFI
[ ] Whole program devirtualization
[ ] Lower bitset metadata
CGSCC passes:
[ ] Function Integration/Inlining
[ ] Remove unused exception handling info
[ ] Promote 'by reference' arguments to scalars
Please let me know if you are interested in working on any of the passes
in the above list (e.g. reply to the post-commit thread for this patch).
I'll probably be tackling "General Scalar" and "General IPO" first FWIW.
Steps as I port "Deduce function attributes in RPO"
---------------------------------------------------
(note: if you are doing any work based on these notes, please leave a
note in the post-commit review thread for this commit with any
improvements / suggestions / incompleteness you ran into!)
Note: "Deduce function attributes in RPO" is a module pass.
1. Do preparatory refactoring.
Do preparatory factoring. In this case all I had to do was to pull out a static helper (r272503).
(TODO: give more advice here e.g. if pass holds state or something)
2. Rename the old pass class.
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename class ReversePostOrderFunctionAttrs -> ReversePostOrderFunctionAttrsLegacyPass
in preparation for adding a class ReversePostOrderFunctionAttrs as the pass in the new PM.
(edit: actually wait what? The new class name will be
ReversePostOrderFunctionAttrsPass, so it doesn't conflict. So this step is
sort of useless churn).
llvm/include/llvm/InitializePasses.h
llvm/lib/LTO/LTOCodeGenerator.cpp
llvm/lib/Transforms/IPO/IPO.cpp
llvm/lib/Transforms/IPO/FunctionAttrs.cpp
Rename initializeReversePostOrderFunctionAttrsPass -> initializeReversePostOrderFunctionAttrsLegacyPassPass
(note that the "PassPass" thing falls out of `s/ReversePostOrderFunctionAttrs/ReversePostOrderFunctionAttrsLegacyPass/`)
Note that the INITIALIZE_PASS macro is what creates this identifier name, so renaming the class requires this renaming too.
Note that createReversePostOrderFunctionAttrsPass does not need to be
renamed since its name is not generated from the class name.
3. Add the new PM pass class.
In the new PM all passes need to have their
declaration in a header somewhere, so you will often need to add a header.
In this case
llvm/include/llvm/Transforms/IPO/FunctionAttrs.h is already there because
PostOrderFunctionAttrsPass was already ported.
The file-level comment from the .cpp file can be used as the file-level
comment for the new header. You may want to tweak the wording slightly
from "this file implements" to "this file provides" or similar.
Add declaration for the new PM pass in this header:
class ReversePostOrderFunctionAttrsPass
: public PassInfoMixin<ReversePostOrderFunctionAttrsPass> {
public:
PreservedAnalyses run(Module &M, AnalysisManager<Module> &AM);
};
Its name should end with `Pass` for consistency (note that this doesn't
collide with the names of most old PM passes). E.g. call it
`<name of the old PM pass>Pass`.
Also, move the doxygen comment from the old PM pass to the declaration of
this class in the header.
Also, include the declaration for the new PM class
`llvm/Transforms/IPO/FunctionAttrs.h` at the top of the file (in this case,
it was already done when the other pass in this file was ported).
Now define the `run` method for the new class.
The main things here are:
a) Use AM.getResult<...>(M) to get results instead of `getAnalysis<...>()`
b) If the old PM pass would have returned "false" (i.e. `Changed ==
false`), then you should return PreservedAnalyses::all();
c) In the old PM getAnalysisUsage method, observe the calls
`AU.addPreserved<...>();`.
In the case `Changed == true`, for each preserved analysis you should do
call `PA.preserve<...>()` on a PreservedAnalyses object and return it.
E.g.:
PreservedAnalyses PA;
PA.preserve<CallGraphAnalysis>();
return PA;
Note that calls to skipModule/skipFunction are not supported in the new PM
currently, so optnone and optimization bisect support do not work. You can
just drop those calls for now.
4. Add the pass to the new PM pass registry to make it available in opt.
In llvm/lib/Passes/PassBuilder.cpp add a #include for your header.
`#include "llvm/Transforms/IPO/FunctionAttrs.h"`
In this case there is already an include (from when
PostOrderFunctionAttrsPass was ported).
Add your pass to llvm/lib/Passes/PassRegistry.def
In this case, I added
`MODULE_PASS("rpo-functionattrs", ReversePostOrderFunctionAttrsPass())`
The string is from the `INITIALIZE_PASS*` macros used in the old pass
manager.
Then choose a test that uses the pass and use the new PM `-passes=...` to
run it.
E.g. in this case there is a test that does:
; RUN: opt < %s -basicaa -functionattrs -rpo-functionattrs -S | FileCheck %s
I have added the line:
; RUN: opt < %s -aa-pipeline=basic-aa -passes='require<targetlibinfo>,cgscc(function-attrs),rpo-functionattrs' -S | FileCheck %s
The `-aa-pipeline=basic-aa` and
`require<targetlibinfo>,cgscc(function-attrs)` are what is needed to run
functionattrs in the new PM (note that in the new PM "functionattrs"
becomes "function-attrs" for some reason). This is just pulled from
`readattrs.ll` which contains the change from when functionattrs was ported
to the new PM.
Adding rpo-functionattrs causes the pass that was just ported to run.
llvm-svn: 272505
Summary: This also deprecated the get attribute function familly.
Reviewers: Wallbraker, whitequark, joker.eph, echristo, rafael, jyknight
Subscribers: axw, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19181
llvm-svn: 272504
It isn't legal to hoist a load past a call which might not return;
even if it doesn't throw, it could, for example, call exit().
Fixes http://llvm.org/PR27953.
llvm-svn: 272495
Summary:
Make isGuaranteedToExecute use the
isGuaranteedToTransferExecutionToSuccessor helper, and make that helper
a bit more accurate.
There's a potential performance impact here from assuming that arbitrary
calls might not return. This probably has little impact on loads and
stores to a pointer because most things alias analysis can reason about
are dereferenceable anyway. The other impacts, like less aggressive
hoisting of sdiv by a variable and less aggressive hoisting around
volatile memory operations, are unlikely to matter for real code.
This also impacts SCEV, which uses the same helper. It's a minor
improvement there because we can tell that, for example, memcpy always
returns normally. Strictly speaking, it's also introducing
a bug, but it's not any worse than everywhere else we assume readonly
functions terminate.
Fixes http://llvm.org/PR27857.
Reviewers: hfinkel, reames, chandlerc, sanjoy
Subscribers: broune, llvm-commits
Differential Revision: http://reviews.llvm.org/D21167
llvm-svn: 272489
The script now replace '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file
Refreshed some tests to demonstrate the new check
llvm-svn: 272488
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.
llvm-svn: 272477
Summary:
Iterates all (except the first and the last) operands within each GEP
instruction for instrumentation.
Adds test struct_field_gep.ll.
Reviewers: aizatsky
Subscribers: vitalybuka, zhaoqin, kcc, eugenis, bruening, llvm-commits
Differential Revision: http://reviews.llvm.org/D21242
llvm-svn: 272442
Support and generate Compare and Traps like CRT, CIT, etc.
Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.
Differential Revision: http://reviews.llvm.org/D21155
llvm-svn: 272419
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.
Reviewers: arsenm, kzhuravl
Subscribers: arsenm, kzhuravl, llvm-commits
Differential Revision: http://reviews.llvm.org/D21153
llvm-svn: 272417
Adds a MachineFunctionPass that scans the body to find calls, and
update the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D21180
llvm-svn: 272414
The costs are somewhat hand-wavy, but should be much closer to the truth
than what we get from BasicTTI.
Differential Revision: http://reviews.llvm.org/D21156
llvm-svn: 272406
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.
When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.
The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.
An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.
Patch by Vivek Pandya <vivekvpandya@gmail.com>
Differential Revision: http://reviews.llvm.org/D20769
llvm-svn: 272403
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).
There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.
llvm-svn: 272397
This reapplies commit r272385 with a fix. The build was failing when compiled
with gcc, but not with clang. With the fix, we now get the data layout from the
current TTI implementation, which will hopefully solve the issue.
llvm-svn: 272395
This patch refines the default cost for interleaved load groups having gaps. If
a load group has gaps, the legalized instructions corresponding to the unused
elements will be dead. Thus, we don't need to account for them in the cost
model. Instead, we only need to account for the fraction of legalized loads
that will actually be used.
Differential Revision: http://reviews.llvm.org/D20873
llvm-svn: 272385
Summary:
sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported.
Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier.
Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers.
Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...).
Reviewers: vpykhtin, artem.tamazov, tstellarAMD
Subscribers: arsenm, kzhuravl
Differential Revision: http://reviews.llvm.org/D20968
llvm-svn: 272384
Memory operand is new for AVX512 (SSE/AVX2 didn't support it).
Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support masked operations).
Regenerated VPALIGNR test now that the shuffle comments work
llvm-svn: 272383
The test case is not great espicially because it is still cumbersome to
run the regalloc pass with run-pass. (We miss a bunch of initiliazier to
be properly implemented.)
Related to llvm.org/PR27983
llvm-svn: 272360
Previously we could run only one machine pass with the run-pass option.
With that patch, we can now specify several passes with several run-pass
options (or just one option with a list of comma separated passes) and
llc will build the related pipeline.
This is great to test the interaction of two passes that are not
necessarily next to each other in the pipeline, or play with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.
Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!
llvm-svn: 272356
Summary:
Adds ClInstrumentFastpath option to control fastpath instrumentation.
Avoids the load/store instrumentation for the cache fragmentation tool.
Renames cache_frag_basic.ll to working_set_slow.ll for slowpath
instrumentation test.
Adds the __esan_init check in struct_field_count_basic.ll.
Reviewers: aizatsky
Subscribers: llvm-commits, bruening, eugenis, kcc, zhaoqin, vitalybuka
Differential Revision: http://reviews.llvm.org/D21079
llvm-svn: 272355
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.
Reviewers: scchan, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19994
llvm-svn: 272349
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.
There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.
llvm-svn: 272344
Summary:
We failed to unpoison uninteresting allocas on return as unpoisoning is part of
main instrumentation which skips such allocas.
Added check -asan-instrument-allocas for dynamic allocas. If instrumentation of
dynamic allocas is disabled it will not will not be unpoisoned.
PR27453
Reviewers: kcc, eugenis
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21207
llvm-svn: 272341
Prior to this patch, we used argument/global stratified attributes in
order to note that a value could have come from either dereferencing a
global/arg, or from the assignment from a global/arg.
Now, AttrUnknown is placed on sets when we see a dereference, instead of
the global/arg attributes. This allows us to be more aggressive in the
future when we see global/arg attributes without AttrUnknown.
Patch by Jia Chen.
Differential Revision: http://reviews.llvm.org/D21110
llvm-svn: 272335
Summary:
We still want to unpoison full stack even in use-after-return as it can be disabled at runtime.
PR27453
Reviewers: eugenis, kcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21202
llvm-svn: 272334
Instead of directly using MaxFunctionCount and function entry count to determine callee hotness, use the isHotFunction/isColdFunction methods provided by ProfileSummaryInfo.
Differential revision: http://reviews.llvm.org/D21045
llvm-svn: 272321
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget
llvm-svn: 272300
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers. Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.
Reviewers: tra
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D21160
llvm-svn: 272298
Previously, we materialized secondary vector IVs from the primary scalar IV,
by offseting the primary to match the correct start value, and then broadcasting
it - inside the loop body. Instead, we can use a real vector IV, like we do for
the primary.
This enables using vector IVs for secondary integer IVs whose type matches the
type of the primary.
Differential Revision: http://reviews.llvm.org/D20932
llvm-svn: 272283
This reapplies commit r271930, r271915, r271923. They hit a bug in
Thumb which is fixed in r272258 now.
The original message:
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
llvm-svn: 272267
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.
I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.
Author: colpell
Differential Revision: http://reviews.llvm.org/D21097
llvm-svn: 272266
This is made possible by removing an assert in llc that assumed
MIRParser::parseLLVMModule would exit on error. MIRParser's documentation states
that it returns null if a parsing error occurs, so there's no reason to assert.
We can instead just fall through to where the check for a module is performed
and exit if it is null.
This commit is part of the clean-up after r269655.
Fixes PR27770
Differential Revision: http://reviews.llvm.org/D20371
llvm-svn: 272254
If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead;
int i(int a) {
return a & 0xfffffeec;
}
Used to produce:
ldr r1, [CONSTPOOL]
ands r0, r1
CONSTPOOL: 0xfffffeec
And now produces:
movs r1, #255
adds r1, #20 ; Less costly immediate generation
bics r0, r1
llvm-svn: 272251
Add support to the AArch64 IAS for the `.arch` directive. This allows the
assembly input to use architectural functionality in part of a file. This is
used in existing code like BoringSSL.
Resolves PR26016!
llvm-svn: 272241
We can safely rely on a NoWrap add recurrence causing UB down the road
only if we know the loop does not have a exit expressed in a way that is
opaque to ScalarEvolution (e.g. by a function call that conditionally
calls exit(0)).
I believe with this change PR28012 is fixed.
Note: I had to change some llvm-lit tests in LoopReroll, since it looks
like they were depending on this incorrect behavior.
llvm-svn: 272237
Without that check it was possible to write test cases where the size
was not specified and we ended up with weird asserts down the road,
because the default value (1) would not make sense.
llvm-svn: 272226
Summary:
This fixes PR26682. Also add LCSSA as a preserved pass to LoopSimplify,
that looks correct to me and allows to write a test for the issue.
Reviewers: chandlerc, bogner, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D21112
llvm-svn: 272224
Summary:
This fixes PR27617.
Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.
The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.
Reviewers: nadav, mzolotukhin
Subscribers: JesperAntonsson, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D20685
Patch by Jesper Antonsson!
llvm-svn: 272206
Summary:
Consider the following diamond CFG:
A
/ \
B C
\/
D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
llvm-svn: 272203
Summary:
Now DISubroutineType has a 'cc' field which should be a DW_CC_ enum. If
it is present and non-zero, the backend will emit it as a
DW_AT_calling_convention attribute. On the CodeView side, we translate
it to the appropriate enum for the LF_PROCEDURE record.
I added a new LLVM vendor specific enum to the list of DWARF calling
conventions. DWARF does not appear to attempt to standardize these, so I
assume it's OK to do this until we coordinate with GCC on how to emit
vectorcall convention functions.
Reviewers: dexonsmith, majnemer, aaboud, amccarth
Subscribers: mehdi_amini, llvm-commits
Differential Revision: http://reviews.llvm.org/D21114
llvm-svn: 272197
with user specified count has been applied.
Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it set the pragma in all cases. This helps to prevent multiple
unroll when -unroll-count=N is given.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D20765
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
Again, the Microsoft linker does not like empty substreams.
We still emit an empty string table if CodeView is enabled, but that
doesn't cause problems because it always contains at least one null
byte.
llvm-svn: 272183
Absence of may-unwind calls is not enough to guarantee that a
UB-generating use of an add-rec poison in the loop latch will actually
cause UB. We also need to guard against calls that terminate the thread
or infinite loop themselves.
This partially addresses PR28012.
llvm-svn: 272181
The worklist algorithm introduced in rL271151 didn't check to see if the
direct users of the post-inc add recurrence propagates poison. This
change fixes the problem and makes the code structure more obvious.
Note for release managers: correctness wise, this bug wasn't a
regression introduced by rL271151 -- the behavior of SCEV around
post-inc add recurrences was strictly improved (in terms of correctness)
in rL271151.
llvm-svn: 272179