- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked in
the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
MachinePostDominatorTree and MachineDominanceFrontier instead of their
respective IR versions.
- Since there were no tests for this, add a X86 MIR test.
Patch by Francis Visoiu Mistrih<fvisoiumistrih@apple.com>
llvm-svn: 295518
A line number doesn't make much sense if you don't say where it's
from. Add a verifier check for this and update some tests that had
bogus debug info.
llvm-svn: 295516
When promoting the Load of a Store-Load pair to a COPY all kill flags
between the store and the load need to be cleared.
rdar://30402435
Differential Revision: https://reviews.llvm.org/D30110
llvm-svn: 295512
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes pr31492.
Differential Revision: https://reviews.llvm.org/D28630
This is resubmit of r292680, which was reverted by r293092. The internal application failures were actually caused by a source code bug.
llvm-svn: 295506
A future change will cause this byte offset to be inttoptr'd and then exported
via an absolute symbol. On the importing end we will expect the symbol to be
in range [0,2^32) so that it will fit into a 32-bit relocation. The problem
is that on 64-bit architectures if the offset is negative it will not be in
the correct range once we inttoptr it.
This change causes us to use a 32-bit integer so that it can be inttoptr'd
(which zero extends) into the correct range.
Differential Revision: https://reviews.llvm.org/D30016
llvm-svn: 295487
This fixes PR31381, which caused an assertion and/or invalid debug info.
This affects debug variables that have multiple fragments in the MMI
side (i.e.: in the stack frame) table.
rdar://problem/30571676
llvm-svn: 295486
This set of patches adds support for Cavium ThunderX ARM64 processors:
* ThunderX
* ThunderX T81
* ThunderX T83
* ThunderX T88
Patch by Stefan Teleman
Differential Revision: https://reviews.llvm.org/D28891
llvm-svn: 295475
MSVC link.exe cannot handle associative sections that refer later
sections in the section header. Technically, such COFF object doesn't
violate the Microsoft COFF spec, as the spec doesn't say anything
about that, but still we should avoid doing that to make it compatible
with MS tools.
This patch assigns smaller section numbers to non-associative sections
and larger numbers to associative sections. This should resolve the
compatibility issue.
Differential Revision: https://reviews.llvm.org/D30080
llvm-svn: 295464
We previously only created a vector phi node for an induction variable if its
step had a constant integer type. However, the step actually only needs to be
loop-invariant. We only handle inductions having loop-invariant steps, so this
patch should enable vector phi node creation for all integer induction
variables that will be vectorized.
Differential Revision: https://reviews.llvm.org/D29956
llvm-svn: 295456
Removed the HasT2ExtractPack feature and replaced its references
with HasDSP. This then allows the Thumb2 extend instructions to be
selected for ARMv8M +dsp. These instruction descriptions have also
been refactored and more target tests have been added for their isel.
Differential Revision: https://reviews.llvm.org/D29623
llvm-svn: 295452
During legalization we are often creating shuffles (via a build_vector scalarization stage) that are "any_extend_vector_inreg" style masks, and also other masks that are the equivalent of "truncate_vector_inreg" (if we had such a thing).
This patch is an attempt to match these cases to help undo the effects of just leaving shuffle lowering to handle it - which typically means we lose track of the undefined elements of the shuffles resulting in an unnecessary extension+truncation stage for widened illegal types.
The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).
Differential Revision: https://reviews.llvm.org/D29454
llvm-svn: 295451
Add some asserts to make sure we're using the mappings that we think we're
using. This is to keep us from accidentally breaking functionality while moving
to TableGen'erated mappings.
llvm-svn: 295441
Start using the Subtarget to make decisions about what's legal. In particular,
we only mark floating point operations as legal if we have VFP2, which is
something we should've done from the very start.
llvm-svn: 295439
This enables some early outs to avoid repeatedly using IsX86 check to qualify. I hope to continue to improve this to shorten the lengths of some of the string comparisons.
llvm-svn: 295424
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:
1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.
The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.
This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).
Patch by Max Kazantsev!
Reviewers: anna, sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30036
llvm-svn: 295410
Summary:
The file type packs function trace data onto disk from potentially multiple
threads that are aggregated and flushed during the course of an instrumented
program's runtime.
It is named FDR mode or Flight Data recorder as an analogy to plane
blackboxes, which instrument a running system without access to IO.
The writer code is defined in compiler-rt in xray_fdr_logging.h/cc
Reviewers: rSerge, kcc, dberris
Reviewed By: dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29697
llvm-svn: 295397
This is useful for some edge cases where detecting things gets tricky. Specifically LLDB needs this to support iOS because CMake doesn't support running tests using obj-c code.
llvm-svn: 295392
Summary:
This is an issue both with regular and Thin LTO. When we link together
a DICompileUnit that is marked NoDebug (e.g when compiling with -g0
but applying an AutoFDO profile, which requires location tracking
in the compiler) and a DICompileUnit with debug emission enabled,
we can have failures during dwarf debug generation. Specifically,
when we have inlined from the NoDebug compile unit into the debug
compile unit, we can fail during construction of the abstract and
inlined scope DIEs. This is because the SPMap does not include NoDebug
CUs (they are skipped in the debug_compile_units_iterator).
This patch fixes the failures by skipping locations from NoDebug CUs
when extracting lexical scopes.
Reviewers: dblaikie, aprantl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D29765
llvm-svn: 295384
Some PDBs or object files can contain references to other PDBs
where the real type information lives. When this happens,
all type indices in the original PDB are meaningless because
their records are not there.
With this patch we add the ability to pull type info from those
secondary PDBs.
Differential Revision: https://reviews.llvm.org/D29973
llvm-svn: 295382
In rL294814, we allow formula with SCEVAddRecExpr type of Reg from loops
other than current loop. This is good for the case when induction variable
of outerloop being used in expr in innerloop. But it is very bad to allow
such Reg from sibling loop because we may need to add lsr.iv in other sibling
loops when scev expanding those SCEVAddRecExpr type exprs. For the testcase
below, one loop can be inserted with a bunch of lsr.iv because of LSR for
other loops.
// The induction variable j from a loop in the middle will have initial
// value generated from previous sibling loop and exit value used by its
// next sibling loop.
void goo(long i, long j);
long cond;
void foo(long N) {
long i = 0;
long j = 0;
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
i = 0; do { goo(i, j); i++; j++; } while (cond);
}
The fix is to only allow formula with SCEVAddRecExpr type of Reg from current
loop or its parents.
Differential Revision: https://reviews.llvm.org/D30021
llvm-svn: 295378
TimerGroup was showing up on a leak in valigrind, and
used some pretty complex code to implement a singleton.
This patch replaces the implementation with a vastly simpler
one.
Differential Revision: https://reviews.llvm.org/D28367
llvm-svn: 295370
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.
llvm-svn: 295357
Defining nodes should not alias with one another, while clobbering
nodes can. When pushing defs on stacks, push clobbers first, link
non-clobbering defs, then push the defs.
The data flow in a statement is now: uses -> clobbers -> defs.
llvm-svn: 295356
The existing code always saves the xmm registers for 64-bit targets even if the
target doesn't support SSE (which is common for kernels). Thus, the compiler
inserts movaps instructions which lead to CPU exceptions when an interrupt
handler is invoked.
This commit fixes this bug by returning a register set without xmm registers
from getCalleeSavedRegs and getCallPreservedMask for such targets.
Patch by Philipp Oppermann.
Differential Revision: https://reviews.llvm.org/D29959
llvm-svn: 295347
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
Regression test neon-diagnostics.s needed changing because it now
produces a more specific diagnostic about the immediate ranges. One
change in the expected error message is not obvious, but there multiple
candidate and it happens to pick the immediate diagnostic.
Differential Revision: https://reviews.llvm.org/D29939
llvm-svn: 295331
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patters.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.
llvm-svn: 295310
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.
llvm-svn: 295300
For the hard float calling convention, we just use the D registers.
For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.
For pure soft float targets, we still bail out because we don't support the
libcalls yet.
llvm-svn: 295295
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.
llvm-svn: 295290
This reverts r294348, which removed support for conditional tail calls
due to the PR above. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.
Differential Revision: https://reviews.llvm.org/D29856
llvm-svn: 295262
Uses a Custom implementation because the slot sizes being a multiple of the
pointer size isn't really universal, even for the architectures that do have a
simple "void *" va_list.
llvm-svn: 295255
Fixes PR 31921
Summary:
Predicateinfo requires an ugly workaround to try to avoid literal
struct types due to the intrinsic mangling not being implemented.
This workaround actually does not work in all cases (you can hit the
assert by bootstrapping with -print-predicateinfo), and can't be made
to work without DFS'ing the type (IE copying getMangledStr and using a
version that detects if it would crash).
Rather than do that, i just implemented the mangling. It seems
simple, since they are unified structurally.
Looking at the overloaded-mangling testcase we have, it actually turns
out the gc intrinsics will *also* crash if you try to use a literal
struct. Thus, the testcase added fails before this patch, and works
after, without needing to resort to predicateinfo.
Reviewers: chandlerc, davide
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D29925
llvm-svn: 295253
Summary:
In rL291613, the section name was interned in LLVMContext. However,
this broke the ability to remove the section from a GlobalObject,
because it tried to intern empty strings, which is not allowed.
Fix that and add an appropriate regression test.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D29795
llvm-svn: 295238
This is a short term solution to the problem that many passes currently fail
to update the assumption cache. In the long term the verifier should not
be controllable with a flag. We should either fix all passes to correctly
update the assumption cache and enable the verifier unconditionally or
somehow arrange for the assumption list to be updated automatically by passes.
Differential Revision: https://reviews.llvm.org/D30003
llvm-svn: 295236
Minor performance speedup - if any call to getShuffleScalarElt fails to get a result, don't both calling for the remaining elements as EltsFromConsecutiveLoads will fail anyhow.
llvm-svn: 295235
They are register promoted by ISel and so it makes no sense to treat them as
memory.
Inserting calls to the thread sanitizer would also generate invalid IR.
You would hit:
"swifterror value can only be loaded and stored from, or as a swifterror
argument!"
llvm-svn: 295230
am_ldrlit diverged from am_brcond in r207105, but kept the OtherVT
operand type. It made sense for branch targets, as those are
represented as MVT::Other in SDAG. But loads operate on pointers.
This shouldn't have an observable effect on any in-tree code, but helps
make the patterns consistent for external users.
llvm-svn: 295229
Summary:
Add a field to LTO::Config, CGFileType, to select the file type to emit (object
or assembly). This is useful for testing and to implement -save-temps.
Reviewers: tejohnson, mehdi_amini, pcc
Reviewed By: mehdi_amini
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D29475
llvm-svn: 295226
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:
A B
|\ /|
| \ / |
| X |
| / \ |
|/ \|
C D
would be laid out A; B->C ; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a a group of 2 or more
predecessor blocks that all have the same successors.
because of this we can tail duplicate to extend existing trellises.
As an example consider the following CFG:
B D F H
/ \ / \ / \ / \
A---C---E---G---Ret
Where A,C,E,G are all small (Currently 2 instructions).
The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret.
The current code will copy C into B, E into D and G into F and yield the layout
A,C,B(C),E,D(E),F(G),G,H,ret
define void @straight_test(i32 %tag) {
entry:
br label %test1
test1: ; A
%tagbit1 = and i32 %tag, 1
%tagbit1eq0 = icmp eq i32 %tagbit1, 0
br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
call void @a()
br label %test2
test2: ; C
%tagbit2 = and i32 %tag, 2
%tagbit2eq0 = icmp eq i32 %tagbit2, 0
br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
call void @b()
br label %test3
test3: ; E
%tagbit3 = and i32 %tag, 4
%tagbit3eq0 = icmp eq i32 %tagbit3, 0
br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
call void @c()
br label %test4
test4: ; G
%tagbit4 = and i32 %tag, 8
%tagbit4eq0 = icmp eq i32 %tagbit4, 0
br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
call void @d()
br label %exit
exit:
ret void
}
here is the layout after D27742:
straight_test: # @straight_test
; ... Prologue elided
; BB#0: # %entry ; A (merged with test1)
; ... More prologue elided
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_2
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_3
b .LBB0_4
.LBB0_2: # %optional1 ; B (copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_4
.LBB0_3: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_5
b .LBB0_6
.LBB0_4: # %optional2 ; D (copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_5: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
b .LBB0_7
.LBB0_6: # %optional3 ; F (copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit ; Ret
ld 30, 96(1) # 8-byte Folded Reload
addi 1, 1, 112
ld 0, 16(1)
mtlr 0
blr
The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layouts
of such trellises, and decrease the cost calculation for tail-duplication
accordingly.
This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating
positive, which is that it handles the case where there are long strings
of skipped blocks much better than the original layout. Both layouts
handle runs of executed blocks equally well. Branch prediction also
improves if there is any correlation between subsequent optional blocks.
Here is the resulting concrete layout:
straight_test: # @straight_test
; BB#0: # %entry ; A (merged with test1)
mr 30, 3
andi. 3, 30, 1
bc 12, 1, .LBB0_4
; BB#1: # %test2 ; C
rlwinm. 3, 30, 0, 30, 30
bne 0, .LBB0_5
.LBB0_2: # %test3 ; E
rlwinm. 3, 30, 0, 29, 29
bne 0, .LBB0_6
.LBB0_3: # %test4 ; G
rlwinm. 3, 30, 0, 28, 28
bne 0, .LBB0_7
b .LBB0_8
.LBB0_4: # %optional1 ; B (Copy of C)
bl a
nop
rlwinm. 3, 30, 0, 30, 30
beq 0, .LBB0_2
.LBB0_5: # %optional2 ; D (Copy of E)
bl b
nop
rlwinm. 3, 30, 0, 29, 29
beq 0, .LBB0_3
.LBB0_6: # %optional3 ; F (Copy of G)
bl c
nop
rlwinm. 3, 30, 0, 28, 28
beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
bl d
nop
.LBB0_8: # %exit
Differential Revision: https://reviews.llvm.org/D28522
llvm-svn: 295223
They are register promoted by ISel and so it makes no sense to treat them as
memory.
Inserting calls to the thread sanitizer would also generate invalid IR.
You would hit:
"swifterror value can only be loaded and stored from, or as a swifterror
argument!"
llvm-svn: 295215
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replace the INSERT with a BUILD.
This fixes PR311956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213
Summary: Discriminators are now encoded with rich information. This patch exposes the encoding API to downstream tools.
Reviewers: davidxl, hfinkel
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29852
llvm-svn: 295210
This patch reverts region's scheduling to the original untouched state
in case if we have have decreased occupancy.
In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.
Differential Revision: https://reviews.llvm.org/D29971
llvm-svn: 295206
Changed format specifiers to use format macro constant for pointer type.
Moved width part of format specifier in the correct place for formatting members a and b.
Added a unit test to confirm the output.
Differential Revision: https://reviews.llvm.org/D28957
llvm-svn: 295173
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.
As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.
I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512-bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.
This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect but was enough to fix a few test failures that were being caused.
Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases were we don't remove a useless insert into element 1 before broadcasting element 0.
Reviewers: delena, RKSimon, zvi
Reviewed By: zvi
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D28747
llvm-svn: 295155
Summary:
The current code loops over all elements to calculate a used range. Then a second short loop looks at the ranges and determines if they can be used in a extract and creates a properly aligned start index for the extract.
This range finding is unnecessary, we can just calculate a properly aligned start index for an extract for each input during the first loop. If we don't find the same start index for each indice we can't use an extract.
Reviewers: zvi, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29926
llvm-svn: 295152
handler args.
The specialization just inherits from the std::decay'd response handler type.
This allows member functions (via MemberFunctionWrapper) to be used as async
handlers.
llvm-svn: 295151
The comment was somewhat misleading in that it implied that passes were not
responsible for adding new assumptions to the assumption cache. This new
wording now explicitly mentions that they are required to do so.
Differential Revision: https://reviews.llvm.org/D29977
llvm-svn: 295148
The idea is that the apply* functions will also be called when importing
devirt optimizations.
Differential Revision: https://reviews.llvm.org/D29745
llvm-svn: 295144
This patch corrects the maximum workgroups per CU if we have big
workgroups (more than 128). This calculation contributes to the
occupancy calculation in respect to LDS size.
Differential Revision: https://reviews.llvm.org/D29974
llvm-svn: 295134
Summary:
The YAML output produced by llvm-xray is supposed to be wrapped at the
arbitrary default of 70 columns set by `yaml:Output`. Unfortunately,
the wrapping is rather unpredictable, and can easily go past the set
number of columns, depending on the execution environment.
To make the YAML output environment-independent, disable wrapping
instead.
Reviewers: dberris
Reviewed By: dberris
Subscribers: fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D29962
llvm-svn: 295116
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
Summary:
This helps to avoid signed integer overflow after running a fast fuzz target for several hours, e.g.:
<...>
Done -1097903291 runs in 54001 second(s)
Reviewers: kcc
Reviewed By: kcc
Differential Revision: https://reviews.llvm.org/D29941
llvm-svn: 295112
Group calls into constant and non-constant arguments up front, and use uint64_t
instead of ConstantInt to represent constant arguments. The goal is to allow
the information from the summary to fit naturally into this data structure in
a future change (specifically, it will be added to CallSiteInfo).
This has two side effects:
- We disallow VCP for constant integer arguments of width >64 bits.
- We remove the restriction that the bitwidth of a vcall's argument and return
types must match those of the vfunc definitions.
I don't expect either of these to matter in practice. The first case is
uncommon, and the second one will lead to UB (so we can do anything we like).
Differential Revision: https://reviews.llvm.org/D29744
llvm-svn: 295110
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basicblock corresponding to line 12 after Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
Reviewers: hans, iteratee, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29153
llvm-svn: 295105
This instruction clears the low bits of a pointer without requiring (possibly
dodgy if pointers aren't ints) conversions to and from an integer. Since (as
far as I'm aware) all masks are statically known, the instruction takes an
immediate operand rather than a register to specify the mask.
llvm-svn: 295103
This reverts 295092 (re-applies 295084), with a fix for dangling
references from the array of coverage names passed down from frontends.
I missed this in my initial testing because I only checked test/Profile,
and not test/CoverageMapping as well.
Original commit message:
The profile name variables passed to counter increment intrinsics are dead
after we emit the finalized name data in __llvm_prf_nm. However, we neglect to
erase these name variables. This causes huge size increases in the
__TEXT,__const section as well as slowdowns when linker dead stripping is
disabled. Some affected projects are so massive that they fail to link on
Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done with
them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295099
Store instructions can have more than one memory operand as a result
of optimizations that fold different stores into one.
When we identify spill instructions to generate DBG_VALUE instructions
to record the spilling of a variable, we disregard stores with
multiple memory operands for now. We may miss some relevant spills but
the handling is a bit more complex, so we'll do it in a different patch.
This fixes PR31935.
llvm-svn: 295093
In r288754, Mehdi added a cmake option to disable enforcement of the ABI
breaking checks in the "abi-breaking.h" header. We used that when building
Swift and it works, but I think it will be better to control this with a
preprocessor macro instead of a cmake option. That will let us opt out of
the enforcement more selectively.
This change allows skipping the cmake setting if the existing preprocessor
macro is already defined. My intention here is to make this change and get
Swift to use it, and then after a few weeks, we can remove the cmake option.
I want to stage it like that to be less disruptive. I'm not aware of anyone
else using that cmake option.
Mehdi had some initial concern about the impact of using a preprocessor
macro when building with modules enabled. I don't think that will be a
problem if we set the macro on the command line with a -D option in those
contexts where we need to disable the enforcement of the checks.
https://reviews.llvm.org/D29919
llvm-svn: 295090
The profile name variables passed to counter increment intrinsics are
dead after we emit the finalized name data in __llvm_prf_nm. However, we
neglect to erase these name variables. This causes huge size increases
in the __TEXT,__const section as well as slowdowns when linker dead
stripping is disabled. Some affected projects are so massive that they
fail to link on Darwin, because only the small code model is supported.
Fix the issue by throwing away the name constants as soon as we're done
with them.
Differential Revision: https://reviews.llvm.org/D29921
llvm-svn: 295084
To help assist in debugging ISEL or to prioritize GlobalISel backend
work, this patch adds two more tables to <Target>GenISelDAGISel.inc -
one which contains the patterns that are used during selection and the
other containing include source location of the patterns
Enabled through CMake varialbe LLVM_ENABLE_DAGISEL_COV
llvm-svn: 295081
This allows for nicer backtrace and debugging when -j1 is passed:
$ opt-viewer.py CMakeFiles/LLVMScalarOpts.dir/LoopVersioningLICM.cpp.opt.yaml html
Traceback (most recent call last):
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 405, in <module>
generate_report(pmap, all_remarks, file_remarks, args.source_dir, args.output_dir)
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 362, in generate_report
pmap(_render_file_bound, file_remarks.items())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
Exception: blah
$ opt-viewer.py -j 1 CMakeFiles/LLVMScalarOpts.dir/LoopVersioningLICM.cpp.opt.yaml html
Traceback (most recent call last):
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 405, in <module>
generate_report(pmap, all_remarks, file_remarks, args.source_dir, args.output_dir)
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 362, in generate_report
pmap(_render_file_bound, file_remarks.items())
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 317, in _render_file
SourceFileRenderer(source_dir, output_dir, filename).render(remarks)
File "/org/llvm/utils/opt-viewer/opt-viewer.py", line 168, in __init__
raise Exception("blah")
Exception: blah
llvm-svn: 295080
Summary:
As written in the comments above, LastCallToStaticBonus is already applied to
the cost if Caller has only one user, so it is redundant to reapply the bonus
here.
If the only user is not a caller, TotalSecondaryCost will not be adjusted
anyway because callerWillBeRemoved is false. If there's no caller at all, we
don't need to care about TotalSecondaryCost because
inliningPreventsSomeOuterInline is false.
Reviewers: chandlerc, eraman
Reviewed By: eraman
Subscribers: haicheng, davidxl, davide, llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D29169
llvm-svn: 295075
And use it in MachineOptimizationRemarkEmitter. A test will follow on top of
Justin's changes to enable MachineORE in AsmPrinter.
The approach is similar to the IR-level pass. It's a bit simpler because BPI
is immutable at the Machine level so we don't need to make that lazy.
Because of this, a new function mapping is introduced (BPIPassTrait::getBPI).
This function extracts BPI from the pass. In case of the lazy pass, this is
when the calculation of the BFI occurs. For Machine-level, this is the
identity function.
Differential Revision: https://reviews.llvm.org/D29836
llvm-svn: 295072
Summary:
This is achieved by generalizing the expression selecting the StringRef
format_provider. Now, anything that can be converted to a StringRef will
use it's formatter.
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29898
llvm-svn: 295064
This reapplies commit r294967 with a fix for the execution time regressions
caught by the clang-cmake-aarch64-quick bot. We now extend the truncate
optimization to non-primary induction variables only if the truncate isn't
already free.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 295063
back into a vector
Previously the cost of the existing ExtractElement/ExtractValue
instructions was considered as a dead cost only if it was detected that
they have only one use. But these instructions may be considered
dead also if users of the instructions are also going to be vectorized,
like:
```
%x0 = extractelement <2 x float> %x, i32 0
%x1 = extractelement <2 x float> %x, i32 1
%x0x0 = fmul float %x0, %x0
%x1x1 = fmul float %x1, %x1
%add = fadd float %x0x0, %x1x1
```
This can be transformed to
```
%1 = fmul <2 x float> %x, %x
%2 = extractelement <2 x float> %1, i32 0
%3 = extractelement <2 x float> %1, i32 1
%add = fadd float %2, %3
```
because though `%x0` and `%x1` have 2 users each other, these users are
part of the vectorized tree and we can consider these `extractelement`
instructions as dead.
Differential Revision: https://reviews.llvm.org/D29900
llvm-svn: 295056
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.
This is fixing pr31900.
Reviewers: HaoLiu, mssimpso, mkuper
Reviewed By: mssimpso, mkuper
Subscribers: llvm-commits, efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29717
llvm-svn: 295038
Summary:
Function isCompatibleIVType is already used as a guard before the call to
SE.getMinusSCEV(OperExpr, PrevExpr);
in LSRInstance::ChainInstruction. getMinusSCEV requires the expressions
to be of the same type, so we now consider two pointers with different
address spaces to be incompatible, since it is possible that the pointers
in fact have different sizes.
Reviewers: qcolombet, eli.friedman
Reviewed By: qcolombet
Subscribers: nhaehnle, Ka-Ka, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29885
llvm-svn: 295033
Launch policies provided a mechanism for running RPC handlers on a background
thread (unblocking the main RPC receiver thread). Async handlers generalize
this by passing the responder function (the function that sends the RPC return
value) as an argument to the handler. The handler can optionally do its work on
a background thread (the same way launch policies do), but can also (a) can
inspect the call arguments before deciding to run the work on a different
thread, or (b) can use the responder in a subsequent RPC call (e.g. in the
handler of a callAsync), allowing the handler to call back to the originator (or
to a 3rd party) without blocking the listener thread, and without launching a
new thread.
llvm-svn: 295030
Extend our store promotion code to deal with unordered atomic accesses. Ordered atomics continue to be unhandled.
Most of the change is straight-forward, the only complicated bit is in the reasoning around mixing of atomic and non-atomic memory access. Rather than trying to reason about the complex semantics in these cases, I simply disallowed promotion when both atomic and non-atomic accesses are present. This is conservatively correct.
It seems really tempting to just promote all access to atomics, but the original accesses might have been conditional. Since we can't lower an arbitrary atomic type, it might not be safe to promote all access to atomic. Consider a loop like the following:
while(b) {
load i128 ...
if (can lower i128 atomic)
store atomic i128 ...
else
store i128
}
It could be there's no race on the location and thus the code is perfectly well defined even if we can't lower a i128 atomically.
It's not clear we need to be this conservative - arguably the program above is brocken since it can't be lowered unless the branch is folded - but I didn't want to have to fix any fallout which might result.
Differential Revision: https://reviews.llvm.org/D15592
llvm-svn: 295015
Previously we could not use it because std::once_flag's default
constructor was not constexpr. Today, all supported versions of VS
correctly mark it constexpr. I confirmed that MSVC 2015 does not emit
any problematic racy dynamic initialization code, so we should be safe
to use this now.
llvm-svn: 295013
This will later be used by ThinLTOBitcodeWriter to add copies of readnone
functions to the regular LTO module.
Differential Revision: https://reviews.llvm.org/D29695
llvm-svn: 295008
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.
Until those additional updates are made this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.
Differential Revision: https://reviews.llvm.org/D29903
llvm-svn: 295004
Summary:
This change edits the language reference to explicitly allow the
existence of readnone and readonly functions that can throw. Full
discussion at
http://lists.llvm.org/pipermail/llvm-dev/2017-January/108637.html
Reviewers: dberlin, chandlerc, hfinkel, majnemer
Reviewed By: majnemer
Subscribers: majnemer, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D28740
llvm-svn: 295000
Summary:
Update the TBAA section to mention the struct path TBAA that LLVM
implements today. This is not a proposal or change in semantics -- it
is intended only to **document** what LLVM already does today.
This is related to https://reviews.llvm.org/D26438 where I've tried to
implement some of the constraints as verifier checks.
Reviewers: anna, reames, rsmith, chandlerc, hfinkel, rjmccall, mehdi_amini, dexonsmith, manmanren
Reviewed By: manmanren
Subscribers: dberlin, dberris, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D26831
llvm-svn: 294999
This revealed that we actually have 8 more unused flag bits, and byval
size doesn't need to be a bitfield at all.
This came up during code review here:
https://reviews.llvm.org/D29668#inline-258469
llvm-svn: 294989
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982
Make the whole thing testable by adding YAML I/O support for the WPD
summary information and adding some negative tests that exercise the
YAML support.
Differential Revision: https://reviews.llvm.org/D29782
llvm-svn: 294981
This reverts commit r294967. This patch caused execution time slowdowns in a
few LLVM test-suite tests, as reported by the clang-cmake-aarch64-quick bot.
I'm reverting to investigate.
llvm-svn: 294973
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970
This patch extends the optimization of truncations whose operand is an
induction variable with a constant integer step. Previously we were only
applying this optimization to the primary induction variable. However, the cost
model assumes the optimization is applied to the truncation of all integer
induction variables (even regardless of step type). The transformation is now
applied to the other induction variables, and I've updated the cost model to
ensure it is better in sync with the transformation we actually perform.
Differential Revision: https://reviews.llvm.org/D29847
llvm-svn: 294967
Clean up the implementation of divide macro expansion by getting rid of a
FIXME regarding magic numbers and branch instructions. Match GAS' behaviour
for expansion of ddiv / div in the two and three operand cases. Add the two
operand alias for MIPSR6. Finally, optimize macro expansion cases where the
divisior is the $zero register.
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D29887
llvm-svn: 294960