Instead of annotating (most of) the ArrayRef API, we can just annotate
the type directly. This is less code and it will warn in more cases.
llvm-svn: 284342
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
llvm-svn: 284303
Instead of annotating (most of) the APInt API, we can just annotate
the type directly. This is less code and it will warn in more cases.
llvm-svn: 284297
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Differential Revision: https://reviews.llvm.org/D25464
llvm-svn: 284290
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
This is essentially a more powerful version of our current
LLVM_ATTRIBUTE_UNUSED_RESULT, in that it can also be applied to types
and generate warnings whenever an object of that type is returned by
value and the value is discarded.
I'll replace uses of LLVM_ATTRIBUTE_UNUSED_RESULT and remove that
macro in follow up commits.
llvm-svn: 284286
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Use PackedRegisterRef to store the register information in the graph nodes.
This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.
llvm-svn: 284255
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
This is an improvement when compiling with llvm. llvm doesn't inline
the call to insert, so the align is always executed and shows up in
the profile.
With gcc the call to insert is inlined and the align computation moved
and done only if needed.
With this patch we explicitly only compute it if it is needed.
In the two tests with debug info, the speedup was
scylla
master 3.008959365
patch 2.932080942 1.02621974786x faster
firefox
master 6.709823604
patch 6.592387227 1.01781393795x faster
In all others the difference was in the noise.
llvm-svn: 284249
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
Summary:
* Describe new (3.3) parameter attribute group encoding, leaving old encoding there with a note about legacy
* Bring TYPE_BLOCK docs up to date
* Remove docs about obsolete (pre 3.0) TYPE_SYMTAB_BLOCK, TST_CODE_ENTRY
* Fix a couple of incorrect comments and remove one unused enum definition along the way
This addresses https://llvm.org/bugs/show_bug.cgi?id=28941.
Patch by: Ismail Badawi <ibadawi@cisco.com>
Differential Revision: https://reviews.llvm.org/D25623
llvm-svn: 284246
This test was apparently checking for 2 independent folds, but we have
plenty of tests for those individual folds already. We are lacking
vector tests, however, because we don't have the shift folds for vectors.
llvm-svn: 284243
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25289
llvm-svn: 284224
Mostly this just means changing the triple from aarch64-apple-ios to the generic
aarch64--. Only one test needs more significant changes, but GlobalISel already
does the right thing so it's ok to just change the checks.
Differential Revision: https://reviews.llvm.org/D25532
llvm-svn: 284223
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D21473
llvm-svn: 284218
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Differential Revision: https://reviews.llvm.org/D24957
llvm-svn: 284211
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
llvm-svn: 284204
Added relocation names:
- R_AMDGPU_GOTPCREL32_LO
- R_AMDGPU_GOTPCREL32_HI
- R_AMDGPU_REL32_LO
- R_AMDGPU_REL32_HI
AMDGPU isa only supports 32-bit immediates. In order to access 64-bit address we need to generate 32-bit lo/hi relocations, and do the right math (separate patch). Currently we only generate one 32 bit relocation for lower bits for each access, losing higher bits. Hence we need relocations listed above.
Differential Revision: https://reviews.llvm.org/D25546
llvm-svn: 284191
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Reviewers: mehdi_amini
Subscribers: beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25585
llvm-svn: 284180
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Reviewers: bogner, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25202
llvm-svn: 284163
the X86 subdirectory. Original commit message:
Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D24896
llvm-svn: 284161
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Unfortunately no test case for in-tree targets.
llvm-svn: 284152
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.
Added a unittest.
Reviewers: mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25564
llvm-svn: 284138
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.
Fixes part of PR29098.
llvm-svn: 284136
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]
Above, (1) takes less cycles than (2).
By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24291
llvm-svn: 284115
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
Differential Revision: http://reviews.llvm.org/D25022
llvm-svn: 284108
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.
llvm-svn: 284102
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.
llvm-svn: 284097
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
PrivateLinkage should equal to the Internal Linkage.
- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
"L" may conflict to the normal symbol name starting with 'L'.
Based on a patch by Han Sangjin! Manually updated test cases.
llvm-svn: 284096
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093
This document describes the proposal to move to GitHub, and
compare the two proposals through various workflow examples,
presenting the current set of commands following by the ones
involved in each of the two proposals.
It is intended to supersede the previous "submodule proposal"
document entirely, and drive the discussion at the BoF during
the next Dev Meeting.
Differential Revision: https://reviews.llvm.org/D24167
llvm-svn: 284077
Summary: We need a new LLVM intrinsic to implement MS _AddressOfReturnAddress builtin on 64-bit Windows.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25293
llvm-svn: 284061
Update the CHECK lines in the shtest-timeout.py lit test to account for
the current output. The output has been changed in r271610 without
adjusting the tests.
Differential Revision: https://reviews.llvm.org/D25236
llvm-svn: 284057
Reappy r284044 after revert in r284051. Krzysztof fixed the error in r284049.
The original summary:
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
llvm-svn: 284053
This patch tries to fully unroll loops having break statement like this
for (int i = 0; i < 8; i++) {
if (a[i] == value) {
found = true;
break;
}
}
GCC can fully unroll such loops, but currently LLVM cannot because LLVM only
supports loops having exact constant trip counts.
The upper bound of the trip count can be obtained from calling
ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch is the
refactoring work in SCEV to prevent duplicating code.
The feature of using the upper bound is enabled under the same circumstance
when runtime unrolling is enabled since both are used to unroll loops without
knowing the exact constant trip count.
Differential Revision: https://reviews.llvm.org/D24790
llvm-svn: 284044
We need to use the overload of Mangler::getNameWithPrefix that takes a
GlobalValue in order to mangle in the stdcall stack byte count for Windows
targets.
Differential Revision: https://reviews.llvm.org/D25529
llvm-svn: 284040
Branch folder removes implicit defs if they are the only non-branching
instructions in a block, and the branches do not use the defined registers.
The problem is that in some cases these implicit defs are required for
the liveness information to be correct.
Differential Revision: https://reviews.llvm.org/D25478
llvm-svn: 284036
This augments the STLExtras toolset with a zip iterator and range
adapter. Zip comes in two varieties: `zip`, which will zip to the
shortest of the input ranges, and `zip_first`, which limits its
`begin() == end()` checks to just the first krange.
Patch by: Bryant Wong <github.com/bryant>
Differential Revision: https://reviews.llvm.org/D23252
llvm-svn: 284035
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.
llvm-svn: 284031
Module inline asm was always being linked/concatenated
when running the IRLinker. This is correct for full LTO but not when
we are importing for ThinLTO, as it can result in multiply defined
symbols when the module asm defines a global symbol.
In order to test with llvm-lto2, I had to work around PR30396,
where a symbol that is defined in module assembly but defined in the
LLVM IR appears twice. Added workaround to llvm-lto2 with a FIXME.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25359
llvm-svn: 284030
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
Since this change is known to cause performance degradations in some cases it's commited under a temporary flag which is turned off by default.
Patch by Li Huang
Differential Revision: https://reviews.llvm.org/D18777
llvm-svn: 284022
Add a number of helper functions to match scalar or vector equivalent constant/splat values to allow most of the combine patterns to be used by vectors.
Differential Revision: https://reviews.llvm.org/D25374
llvm-svn: 284015
This combiner breaks debug experience and should not be run when optimizations are disabled.
For example:
int main() {
int j = 0;
j += 2;
if (j == 2)
return 0;
return 5;
}
When debugging this code compiled in /O0, it should be valid to break at line "j+=2;" and edit the value of j. It should change the return value of the function.
Differential Revision: https://reviews.llvm.org/D19268
llvm-svn: 284014
An arithmetic shift can be safely changed to a logical shift if the first
operand is known positive. This allows ComputeKnownBits (and similar analysis)
to determine the sign bit of the shifted value in some cases. In turn, this
allows InstCombine to canonicalize a signed comparison (a > 0) into an equality
check (a != 0).
PR30577
Differential Revision: https://reviews.llvm.org/D25119
llvm-svn: 284013
The current Cost Model implementation is very inaccurate and has to be
updated, improved, re-implemented to be able to take into account the
concrete CPU models and the concrete targets where this Cost Model is
being used. For example, the Latency Cost Model should be differ from
Code Size Cost Model, etc.
This patch is the first step to launch the developing and implementation
of a new Cost Model generation.
Differential Revision: https://reviews.llvm.org/D25186
llvm-svn: 284012
As discussed by Andrea on PR30486, we have an unsafe cast to an Instruction type in the select combine which doesn't take into account that it could be a ConstantExpr instead.
Differential Revision: https://reviews.llvm.org/D25466
llvm-svn: 284000
subcommands
This commit fixes a bug where the help output doesn't display subcommands when
a tool has less than 3 subcommands.
This change doesn't include a corresponding unittest as there is no viable way
to provide a unittest for it.
Differential Revision: https://reviews.llvm.org/D25463
llvm-svn: 283998
Add unit tests for checking a few tricky instruction sizes. Also remove the old
tests for the instruction sizes, which were clunky and brittle.
Since this is the first set of target-specific unit tests, we need to add some
CMake plumbing. In the future, adding unit tests for a given target will be as
simple as creating a directory with the same name as the target under
unittests/Target. The tests are only run if the target is enabled in
LLVM_TARGETS_TO_BUILD.
Differential Revision: https://reviews.llvm.org/D24548
llvm-svn: 283990
`RefSCC`.
Also improve the comments surrounding the lazy post-order iterator as
they had grown stale since the RefSCC/SCC split.
I'm sure there are more comments that need updating here, but I saw and
fixed these and didn't want to lose them. I've not gotten to doing
a really complete audit of every comment yet.
llvm-svn: 283987
The basic inlining operation makes the following changes to the call graph:
1) Add edges that were previously transitive edges. This is always trivial and
this patch gives the LCG helper methods to make this more convenient.
2) Remove the inlined edge. We had existing support for this, but it contained
bugs that needed to be fixed. Testing in the same pattern as the inliner
exposes these bugs very nicely.
3) Delete a function when it becomes dead because it is internal and all calls
have been inlined. The LCG had no support at all for this operation, so this
adds that support.
Two unittests have been added that exercise this specific mutation pattern to
the call graph. They were extremely effective in uncovering bugs. Sadly,
a large fraction of the code here is just to implement those unit tests, but
I think they're paying for themselves. =]
This was split out of a patch that actually uses the routines to
implement inlining in the new pass manager in order to isolate (with
unit tests) the logic that was entirely within the LCG.
Many thanks for the careful review from folks! There will be a few minor
follow-up patches based on the comments in the review as well.
Differential Revision: https://reviews.llvm.org/D24225
llvm-svn: 283982
This reverts commit r283946.
This breaks when build with GCC:
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: always_inline function might not be inlinable [-Werror=attributes]
lib/Fuzzer/FuzzerTracePC.cpp:169:6: error: inlining failed in call to always_inline 'void fuzzer::TracePC::HandleCmp(void*, T, T) [with T = long unsigned int]': target specific option mismatch
lib/Fuzzer/FuzzerTracePC.cpp:198:65: error: called from here
llvm-svn: 283979
Although Copies are not specific to preISel, we still have to assign them
a proper register class. However, given they are not constrained to
anything we do not have to handle the source register at the copy. It
will be properly mapped when reaching the related definition.
In the process, the handlong of G_ANYEXT is slightly modified as those
end up being selected as copy. The difference is that when register size
do not match on both sides, we need to insert SUBREG_TO_REG operation,
otherwise the post RA copy expansion will not be happy!
llvm-svn: 283972
Summary:
The Python file `utils/lit/lit/ShUtil.py` contains:
1. Logic used by lit itself
2. A set of unit tests for that logic, which can be run by invoking
`python utils/lit/lit/ShUtil.py`
Move these unit tests to a `tests/unit` subdirectory of lit, and run
the tests as part of lit's test suite. This ensures that, should the
lit test suite be included in LLVM's own regression test suite, these
unit tests will also be run.
(Instructions on how to run lit's test suite can be found in
`utils/lit/README.txt`.)
Reviewers: ddunbar, echristo, delcypher, beanz
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25411
llvm-svn: 283968
This implements the cleanup that Danny asked to commit separately from the
previous fix to GVN-hoist in https://reviews.llvm.org/D25476#inline-219818
Tested with ninja check on x86_64-linux.
llvm-svn: 283967
This is a refreshed version of a patch that was reverted: it fixes
the problems reported in both PR30216 and PR30499, and
contains all the test-cases from both bugs.
To hoist stores past loads, we used to search for potential
conflicting loads on the hoisting path by following a MemorySSA
def-def link from the store to be hoisted to the previous
defining memory access, and from there we followed the def-use
chains to all the uses that occur on the hoisting path. The
problem is that the def-def link may point to a store that does
not alias with the store to be hoisted, and so the loads that are
walked may not alias with the store to be hoisted, and even as in
the testcase of PR30216, the loads that may alias with the store
to be hoisted are not visited.
The current patch visits all loads on the path from the store to
be hoisted to the hoisting position and uses the alias analysis
to ask whether the store may alias the load. I was not able to
use the MemorySSA functionality to ask for whether load and
store are clobbered: I'm not sure which function to call, so I
used a call to AA->isNoAlias().
Store past store is still working as before using a MemorySSA
query: I added an extra test to pr30216.ll to make sure store
past store does not regress.
Tested on x86_64-linux with check and a test-suite run.
Differential Revision: https://reviews.llvm.org/D25476
llvm-svn: 283965
Summary:
In PPCMIPeephole, when we see two splat instructions, we can't simply do the following transformation:
B = Splat A
C = Splat B
=>
C = Splat A
because B may still be used between these two instructions. Instead, we should make the second Splat a PPC::COPY and let later passes decide whether to remove it or not:
B = Splat A
C = Splat B
=>
B = Splat A
C = COPY B
Fixes PR30663.
Reviewers: echristo, iteratee, kbarton, nemanjai
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25493
llvm-svn: 283961
Fixes a crash in the build_vector -> vector_shuffle combine
when the first vector input is twice as wide as the output,
and the second input vector is even wider.
llvm-svn: 283953
On Darwin, marking a section as "regular,live_support" means that a
symbol in the section should only be kept live if it has a reference to
something that is live. Otherwise, the linker is free to dead-strip it.
Turn this functionality on for the __llvm_prf_data section.
This means that counters and data associated with dead functions will be
removed from dead-stripped binaries. This will result in smaller
profiles and binaries, and should speed up profile collection.
Tested with check-profile, llvm-lit test/tools/llvm-{cov,profdata}, and
check-llvm.
Differential Revision: https://reviews.llvm.org/D25456
llvm-svn: 283947
Reverts r283938 to reinstate r283867 with a fix.
The original change had an ArrayRef referring to a destroyed temporary
initializer list. Use plain C arrays instead.
llvm-svn: 283942
load commands that uses the MachO::linker_option_command
type but not used in llvm libObject code but used in llvm tool code.
This includes just LC_LINKER_OPTION load command.
llvm-svn: 283939
This re-applies r283798, disabled in r283803, with the static_assert
tests disabled under MSVC. The deleted functions still seem to catch
mistakes in MSVC, so it's not a significant loss.
Part of rdar://problem/16375365
llvm-svn: 283935
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283934
The previous commit was failing because we filled empty slots of
the debug stream index with kInvalidStreamIndex. It should've been 0.
llvm-svn: 283925
Low level functionality to format numbers were embedded in the
implementation of raw_ostream. I have need to use these through
an interface other than the overloaded stream operators, so they
need to be raised to a level that they can be used from either
raw_ostream operators or other code.
llvm-svn: 283921
- Refactor bit packing/unpacking
- Calculate bit mask given bit shift and bit width
- Introduce function for decoding bits of waitcnt
- Introduce function for encoding bits of waitcnt
- Introduce function for getting waitcnt mask (instead of using bare numbers)
- Introduce function fot getting max waitcnt(s) (instead of using bare numbers)
Differential Revision: https://reviews.llvm.org/D25298
llvm-svn: 283919
I fixed all the other Targets in r283702, and interestingly the
sanitizers are only now "sometimes" catching this bug on the only
one I missed.
llvm-svn: 283914
This has existed pretty much forever AFAICT, but the code was
never being exercised because nobody was using the class. A
user of this class surfaced, and now we're breaking with UB.
The code was obviously wrong, so it's fixed here.
llvm-svn: 283912
Summary:
This test is allowed to run on non-x86 hosts and thus must use
llvm-nm rather than nm.
Differential Revision: https://reviews.llvm.org/D25473
llvm-svn: 283901
The non-obvious motivation for adding this fold (which already happens in InstCombine)
is that we want to canonicalize IR towards select instructions and canonicalize DAG
nodes towards boolean math. So we need to recreate some folds in the DAG to handle that
change in direction.
An interesting implementation difference for cases like this is that InstCombine
generally works top-down while the DAG goes bottom-up. That means we need to detect
different patterns. In this case, the SimplifyDemandedBits fold prevents us from
performing a zext to sext fold that would then be recognized as a negation of a sext.
llvm-svn: 283900
Previously we would print
USAGE: <exe> [subcommand] [options]
Even if no subcommands were present. This changes the output
format to only print "[subcommand]" if there is at least one
subcommand.
Fixes llvm.org/pr30598
Patch by Serge Guelton
llvm-svn: 283892
For each block check that it doesn't have any uses outside of it's innermost loop.
Differential Revision: https://reviews.llvm.org/D25364
llvm-svn: 283877
The high registers are not allocatable in Thumb1 functions, but they
could still be used by inline assembly, so we need to save and restore
the callee-saved high registers (r8-r11) in the prologue and epilogue.
This is complicated by the fact that the Thumb1 push and pop
instructions cannot access these registers. Therefore, we have to move
them down into low registers before pushing, and move them back after
popping into low registers.
In most functions, we will have low registers that are also being
pushed/popped, which we can use as the temporary registers for
saving/restoring the high registers. However, this is not guaranteed, so
we may need to push some extra low registers to ensure that the high
registers can be saved/restored. For correctness, it would be sufficient
to use just one low register, but if we have enough low registers
available then we only need one push/pop instruction, rather than one
per high register.
We can also use the argument/return registers when they are not live,
and the link register when saving (but not restoring), reducing the
number of extra registers we need to push.
There are still a few extreme edge cases where we need two push/pop
instructions, because not enough low registers can be made live in the
prologue or epilogue.
In addition to the regression tests included here, I've also tested this
using a script to generate functions which clobber different
combinations of registers, have different numbers of argument and return
registers (including variadic arguments), allocate different fixed sized
objects on the stack, and do or don't use variable sized allocas and the
__builtin_return_address intrinsic (all of which affect the available
registers in the prologue and epilogue). I ran these functions in a test
harness which verifies that all of the callee-saved registers are
correctly preserved.
Differential Revision: https://reviews.llvm.org/D24228
llvm-svn: 283867
Currently, the Int_eh_sjlj_dispatchsetup intrinsic is marked as
clobbering all registers, including floating-point registers that may
not be present on the target. This is technically true, as we could get
linked against code that does use the FP registers, but that will not
actually work, as the soft-float code cannot save and restore the FP
registers. SjLj exception handling can only work correctly if either all
or none of the code is built for a target with FP registers. Therefore,
we can assume that, when Int_eh_sjlj_dispatchsetup is compiled for a
soft-float target, it is only going to be linked against other
soft-float code, and so only clobbers the general-purpose registers.
This allows us to check that no non-savable registers are clobbered when
generating the prologue/epilogue.
Differential Revision: https://reviews.llvm.org/D25180
llvm-svn: 283866
Allow instructions such as 'cmp w0, #(end - start)' by folding the
expression into a constant. For ELF, we fold only if the symbols are in
the same section. For MachO, we fold if the expression contains only
symbols that are not linker visible.
Fixes https://llvm.org/bugs/show_bug.cgi?id=18920
Differential Revision: https://reviews.llvm.org/D23834
llvm-svn: 283862
Bot does not like it: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/17075
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/test/Object/invalid.test:70:32: error: expected string not found in input
INVALID-SEC-ADDRESS-ALIGNMENT: Invalid address alignment of section headers
^
<stdin>:1:1: note: scanning from here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
^
<stdin>:1:125: note: possible intended match here
/mnt/b/sanitizer-buildbot3/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/Object/ELF.h:412:7: runtime error: upcast of misaligned address 0x000002d8b899 for type 'llvm::object::Elf_Shdr_Impl<llvm::object::ELFType<llvm::support::endianness::little, true> >', which requires 2 byte alignment
llvm-svn: 283858
This reverts commit r283842.
test/CodeGen/X86/tail-dup-repeat.ll causes and llc crash with our
internal testing. I'll share a link with you.
llvm-svn: 283857
LLVM's RandomNumberGenerator wasn't compatible with
the random distribution from <random>.
Fixes PR25105
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D25443
llvm-svn: 283854
Summary: This patch sets function as hot if function's entry count is hot/cold.
Reviewers: eraman, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25048
llvm-svn: 283852
This changes MachineRegisterInfo to be initializes after parsing all
instructions. This is in preparation for upcoming commits that allow the
register class specification on the operand or deduce them from the
MCInstrDesc.
This commit removes the unused feature of having nonsequential register
numbers. This was confusing anyway as the vreg numbers would be
different after parsing when you had "holes" in your numbering.
This patch also introduces the concept of an incomplete virtual
register. An incomplete virtual register may be used during .mir parsing
to construct MachineOperands without knowing the exact register class
(or register bank) yet.
NFC except for some error messages.
Differential Revision: https://reviews.llvm.org/D22397
llvm-svn: 283848
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Issue with early tail-duplication of blocks that branch to a fallthrough
predecessor fixed with test case: tail-dup-branch-to-fallthrough.ll
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283842
Summary:
Previously, when allocating unspillable live ranges, we would never
attempt to split. We would always bail out and try last ditch graph
recoloring.
This patch changes this by attempting to split all live intervals before
performing recoloring.
This fixes LLVM bug PR14879.
I can't add test cases for any backends other than AVR because none of
them have small enough register classes to trigger the bug.
Reviewers: qcolombet
Subscribers: MatzeB
Differential Revision: https://reviews.llvm.org/D25070
llvm-svn: 283838
When combining an integer load with !range metadata that does not include 0 to a pointer load, make sure emit !nonnull metadata on the newly-created pointer load. This prevents the !nonnull metadata from being dropped during a ptrtoint/inttoptr pair.
This fixes PR30597.
Patch by Ariel Ben-Yehuda!
Differential Revision: https://reviews.llvm.org/D25215
llvm-svn: 283836
This only adds the support for 64-bit vector OR. Adding more sizes is
not difficult, but it requires a bigger refactoring because ORs work on
any size, not necessarly the ones that match the width of the register
width. Right now, this is not expressed in the legalization, so don't
bother pushing the refactoring yet.
llvm-svn: 283831
Previously, there is no way to create a stream other than pre-defined
special stream such as DBI or IPI. This patch adds a new method,
addDbgStream, to add a debug stream to a PDB file.
Differential Revision: https://reviews.llvm.org/D25356
llvm-svn: 283823
Traceback (most recent call last):
File "bin/llvm-lit", line 44, in <module>
lit.main(builtin_parameters)
AttributeError: 'module' object has no attribute 'main'
Suggested by Artem Belevich.
llvm-svn: 283816
This reverts commit r283798, as it causes static asserts on
MSVC 2015 with the following errors:
ArrayRefTest.cpp(38): error C2338: Assigning from single prvalue element
ArrayRefTest.cpp(41): error C2338: Assigning from single xvalue element
ArrayRefTest.cpp(47): error C2338: Assigning from an initializer list
llvm-svn: 283803
llvm::cl already has a function called llvm::apply() so this is
causing an ODR violation. The STLExtras version should win the
vote on which one gets to be called apply() since it is named
after the equivalent STL function, but since renaiming the cl
version is more difficult, let's do this for now to get the
bots green.
llvm-svn: 283800
Without this, the following statements will create ArrayRefs that
refer to temporary storage that goes out of scope by the end of the
line:
someArrayRef = getSingleElement();
someArrayRef = {elem1, elem2};
Note that the constructor still has this problem:
ArrayRef<Element> someArrayRef = getSingleElement();
ArrayRef<Element> someArrayRef = {elem1, elem2};
but that's a little harder to get rid of because we want to be able to
use this in calls:
takesArrayRef(getSingleElement());
takesArrayRef({elem1, elem2});
Part of rdar://problem/16375365. Reviewed by Duncan Exon Smith.
llvm-svn: 283798
Add integer expansion for FLT_ROUNDS_ for targets where i32 is not a legal
type.
Patch by Edward Jones, thanks!
Differential Revision: https://reviews.llvm.org/D24459
llvm-svn: 283797
This is equivalent to the C++14 std::apply(). Since we are not
using C++14 yet, this allows us to still make use of apply anyway.
Differential revision: https://reviews.llvm.org/D25100
llvm-svn: 283779
Summary: The keys must still be copyable, because we store two copies of them.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25404
llvm-svn: 283764
The instructions VLDM/VSTM can only access word-aligned memory
locations and produce alignment fault if the condition is not met.
The compiler currently generates VLDM/VSTM for v2f64 load/store
regardless the alignment of the memory access. Instead, if a v2f64
load/store is not word-aligned, the compiler should generate
VLD1/VST1. For each non double-word-aligned VLD1/VST1, a VREV
instruction should be generated when targeting Big Endian.
Differential Revision: https://reviews.llvm.org/D25281
llvm-svn: 283763
Summary:
Rotate by 1 is translated to 1 micro-op, while rotate with imm8 is translated to 2 micro-ops.
Fixes pr30644.
Reviewers: delena, igorb, craig.topper, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D25399
llvm-svn: 283758
Fixed copy+paste vector alignment to correct for per-element scalar loads
Increased to 512-bit data sizes in preparation of avx512 tests
llvm-svn: 283748
sections_begin() may return unalignment pointer when Header->e_shoff isinvalid.
That may result in a crash in clients, for example we have one in LLD:
assert((PtrWord & ~PointerBitMask) == 0 &&
"Pointer is not sufficiently aligned");
fails when trying to push_back Elf_Shdr* (unaligned) into TinyPtrVector.
Patch forces check for alignment of Header->e_shoff.
Differential revision: https://reviews.llvm.org/D25368
llvm-svn: 283740
Commit in the name of:Coby Tayree
1.'v' constraint for (x86) non-avx arch imitates the already implemented 'x' constraint, i.e. allows XMM{0-15} & YMM{0-15} depending on the apparent arch & mode (32/64).
2.for the avx512 arch it allows [X,Y,Z]MM{0-31} (mode dependent)
This patch applies the needed changes to clang
clang patch: https://reviews.llvm.org/D25004
Differential Revision: D25005
llvm-svn: 283717
Summary:
Using Python linter flake8 on the utils/lit reveals several linter
warnings designated "F401: Unused import". Fix or silence these
warnings.
Some of these unused imports are legitimate, while some are part of lit's API.
For example, users of lit expect to be able to access `lit.formats.ShTest` in
their `lit.cfg`, despite the module hierarchy for that symbol actually being
`lit.formats.shtest.ShTest`. To silence linter errors for these lines,
include a "noqa" directive.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25407
llvm-svn: 283710
Summary:
`TestingProgressDisplay` initializes its `current` attribute to `None`, but
never reads or writes the value again. Remove it.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25415
llvm-svn: 283709
Summary:
`ArgumentError` is not defined by the Python standard library.
Executing this line of code would throw a exception, but not the
intended one. It would throw a `NameError` exception, since `ArgumentError`
is undefined.
Use `ValueError` instead, which is defined by the Python standard
library.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25410
llvm-svn: 283708
Summary:
Semicolons aren't necessary as statement terminators in Python, and
each of these uses are superfluous as they appear at the end of a line.
The convention is to not use semicolons where not needed, so remove them.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25409
llvm-svn: 283707
Summary: `prefix` is written to but never read.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25408
llvm-svn: 283706
Summary:
The minimum version of Python required to run LLVM's test suite is 2.7.
Remove a workaround for older Python versions.
Reviewers: echristo, delcypher, beanz, ddunbar
Subscribers: llvm-commits, mehdi_amini
Differential Revision: https://reviews.llvm.org/D25400
llvm-svn: 283705
Masked-expand-load node represents load operation that loads a variable amount of elements from memory according to amount of "true" bits in the mask and expands the loaded elements according to their position in the mask vector.
Right now, the node is used in intrinsics for VEXPAND* instructions.
The work is done towards implementation of masked.expandload and masked.compressstore intrinsics.
Differential Revision: https://reviews.llvm.org/D25322
llvm-svn: 283694
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:
va_start(ValueArgs, Desc);
with Desc being a StringRef.
Differential Revision: https://reviews.llvm.org/D25342
llvm-svn: 283671
This seems to have been responsible for the XMM16-31 spills observed in PR29112. With this fixed the test case has been modified to no longer have a spill of XMM16.
llvm-svn: 283668
Summary:
When there is a call to an alias in the same module, we were not
adding a call edge. So we could incorrectly think that the alias
was dead if it was inlined in that function, despite having a
reference imported elsewhere. This resulted in unsats at link time.
Add a call edge when the call is to an alias.
Reviewers: davide, mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25384
llvm-svn: 283664
Avoid generating indexed vector instructions for Exynos. This is needed for
fmla/fmls/fmul/fmulx. For example, the instruction
fmla v0.4s, v1.4s, v2.s[1]
is less efficient than the instructions
dup v2.4s, v2.s[1]
fmla v0.4s, v1.4s, v2.4s
Patch written by Abderrazek Zaafrani.
Differential Revision: https://reviews.llvm.org/D21571
llvm-svn: 283663
Value names may be prefixed with a binary '1' to indicate that the
backend should not modify the symbols due to any platform naming
convention.
This should not show up in the YAML opt record file because it breaks
the YAML parser.
llvm-svn: 283656
Clang always emit a hash for ThinLTO, but as other frontend are
starting to use ThinLTO, this could be a serious bug.
Differential Revision: https://reviews.llvm.org/D25379
llvm-svn: 283655
We need to add an entry in the combined-index for modules that have
a hash but otherwise empty summary, this is needed so that we can
get the hash for the module.
Also, if no entry is present in the combined index for a module, we
need to skip it when trying to compute a cache entry.
Differential Revision: https://reviews.llvm.org/D25300
llvm-svn: 283654
This is the first step towards round-tripping symbol information,
and thusly being able to write symbol information to a PDB.
This patch writes the symbol information for each compiland to
the Yaml when running in pdb2yaml mode. There's still some loose
ends, such as what to do about relocations (necessary in order to
print linkage names), how to print enums with friendly names, and
how to give the dumper access to the StringTable, but this is a
good first start.
llvm-svn: 283641
Once MULHS was expanded, this exposed an issue where the condition
register was thought to be 16-bit. This caused an attempt to copy a
16-bit register to an 8-bit register.
Authored by Jake Goulding
llvm-svn: 283634
Because screen space is precious, if an optimization (vectorization, for
example) never happens, don't leave empty space for the associated markers on
every line of the output. This makes the output much more compact, and allows
for the later inclusion of markers for more (although perhaps rare)
optimizations.
llvm-svn: 283626
Summary:
If heap allocation of a coroutine is elided, we need to make sure that we will update an address stored in the coroutine frame from f.destroy to f.cleanup.
Before this change, CoroSplit synthesized these stores after coro.begin:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
store void (%f.Frame*)* @f.destroy, void (%f.Frame*)** %destroy.addr
```
In those cases where we did heap elision, but were not able to devirtualize all indirect calls, destroy call will attempt to "free" the coroutine frame stored on the stack. Oops.
Now we use select to put an appropriate coroutine subfunction in the destroy slot. As bellow:
```
store void (%f.Frame*)* @f.resume, void (%f.Frame*)** %resume.addr
%0 = select i1 %need.alloc, void (%f.Frame*)* @f.destroy, void (%f.Frame*)* @f.cleanup
store void (%f.Frame*)* %0, void (%f.Frame*)** %destroy.addr
```
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D25377
llvm-svn: 283625
The tail duplication pass uses an assumed layout when making duplication
decisions. This is fine, but passes up duplication opportunities that
may arise when blocks are outlined. Because we want the updated CFG to
affect subsequent placement decisions, this change must occur during
placement.
In order to achieve this goal, TailDuplicationPass is split into a
utility class, TailDuplicator, and the pass itself. The pass delegates
nearly everything to the TailDuplicator object, except for looping over
the blocks in a function. This allows the same code to be used for tail
duplication in both places.
This change, in concert with outlining optional branches, allows
triangle shaped code to perform much better, esepecially when the
taken/untaken branches are correlated, as it creates a second spine when
the tests are small enough.
Issue from previous rollback fixed, and a new test was added for that
case as well. Issue was worklist/scheduling/taildup issue in layout.
Issue from 2nd rollback fixed, with 2 additional tests. Issue was
tail merging/loop info/tail-duplication causing issue with loops that share
a header block.
Differential revision: https://reviews.llvm.org/D18226
llvm-svn: 283619
The code used llvm basic block predecessors to decided where to insert phi
nodes. Instruction selection can and will liberally insert new machine basic
block predecessors. There is not a guaranteed one-to-one mapping from pred.
llvm basic blocks and machine basic blocks.
Therefore the current approach does not work as it assumes we can mark
predecessor machine basic block as needing a copy, and needs to know the set of
all predecessor machine basic blocks to decide when to insert phis.
Instead of computing the swifterror vregs as we select instructions, propagate
them at the end of instruction selection when the MBB CFG is complete.
When an instruction needs a swifterror vreg and we don't know the value yet,
generate a new vreg and remember this "upward exposed" use, and reconcile this
at the end of instruction selection.
This will only happen if the target supports promoting swifterror parameters to
registers and the swifterror attribute is used.
rdar://28300923
llvm-svn: 283617
Type visitor code had already been refactored previously to
decouple the visitor and the visitor callback interface. This
was necessary for having the flexibility to visit in different
ways (for example, dumping to yaml, reading from yaml, dumping
to ScopedPrinter, etc).
This patch merely implements the same visitation pattern for
symbol records that has already been implemented for type records.
llvm-svn: 283609
Summary: Add tests for cases where we have zero coverage in RS4GC.
Reviewers: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25341
llvm-svn: 283591
If we're going to canonicalize IR towards select of constants, try harder to create those.
Also, don't lose the metadata.
This is actually 4 related transforms in one patch:
// select X, (sext X), C --> select X, -1, C
// select X, (zext X), C --> select X, 1, C
// select X, C, (sext X) --> select X, C, 0
// select X, C, (zext X) --> select X, C, 0
Differential Revision: https://reviews.llvm.org/D25126
llvm-svn: 283575
This is a new tool built on top of the new YAML ouput generated from
optimization remarks. It produces HTML for easy navigation and
visualization.
The tool assumes that hotness information for the remarks is available
(the YAML file was produced with PGO). It uses hotness to list the
remarks prioritized by the hotness on the index page. Clicking the
source location of the remark in the list takes you the source where the
remarks are rendedered inline in the source.
For now, the tool is meant as prototype.
It's written in Python. It uses PyYAML to parse the input.
Differential Revision: https://reviews.llvm.org/D25348
llvm-svn: 283571
Summary: -fsample-profile needs discriminator, which will not be added if built with -g0. This patch makes sure the discriminator is added for sample-profile at -g0. A followup patch will be send out to update clang tests.
Reviewers: davidxl, dblaikie, echristo, dnovillo
Subscribers: mehdi_amini, probinson, llvm-commits
Differential Revision: https://reviews.llvm.org/D25132
llvm-svn: 283565
Previously, we marked the branch conditions of latch blocks uniform after
vectorization if they were instructions contained in the loop. However, if a
condition instruction has users other than the branch, it may not remain
uniform. This patch ensures the conditions we mark uniform are only used by the
branch. This should fix PR30627.
Reference: https://llvm.org/bugs/show_bug.cgi?id=30627
llvm-svn: 283563
Summary:
While walking defs of pointer operands we were assuming that the pointer
size would remain constant. This is not true, because addresspacecast
instructions may cast the pointer to an address space with a different
pointer width.
This partial reverts r282612, which was a more conservative solution
to this problem.
Reviewers: reames, sanjoy, apilipenko
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24772
llvm-svn: 283557
Reapplying r283383 after revert in r283442. The additional fix
is a getting rid of a stray space in a function name, in the
refactoring part of the commit.
This avoids falling back to calling out to the GCC rem functions
(__moddi3, __umoddi3) when targeting Windows.
The __rt_div functions have flipped the two arguments compared
to the __aeabi_divmod functions. To match MSVC, we emit a
check for division by zero before actually calling the library
function (even if the library function itself also might do
the same check).
Not all calls to __rt_div functions for division are currently
merged with calls to the same function with the same parameters
for the remainder. This is more wasteful than a div + mls as before,
but avoids calls to __moddi3.
Differential Revision: https://reviews.llvm.org/D25332
llvm-svn: 283550
MOVSD/MOVSS take a 128-bit register and a FR32/FR64 register input, the commutation code wasn't taking this into account leading to verification errors.
This patch inserts a vreg copy mi to ensure that the registers are correct.
Fix for PR30607
Differential Revision: https://reviews.llvm.org/D25280
llvm-svn: 283539
unrolling.
The next code is not vectorized by the SLPVectorizer:
```
int test(unsigned int *p) {
int sum = 0;
for (int i = 0; i < 8; i++)
sum += p[i];
return sum;
}
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.
Differential Revision: https://reviews.llvm.org/D24796
llvm-svn: 283535
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
Summary:
There was a bug with sequences like
s_mov_b64 s[0:1], exec
s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
...
s_mov_b64_term exec, s[2:3]
because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.
Note that the test case also exposes an unrelated bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028
Reviewers: tstellarAMD, arsenm
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25306
llvm-svn: 283528
Summary:
This class deals with the lowering of CodeGen `MachineInstr` objects to
MC `MCInst` objects.
Reviewers: kparzysz, arsenm
Subscribers: wdng, beanz, japaric, mgorny
Differential Revision: https://reviews.llvm.org/D25269
llvm-svn: 283522
In the left part of the reports, we have things like U<number>; if some of
these numbers use more digits than others, we don't want a space in between the
U and the start of the number. Instead, the space should come afterward. This
way it is clear that the number goes with the U and not any other optimization
indicator that might come later on the line.
Tests committed in r283518.
llvm-svn: 283519
In the left part of the reports, we have things like U<number>; if some of
these numbers use more digits than others, we don't want a space in between the
U and the start of the number. Instead, the space should come afterward. This
way it is clear that the number goes with the U and not any other optimization
indicator that might come later on the line.
llvm-svn: 283518
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case, it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
Vectorizer tests in the target-independent directory should not have a target
triple. If a test really needs to query a specific backend, it belongs in the
right target subdirectory (which "REQUIRES" the right backend). Otherwise, it
should not specify a triple.
llvm-svn: 283512
Summary:
I had for the second time today a bug where llvm::format("%s", Str)
was called with Str being a StringRef. The Linux and MacOS bots were
fine, but windows having different calling convention, it printed
garbage.
Instead we can catch this at compile-time: it is never expected to
call a C vararg printf-like function with non scalar type I believe.
Reviewers: bogner, Bigcheese, dexonsmith
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25266
llvm-svn: 283509
Per spec changes, this implements block signatures, and adds just enough
logic to produce correct block signatures at the ends of functions.
Differential Revision: https://reviews.llvm.org/D25144
llvm-svn: 283503
Per spec changes, store instructions in WebAssembly no longer have a return
value. Update the instruction descriptions.
Differential Revision: https://reviews.llvm.org/D25122
llvm-svn: 283501
Summary:
These nodes need legalization for 3-element vectors. This commit
handles the legalization and adds tests for zext and sext.
This fixes PR30614.
Reviewers: RKSimon, srhines
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25268
llvm-svn: 283496
Add a weak alias to the renamed Comdat function in IR level instrumentation,
using it's original name. This ensures the same behavior w/ and w/o IR
instrumentation, even for non standard conforming code.
Differential Revision: http://reviews.llvm.org/D25339
llvm-svn: 283490
When replacing FrameIndex with BasePtr, we must preserve BasePtr for
LEA64_32r since BasePtr is used later for stack adjustment if it is
the same as StackPtr.
Patch by H.J Lu <hjl.tools@gmail.com>
Differential Revision: https://reviews.llvm.org/D23575
llvm-svn: 283486
This generalizes the build_vector -> vector_shuffle combine to support any
number of inputs. The idea is to create a binary tree of shuffles, where
the first layer performs pairwise shuffles of the input vectors placing each
input element into the correct lane, and the rest of the tree blends these
shuffles together.
This doesn't try to be smart and create any sort of "optimal" shuffles.
The assumption is that even a "poor" shuffle sequence is better than extracting
and inserting the elements one by one.
Differential Revision: https://reviews.llvm.org/D24683
llvm-svn: 283480
This adds a new function to DebugInfo.cpp that takes an llvm::Module
as input and removes all debug info metadata that is not directly
needed for line tables, thus effectively stripping all type and
variable information from the module.
The primary motivation for this feature was the bitcode work flow
(cf. http://lists.llvm.org/pipermail/llvm-dev/2016-June/100643.html
for more background). This is not wired up yet, but will be in
subsequent patches. For testing, the new functionality is exposed to
opt with a -strip-nonlinetable-debuginfo option.
The secondary use-case (and one that works right now!) is as a
reduction pass in bugpoint. I added two new bugpoint options
(-disable-strip-debuginfo and -disable-strip-debug-types) to control
the new features. By default it will first attempt to remove all debug
information, then only the type info, and then proceed to hack at any
remaining MDNodes.
llvm-svn: 283473
When constant folding an operation to a copy or an immediate
mov, the implicit uses/defs of the old instruction were left behind,
e.g. replacing v_or_b32 left the implicit exec use on the new copy.
llvm-svn: 283471
Summary:
The acronym PR could be ambiguous to some users, especially those who
are used to interpreting it as GitHub's "pull request".
Reviewers: ddunbar, jordan_rose, void, beanz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25331
llvm-svn: 283465
Change erroneous parsing of push immediate instructions in intel syntax
to default to pointer size by rewriting into the ATT style for matching.
This fixes PR22028.
Reviewers: majnemer, rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25288
llvm-svn: 283457
When there are multiple optimizations on one line, record the vectorization
factors, etc. correctly (instead of incorrectly substituting default values).
llvm-svn: 283443
This reverts commit r283383 because it broke some of the bots:
undefined reference to ` __aeabi_uldivmod'
It affected (at least) clang-cmake-armv7-a15-selfhost,
clang-cmake-armv7-a15-selfhost and clang-native-arm-lnt.
llvm-svn: 283442
These ones need to have the size on the pseudo instruction set for
getInstSizeInBytes to work correctly. These also have a statically
known size.
llvm-svn: 283437
Summary:
The computeKnownBits and ComputeNumSignBits functions in ValueTracking can now do a simple look-through of ExtractElement.
Reviewers: majnemer, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24955
llvm-svn: 283434
Global variables are GlobalValues, so they have explicit alignment. Querying
DataLayout for the alignment was incorrect.
Testcase added.
llvm-svn: 283423
We can work around a shortcoming of FileCheck by using {{\[}} to match a square
bracket before a [[ sequence.
Thanks to Eli Friedman for the heads up!
llvm-svn: 283422