load commands that use the MachO::routines_command and
and MachO::routines_command_64 types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_ROUTINES and LC_ROUTINES_64
load commands.
llvm-svn: 284504
This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes
rather than TargetOptions.
This patch is intended to be a structural, but not functional change. By moving all of the
TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate
state, shield the callers from the string format implementation, and simplify/localize the
logic needed for a target to enable this.
If a function has a "reciprocal-estimates" attribute, those settings may override the target's
default reciprocal preferences for whatever operation and data type we're trying to optimize.
If there's no attribute string or specific setting for the op/type pair, just use the target
default settings.
As noted earlier, a better solution would be to move the reciprocal estimate settings to IR
instructions and SDNodes rather than function attributes, but that's a multi-step job that
requires infrastructure improvements. I intend to work on that, but it's not clear how long
it will take to get all the pieces in place.
Differential Revision: https://reviews.llvm.org/D25440
llvm-svn: 284495
This patch teaches ias for mips to handle expressions such as
(8*4)+(8*31)($sp). Such expression typically occur from the expansion
of multiple macro definitions.
This partially resolves PR/30383.
Thanks to Sean Bruno for reporting the issue!
Reviewers: zoran.jovanovic, vkalintiris
Differential Revision: https://reviews.llvm.org/D24667
llvm-svn: 284485
The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands.
MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit
immediate.
This patch correct the definition of sync so that it is accepted with an
operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned
immediate for MIPS32 and later revisions.
Additionally a clear error is given when the MIPS32 version of sync is
used when targeting pre MIPS32.
This partially resolves PR/30714.
Thanks to Daniel Sanders for reporting this issue!
Reveiwers: vkalintiris
Differential Revision: https://reviews.llvm.org/D25672
llvm-svn: 284483
In futher patches we shall have alignment field added to DIVariable family
and switching from uint64_t to uint32_t will save 4 bytes per variable.
Differential Revision: https://reviews.llvm.org/D25620
llvm-svn: 284482
ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads
or stores using the specified source or destination register and the next
register.
This patch does not add support for the cases where the offset is greater than
a 16 bit signed immediate as that would lead to a wrong/misleading error
message as the assembler would report "instruction requires a CPU feature
not currently enabled" for ld & sd for MIPS64 when their offset is not a signed
16 bit number.
This fixes PR/29159.
Thanks to Sean Bruno for reporting this issue!
Reviewers: vkalintiris, seanbruno, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D24556
llvm-svn: 284481
Committing on behalf of Coby Tayree: After check-all and LGTM
Desc:
AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}')
Currently, the following forms are allowed:
{k<num>}
{k<num>}{z}
This patch allows the following forms:
{z}{k<num>}
and ignores the next form:
{z}
Justification would be quite simple - GCC
Differential Revision: http://reviews.llvm.org/D25013
llvm-svn: 284479
Summary:
Instead of instantiating the MipsFastISel class and checking if the
target is supported in the overriden methods, we should perform that
check before creating the class. This allows us to enable FastISel *only*
for targets that truly support it, ie. MIPS32 to MIPS32R5.
Reviewers: sdardis
Subscribers: ehostunreach, llvm-commits
Differential Revision: https://reviews.llvm.org/D24824
llvm-svn: 284475
In loops that look something like
i = n;
do {
...
} while(i++ < n+k);
where k is a constant, the maximum backedge count is k (in fact the backedge
count will be either 0 or k, depending on whether n+k wraps). More generally
for LHS < RHS if RHS-(LHS of first comparison) is a constant then the loop will
iterate either 0 or that constant number of times.
This allows for more loop unrolling with the recent upper bound loop unrolling
changes, and I'm working on a patch that will let loop unrolling additionally
make use of the loop being executed either 0 or k times (we need to retain the
loop comparison only on the first unrolled iteration).
Differential Revision: https://reviews.llvm.org/D25607
llvm-svn: 284465
This reverts commits 284436 and 284437 because they still break AArch64 bots:
Value of: format_number(-10, IntegerStyle::Integer, 1)
Actual: "-0"
Expected: "-10"
llvm-svn: 284462
This patch assigns cost of the scaling used in addressing for Cortex-R52.
On Cortex-R52 a negated register offset takes longer than a non-negated
register offset, in a register-offset addressing mode.
Differential Revision: http://reviews.llvm.org/D25670
Reviewer: jmolloy
llvm-svn: 284460
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.
This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.
It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.
Differential Revision: https://reviews.llvm.org/D23808
llvm-svn: 284459
This patch adds simplified support for tail calls on ARM with XRay instrumentation.
Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall
-std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my
specific flags like --target=armv7-linux-gnueabihf etc.), the following program
#include <cstdio>
#include <cassert>
#include <xray/xray_interface.h>
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() {
std::printf("In fC()\n");
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() {
std::printf("In fB()\n");
fC();
}
[[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() {
std::printf("In fA()\n");
fB();
}
// Avoid infinite recursion in case the logging function is instrumented (so calls logging
// function again).
[[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret)
{
printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret));
}
int main(int argc, char* argv[]) {
__xray_set_handler(simplyPrint);
printf("Patching...\n");
__xray_patch();
fA();
printf("Unpatching...\n");
__xray_unpatch();
fA();
return 0;
}
gives the following output:
Patching...
XRay: functionId=3 type=0.
In fA()
XRay: functionId=3 type=1.
XRay: functionId=2 type=0.
In fB()
XRay: functionId=2 type=1.
XRay: functionId=1 type=0.
XRay: functionId=1 type=1.
In fC()
Unpatching...
In fA()
In fB()
In fC()
So for function fC() the exit sled seems to be called too much before function
exit: before printing In fC().
Debugging shows that the above happens because printf from fC is also called as
a tail call. So first the exit sled of fC is executed, and only then printf is
jumped into. So it seems we can't do anything about this with the current
approach (i.e. within the simplification described in
https://reviews.llvm.org/D23988 ).
Differential Revision: https://reviews.llvm.org/D25030
llvm-svn: 284456
When Error was threaded through these APIs back in r265606 the
"return" was missed here, which triggers a warning if/when I add
LLVM_NODISCARD to the Error type.
llvm-svn: 284454
Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem.
Reviewers: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25666
llvm-svn: 284451
Summary:
There are differences in codegen between Linux and Windows due to:
1. Using std::sort which uses quicksort which is a non-stable sort.
2. Iterating over Set data structure where the iteration order is
non deterministic.
Reviewers: arsenm, grosbach, junbuml, zinob, MatzeB
Subscribers: MatzeB, wdng
Differential Revision: https://reviews.llvm.org/D25695
llvm-svn: 284441
This resubmits commits 284425 and r284428, which were reverted
in r284429 due to some infinite recursion caused by an incorrect
selection of function overloads. Reproduced the failure on Linux
using GCC 4.8.4, and confirmed that with the new patch the tests
path on GCC as well as MSVC. So hopefully this fixes everything.
llvm-svn: 284436
Summary:
Reclaiming the name 'CachedHashString' will let us add a type with that
name that owns its value.
Reviewers: timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25644
llvm-svn: 284434
load commands that use the MachO::sub_framework_command,
MachO::sub_umbrella_command, MachO::sub_library_command
and MachO::sub_client_command types but are not used in llvm
libObject code but used in llvm tool code.
This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA,
LC_SUB_LIBRARY and LC_SUB_CLIENT load commands.
llvm-svn: 284431
raw_ostream has not afforded a lot of flexibility in terms of
how to format numbers when outputting. Wrap this all up into
a set of low level helper functions that can be used to output
numbers with arbitrary precision, alignment, format, etc and
then update raw_ostream to use these functions.
This will be useful for upcoming improvements to llvm's string
formatting libraries, but are still useful independently.
Differential Revision: https://reviews.llvm.org/D25497
llvm-svn: 284425
As noted in:
https://reviews.llvm.org/D25685
This is the next-to-smallest step needed to enable the ComputeNumSignBits fix in that patch.
In a minor attempt to keep some structure, we're pulling the FP helper over along with its
integer sibling, but clearly we can and should do more refactoring of the similar helper
functions in DAGCombiner and SelectionDAG to simplify and not duplicate functionality.
llvm-svn: 284421
If -coverage is passed, but -g is not, clang populates the PassManager
pipeline with StripSymbols(debugOnly = true).
The stripSymbol pass therefore scans the list of named metadata,
drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD)
around. The verifier runs, and finds out that there's a CU not listed
in !llvm.dbg.cu (as it was previously dropped) -> crash.
When we strip debug info, so, check if there's coverage data,
and strip it as well, in order to avoid pending metadata left around.
Differential Revision: https://reviews.llvm.org/D25689
llvm-svn: 284418
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Summary:
This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker.
The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function.
Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better.
Reviewers: peter.smith, compnerd
Subscribers: rengolin, sas
Differential Revision: https://reviews.llvm.org/D25151
llvm-svn: 284410
Summary:
If we are loading an i16 value from a 32-bit memory location, then
we need to be able to truncate the loaded value to i16.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D25198
llvm-svn: 284397
This came up as part of:
https://reviews.llvm.org/D25485
Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors.
llvm-svn: 284395
Based on post-commit review for D25585/r284180, rename
hardware_physical_concurrency to heavyweight_hardware_concurrency,
to better reflect what type of tasks it should be used for and
to enable other systems to map this to something other than the
number of physical cores.
llvm-svn: 284390
/../foo is still a proper path after removing the dotdot. This should
now finally match https://9p.io/sys/doc/lexnames.html [Cleaning names].
llvm-svn: 284384
SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128.
In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding.
llvm-svn: 284381
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor.
Reviewers: sanjoy, grosser
Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser
Differential Revision: https://reviews.llvm.org/D16492
llvm-svn: 284378
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separated instructions.
CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.
An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.
Differential Revision: https://reviews.llvm.org/D25611
llvm-svn: 284377
In theory this could be generalized to move anything where
we prove the operands are available, but that would require
rewriting PRE. As NewGVN will hopefully come soon, and we're
trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably
not worth spending too much time on it. Fix provided by
Daniel Berlin!
llvm-svn: 284311
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
llvm-svn: 284303
- Removed unused class members.
- Made class internal data private.
- Made class scoped data function scoped where it's possible.
- Replace naked new/delete with unique_ptr.
- Made resources guaranteed to be freed.
Differential Revision: https://reviews.llvm.org/D25464
llvm-svn: 284290
The previous names were both misleading (the MachineLegalizer actually
contained the info tables) and inconsistent with the selector & translator (in
having a "Machine") prefix. This should make everything sensible again.
The only functional change is the name of a couple of command-line options.
llvm-svn: 284287
This is a patch to implement pr30640.
When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register.
This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together.
Differential Revision: https://reviews.llvm.org/D25521
llvm-svn: 284276
Eli noted this potential bug in the post-commit thread for:
https://reviews.llvm.org/rL284239
...but I'm not sure how to trigger it, so there's no test case yet.
llvm-svn: 284268
Summary:
We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff)
Reviewers: arsenm, nhaehnle
Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl
Differential Revision: https://reviews.llvm.org/D24672
llvm-svn: 284267
Summary:
The main purpose of this new helper is to enable simplifying operations that
have multiple uses. SimplifyDemandedBits does not handle multiple uses
currently, and this new function makes it possible to optimize:
and v1, v0, 0xffffff
mul24 v2, v1, v1 ; Multiply ignoring high 8-bits.
To:
mul24 v2, v0, v0
Where before this would not be optimized, because v1 has multiple uses.
Reviewers: bogner, arsenm
Subscribers: nhaehnle, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D24964
llvm-svn: 284266
X86. The pass optimizes as a unit the entire wide load + shuffles pattern
produced by interleaved vectorization. This initial patch optimizes one pattern
(64-bit elements interleaved by a factor of 4). Future patches will generalize
to additional patterns.
Patch by Farhana Aleen
Differential revision: http://reviews.llvm.org/D24681
llvm-svn: 284260
Use PackedRegisterRef to store the register information in the graph nodes.
This commit also removes support for virtual registers. It has never been
tested or used. It will be possible to add it back if there is a need.
llvm-svn: 284255
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
This is an improvement when compiling with llvm. llvm doesn't inline
the call to insert, so the align is always executed and shows up in
the profile.
With gcc the call to insert is inlined and the align computation moved
and done only if needed.
With this patch we explicitly only compute it if it is needed.
In the two tests with debug info, the speedup was
scylla
master 3.008959365
patch 2.932080942 1.02621974786x faster
firefox
master 6.709823604
patch 6.592387227 1.01781393795x faster
In all others the difference was in the noise.
llvm-svn: 284249
This change adds transformations such as:
zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0))))
To:
srl(or(ctlz(x), ctlz(y)), log2(bitsize(x))
This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput.
Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it.
For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar.
Differential Revision: https://reviews.llvm.org/D23446
llvm-svn: 284248
Prefer add/zext because they are better supported in terms of value-tracking.
Note that the backend should be prepared for this IR canonicalization
(including vector types) after:
https://reviews.llvm.org/rL284015
Differential Revision: https://reviews.llvm.org/D25135
llvm-svn: 284241
Summary:
This will be used for 64-bit MULHU, which is in turn used for the 64-bit
divide-by-constant optimization (see D24822).
Reviewers: arsenm, tstellarAMD
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye
Differential Revision: https://reviews.llvm.org/D25289
llvm-svn: 284224
For compatiblity with binutils, define these instructions to take
two registers with a 16bit unsigned immediate. Both of the registers
have to be same for dahi and dati.
Reviewers: dsanders, zoran.jovanovic
Differential Review: https://reviews.llvm.org/D21473
llvm-svn: 284218
Committing in the name of Ziv Izhar: After check-all and LGTM .
The following patch is for compatability with Microsoft.
Microsoft ignores the keyword "short" when used after a jmp, for example:
__asm {
jmp short label
label:
}
A test for that patch will be added in another patch, since it's located in clang's codegen tests. Link will be added shortly.
link to test: https://reviews.llvm.org/D24958
Differential Revision: https://reviews.llvm.org/D24957
llvm-svn: 284211
This will be needed by a future commit to support sign/zero extending from v8i8 to v8i64 which requires a sign/zero_extend_vector_inreg to be created which requires v8i8 to be concatenated upto v64i8 and goes through this code.
llvm-svn: 284204
Summary:
This will be used by ThinLTO to set the amount of backend
parallelism, which performs better when restricted to the number
of physical cores (on X86 at least, where getHostNumPhysicalCores is
currently defined). If not available this falls back to
thread::hardware_concurrency.
Note I didn't add to the thread class since that is a typedef to
std::thread where available.
Reviewers: mehdi_amini
Subscribers: beanz, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D25585
llvm-svn: 284180
Windows itanium is identical to MSVC when dealing with everything but C++.
Lower the math routines into msvcrt rather than compiler-rt.
llvm-svn: 284175
Windows itanium is equivalent to MSVC except in C++ mode. Ensure that the
promote the 32-bit floating point operations to their 64-bit equivalences.
llvm-svn: 284173
Summary:
This operation is promoted the same way was ISD::BSWAP. This will
prevent a regression in test/Target/AMDGOU/bitreverse.ll when i16
support is implemented.
Reviewers: bogner, hfinkel
Subscribers: hfinkel, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D25202
llvm-svn: 284163
the X86 subdirectory. Original commit message:
Requires a valid TargetMachine to be passed to the SafeStack pass.
Patch by Michael LeMay
Differential revision: http://reviews.llvm.org/D24896
llvm-svn: 284161
This option indicates copy relocations support is available from the linker
when building as PIE and allows accesses to extern globals to avoid the GOT.
Differential Revision: https://reviews.llvm.org/D24849
llvm-svn: 284160
Relax the constraint for empty live-ranges while doing last chance
recoloring. Indeed, those live-ranges do not need an actual color to be
fond for the recoloring to work.
Empty live-range may happen as a result of splitting/spilling.
Unfortunately no test case for in-tree targets.
llvm-svn: 284152
Retrying after upstream changes.
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and
merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 284151
Summary:
For now I have only added support for x86_64 Linux, but other systems
can be added incrementally.
This is to be used for setting the default parallelism for ThinLTO
backends (instead of thread::hardware_concurrency which includes
hyperthreading and is too aggressive). I'll send this as a follow-on
patch, and it will fall back to hardware_concurrency when the new
getHostNumPhysicalCores returns -1 (when not supported for a given
host system).
I also added an interface to MemoryBuffer to force reading a file
as a stream - this is required for /proc/cpuinfo which is a special
file that looks like a normal file but appears to have 0 size.
The existing readers of this file in Host.cpp are reading the first
1024 or so bytes from it, because the necessary info is near the top.
But for the new functionality we need to be able to read the entire
file. I can go back and change the other readers to use the new
getFileAsStream as a follow-on patch since it seems much more robust.
Added a unittest.
Reviewers: mehdi_amini
Subscribers: beanz, mgorny, llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D25564
llvm-svn: 284138
In the MS ABI, the frontend is supposed to MD5 such pathologically long
names. LLVM should still defend itself from long names, though.
Fixes part of PR29098.
llvm-svn: 284136
We don't need to return a MachineInstr* from these stack probe insertion
calls anyway. If we ever need to add it back, we can return an iterator
instead.
Based on a patch by David Kreitzer
This bug is a consequence of
r279314 | dexonsmith | 2016-08-19 13:40:12 -0700 (Fri, 19 Aug 2016) | 110 lines
We hit the "Assertion `!NodePtr->isKnownSentinel()' failed" assertion,
but only when inserting a stack probe call at the end of an MBB, which
isn't necessarily a common situation.
Differential Revision: https://reviews.llvm.org/D25566
llvm-svn: 284130
This patch assigns cost of the scaling used in addressing.
On many ARM cores, a negated register offset takes longer than a
non-negated register offset, in a register-offset addressing mode.
For instance:
LDR R0, [R1, R2 LSL #2]
LDR R0, [R1, -R2 LSL #2]
Above, (1) takes less cycles than (2).
By assigning appropriate scaling factor cost, we enable the LLVM
to make the right trade-offs in the optimization and code-selection phase.
Differential Revision: http://reviews.llvm.org/D24857
Reviewers: jmolloy, rengolin
llvm-svn: 284127
This patch modifies the cost calculation of predicated instructions (div and
rem) to avoid the accumulation of rounding errors due to multiple truncating
integer divisions. The calculation for predicated stores will be addressed in a
follow-on patch since we currently don't scale the cost of predicated stores by
block probability.
Differential Revision: https://reviews.llvm.org/D25333
llvm-svn: 284123
Because everything live is spilled at the end of a
block by fast regalloc, assume this will happen and
avoid the copies of the resource descriptor.
llvm-svn: 284119
These instructions were only defined for microMIPSR6 previously. Add
definitions for MIPSR6, correct definitions for microMIPSR6, flag these
instructions as having unmodelled side effects (they disable/enable
virtual processors) and add missing disassember tests for microMIPSR6.
Reviewers: vkalintiris
Differential Review: https://reviews.llvm.org/D24291
llvm-svn: 284115
The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.
This commit presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
Differential Revision: http://reviews.llvm.org/D25022
llvm-svn: 284108
We don't need to check if AVX is enabled. It's implied by the operation action being set to Custom.
We don't need to check both the input and output type widths. We only need to check the type that's being inserted or extracted. The other type is known to be a legal type and we can assume its a different width.
llvm-svn: 284102
This is with an extra change to avoid calling MemoryLocation::get() on a call instruction.
Differential Revision: https://reviews.llvm.org/D25542
llvm-svn: 284098
This allows RegBankSelect in greedy mode to get rid some of the cross
register bank copies when loads are involved in the chain of
computation.
llvm-svn: 284097
- Use storage class C_STAT for 'PrivateLinkage' The storage class for
PrivateLinkage should equal to the Internal Linkage.
- Set 'PrivateGlobalPrefix' from "L" to ".L" for MM_WinCOFF (includes
x86_64) MM_WinCOFF has empty GlobalPrefix '\0' so PrivateGlobalPrefix
"L" may conflict to the normal symbol name starting with 'L'.
Based on a patch by Han Sangjin! Manually updated test cases.
llvm-svn: 284096
This CL didn't actually address the test case in PR30499, and clang
still crashes.
Also revert dependent change "Memory-SSA cleanup of clobbers interface, NFC"
Reverts r283965 and r283967.
llvm-svn: 284093