This is a recommit of r256028 with minor fixes in the unit tests:
CodeGen/Mips/eh.ll
CodeGen/Mips/insn-zero-size-bb.ll
Original commit message:
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
llvm-svn: 256202
This patch transforms truncation between vectors of integers into
X86ISD::PACKUS/PACKSS operations during DAG combine. We don't do it in
the lowering phase because, after type legalization, the original
truncation will be turned into a BUILD_VECTOR with each element
extracted from a vector and then truncated, and from that form it is
difficult to do this optimization. This greatly improves the performance
of truncations on some specific types.
Cost table is updated accordingly.
Differential revision: http://reviews.llvm.org/D14588
llvm-svn: 256194
When targeting COFF, a comdat section is required to have a global
object with the same name as the comdat (except for comdats whose
selection kind is associative). This fix makes sure that the comdat
is keyed on the data variable for COFF.
Also improved test coverage for this.
llvm-svn: 256193
LiveDebugVariables unconditionally propagates all DBG_VALUEs down the
dominator tree, which happens to work fine if there already is another
DBG_VALUE or the DBG_VALUE happens to describe a single-assignment vreg,
but is otherwise wrong if the DBG_VALUE is coming from only one of the
predecessors.
In r255759 we introduced a proper data flow analysis scheduled after
LiveDebugVariables that correctly propagates DBG_VALUEs across basic block
boundaries. With the new pass in place, the incorrect propagation in
LiveDebugVariables can be retired without losing any of the benefits
where LiveDebugVariables happened to do the right thing.
llvm-svn: 256188
Summary:
These registers have different encodings on CI and VI, so we add pseudo
FLAT_SCRATCH registers to be used before MC, and subtarget-specific
registers to be used by the MC layer.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15661
llvm-svn: 256178
This patch adds to the target description two additional patterns for matching
extract-extend operations to SMOV. The patterns catch the v16i8-to-i64 and
v8i16-to-i64 cases. The existing patterns miss these cases because the
extracted elements must first be legalized to i32, resulting in any_extend
nodes.
This was originally implemented as a DAG combine (r255895), but was reverted
due to failing out-of-tree tests.
llvm-svn: 256176
This allows the AsmMatcherEmitter to properly tokenize the AsmStrings for
load and store instructions. This is a step towards asm parsing.
llvm-svn: 256166
This fixes a bug introduced by the ThinLTO metadata linking patch
r255909. The assert is overly strict and, while useful in development of
the patch, doesn't seem interesting to keep.
Fixes PR25907.
llvm-svn: 256161
Disable post-ra scheduler for perturbed tests to appease the bots and to
preserve the history of the tests.
http://reviews.llvm.org/D15652
llvm-svn: 256158
Support for COFF timestamps was unintentionally broken in r246905 when
it was conditionally available depending on whether or not LLVM was
configured with LLVM_ENABLE_TIMESTAMPS. However, Config/config.h was
never included which essentially broke the feature. Due to lax testing,
the breakage was never identified until we observed strange failures
during incremental links of Chromium.
This issue is resolved by simply including Config/config.h in
WinCOFFObjectWriter and teaching lit that the MC/COFF/timestamp.s test
is conditionally supported depending on LLVM_ENABLE_TIMESTAMPS. With
this in place, we can strengthen the test to ensure that it will not
accidentally get broken in the future.
This fixes PR25891.
llvm-svn: 256137
This resolves Clang self-hosting with std::once() for Cygwin.
FIXME: It may be EmulatedTLS-generic also for X86-Android.
FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls.
llvm-svn: 256134
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.
Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.
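For illustration (not part of the original commit), a minimal IR sketch of the equivalence being caught:
define i1 @cmp_hi(i64 %a) {
  %hi = lshr i64 %a, 32       ; the shift folds the immediate
  %c = icmp ne i64 %hi, 0     ; same result as: icmp ugt i64 %a, 4294967295
  ret i1 %c
}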
llvm-svn: 256126
Summary:
This adds the core AVR TableGen file, along with the register descriptions.
Lines in AVR.td which require other TableGen files which haven't been committed
yet are commented out.
This is a fairly trivial patch, and should only require a quick review.
I kept the line width smaller than 80 columns, but there are a few exceptions
because I'm not sure how to split a string over several lines.
Reviewers: stoklund
Subscribers: dylanmckay, agnat
Differential Revision: http://reviews.llvm.org/D14684
llvm-svn: 256120
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2; otherwise the same llvm.arm.ssat() will generate a different saturating amount for ARM and Thumb.
r250697: http://reviews.llvm.org/rL250697
Reviewers: rmaprath
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15653
llvm-svn: 256115
With the support of value profiling added, the Indexed profile
reader becomes less efficient. The reader initialization
used to involve just reading the file header, but with VP support
added, initialization needs to walk through all profile keys
of the on-disk hash table, resulting in very poor locality and a large
memory increase (keys are stored together with the profile data
in the mapped profile buffer). Even worse, when the reader is
used by the compiler (not just llvm-profdata), the penalty becomes
very high, as compilation of each single module requires touching the
profile data buffer for the whole program.
In this patch, the icall target values (MD5 hashes) are no longer eagerly
converted back to name strings when the data is read into memory. A new
interface is added to the profile reader so that an InstrProfSymtab can be
lazily created on demand for the Indexed profile reader. Creation of the
symtab is intended to be used by the llvm-profdata tool for symbolic dumping
of VP data. It can be used with the compiler (for legacy out-of-tree uses)
too, but that is not recommended due to the compile time and memory reasons
mentioned above.
Some other cleanups are also included: the function address to MD5 map is
now consolidated into InstrProfSymtab, and InstrProfStringtab is no longer
used and has been eliminated.
llvm-svn: 256114
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses). This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard against this eventuality in
`llvm::InlineFunction`.
llvm-svn: 256110
An error that is pretty easy to make is to use the lazy bitcode reader
and then do something like
if (V.use_empty())
The problem is that uses in unmaterialized functions are not accounted
for.
This patch adds asserts that all uses are known.
llvm-svn: 256105
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences. Does not
use binary literals.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256095
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Includes a fix to scrub value subclass data in dropAllReferences.
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256093
Make personality functions, prefix data, and prologue data hungoff
operands of Function.
This is based on the email thread "[RFC] Clean up the way we store
optional Function data" on llvm-dev.
Thanks to sanjoyd, majnemer, rnk, loladiro, and dexonsmith for feedback!
Differential Revision: http://reviews.llvm.org/D13829
llvm-svn: 256090
Summary:
The analysis of shader inputs was completely wrong. We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs were in SGPRs was wrong too.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15608
llvm-svn: 256082
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine; we only had the assert to help isolate problems, under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manuel Jacob is working on a language with constant references, and 2) we found a case where the optimizer does create them in practice.
llvm-svn: 256079
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test cases in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Reviewers: aprantl
Subscribers: dsanders
Differential Revision: http://reviews.llvm.org/D14186
llvm-svn: 256077
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
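For illustration (a hedged sketch with a hypothetical @barrier), the kind of loop this unblocks:
declare void @barrier() noduplicate

define void @f(i32 %n) {
entry:
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  call void @barrier()             ; noduplicate forbids cloning this call,
                                   ; so the unroller refuses to copy the body
  %i.next = add nuw nsw i32 %i, 1
  %cond = icmp slt i32 %i.next, %n
  br i1 %cond, label %loop, label %exit
exit:
  ret void
}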
llvm-svn: 256075
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and picks an arbitrary order when there is no
overlap. (The latter forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15622
llvm-svn: 256072
This deprecates:
* LLVMParseBitcode
* LLVMParseBitcodeInContext
* LLVMGetBitcodeModuleInContext
* LLVMGetBitcodeModule
They are replaced with the functions with a 2 suffix which do not record
a diagnostic.
llvm-svn: 256065
This code changes the way Symbolize handles parsed binaries: a parsed
OwningBinary<Binary> is no longer broken into a (binary, memory buffer)
pair, and is just stored as-is in a cache. ObjectFile components
of Mach-O universal binaries are also stored explicitly in a
separate cache.
Additionally, this change:
* simplifies the code that parses/caches binaries: it's now done
in a single place, not three different functions.
* makes flush() method behave as expected, and actually clear
the cached parsed binaries and objects.
* fixes a dangling pointer issue described in
http://reviews.llvm.org/D15638
llvm-svn: 256041
This patch removes all getEdgeWeight() interfaces from the CodeGen directory.
As getEdgeProbability() is a little more expensive than getEdgeWeight(), I
will soon compose a patch in which BPI only stores probabilities instead of
edge weights, so that getEdgeProbability() runs in O(1) time.
Differential revision: http://reviews.llvm.org/D15489
llvm-svn: 256039
Summary:
If Candidate may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.
Added a test in nary-gep.ll
Reviewers: tra, meheff
Subscribers: mcrosier, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D15618
llvm-svn: 256035
When identifying blocks post-dominated by an unreachable-terminated block
in BranchProbabilityInfo, consider only the edge to the normal destination
block if the terminator is InvokeInst and let calcInvokeHeuristics() decide
edge weights for the InvokeInst.
llvm-svn: 256028
This inlines materializeAll into the only caller
(materializeAllPermanently) and renames materializeAllPermanently to
just materializeAll.
llvm-svn: 256024
Renamed variables to be more reflective of whether they are
an instance of Linker, IRLinker or ModuleLinker. Also fix a stale
comment.
llvm-svn: 256011
LLVM MC has single methods which can handle the output of EH frame and DWARF CIEs and FDEs.
This code improves DWARFDebugFrame::parse to do the same for parsing.
This also allows llvm-objdump to support the --dwarf=frames option which objdump supports. This
option dumps the .eh_frame section using the new code in DWARFDebugFrame::parse.
http://reviews.llvm.org/D15535
Reviewed by Rafael Espindola.
llvm-svn: 256008
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example:
STRWui %W1, %X0, 1
%W0 = LDRHHui %X0, 3
becomes
STRWui %W1, %X0, 1
%W0 = UBFMWri %W1, 16, 31
llvm-svn: 256004
Summary:
Third patch split out from http://reviews.llvm.org/D14752.
Only map in needed DISubprogram metadata (for imported or otherwise linked-in
functions, and other DISubprograms referenced by inlined instructions).
This is supported for ThinLTO, LTO and llvm-link --only-needed, with
associated tests for each one.
Depends on D14838.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14843
llvm-svn: 256003
We always create archives with just the filename as the member name, but
other archives can put a more complicated path in there.
This patch handles it by computing just the filename, as we do when
adding a new member.
If storing the path is important for some reason, we should probably
have an orthogonal option for doing that and do it for both old and new
members.
Fixes PR25877.
llvm-svn: 256001
Summary:
1. Modify AnalyzeCallGraph() to retain function info for external functions
if the function has [InaccessibleMemOr]ArgMemOnly flags.
2. When analyzing the use of a global as a function parameter at a call site,
also mark the callee as modifying the global appropriately.
3. Add additional test cases.
Depends on D15499
Reviewers: hfinkel, jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15605
llvm-svn: 255994
Inspired by the bug reported in PR25846. Whatever we end up doing about that one, the value handle change is a generally good one since it will help catch this type of mistake more quickly.
Patch by: Manuel Jacob
llvm-svn: 255984
Type specific declarations have been moved to Type.h and error handling
routines have been moved to ErrorHandling.h. Both are included in Core.h
so nothing should change for projects directly including the headers,
but transitive dependencies may be affected.
llvm-svn: 255965
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.
This is a follow-up to r255656.
Differential Revision: http://reviews.llvm.org/D15549
llvm-svn: 255936
This extends the same line of reasoning used in EarlyCSE with http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.
Key points:
* We only remove unordered or simple stores.
* The loads producing values consumed by dead stores don't influence whether the store is dead.
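As a hedged illustration of both points (not taken from the commit's tests):
define void @f(i32* %p, i32* %q) {
  %v = load i32, i32* %q       ; this load does not keep the store below alive
  store i32 %v, i32* %p        ; dead: overwritten before any read of %p
  store i32 0, i32* %p
  ret void
}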
Differential Revision: http://reviews.llvm.org/D15354
llvm-svn: 255932
Summary:
I didn't realize that we already allowed atomic load/store of pointers,
it was added in 2012 by r162146. This patch updates the documentation
and tightens the verifier by using DataLayout to make sure that the
stored size is byte-sized and power-of-two. DataLayout is also used for
integers, and while I'm here I updated the corresponding code for
cmpxchg and rmw.
See the following discussion for context and upcoming changes to
add floating-point and vector atomics:
https://groups.google.com/forum/#!topic/llvm-dev/Nh0P_E3CRoo/discussion
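For reference, a small sketch of what the tightened verifier accepts (assuming 64-bit pointers, hence the align 8):
define i8* @roundtrip(i8** %pp, i8* %v) {
  store atomic i8* %v, i8** %pp release, align 8   ; pointer store: byte-sized,
                                                   ; power-of-two per DataLayout
  %old = load atomic i8*, i8** %pp acquire, align 8
  ret i8* %old
}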
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15512
llvm-svn: 255931
The rules for removing trivially dead stores are a lot less complicated than for loads. Since we know the later store post-dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was ever visible.
For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are stronger than our operation orderings, simply using a fence isn't an obvious win. This arguably calls for a refinement in our fence specification, but that's (much) later work.
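A minimal sketch of the atomic case this enables (illustrative only):
define void @f(i32* %p) {
  store atomic i32 0, i32* %p unordered, align 4  ; removable: overwritten below
                                                  ; with no intervening read, and
                                                  ; the 0 need never become visible
  store i32 1, i32* %p
  ret void
}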
Differential Revision: http://reviews.llvm.org/D15352
llvm-svn: 255914
Summary:
Second patch split out from http://reviews.llvm.org/D14752.
Maps metadata as a post-pass from each module once importing completes,
suturing up the final metadata to the temporary metadata left on the
imported instructions.
This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.
Depends on D14825.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14838
llvm-svn: 255909
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15542
llvm-svn: 255906
Clang has better diagnostics in this case. It is therefore not necessary
to change the destructor to avoid what is effectively an invalid warning
in gcc. Instead, better handle the warning flags given to the compiler.
llvm-svn: 255905
These days relocations are created and stored in a deterministic way.
The order they are created is also suitable for the .o file, so we don't
need an explicit sort.
The last remaining exception is MIPS.
llvm-svn: 255902
This patch enables the PostRAScheduler specifically for the AArch64 generic
build, which is beneficial from the performance perspective.
Speedups of 2% to 7% are observed for some benchmarks on A57 and A53.
Benchmarks from the LLVM test-suite did not regress.
Differential Revision: http://reviews.llvm.org/D15557
llvm-svn: 255896
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.
Differential Revision: http://reviews.llvm.org/D15515
llvm-svn: 255895
Add an option to enable/disable the LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573
llvm-svn: 255881
This creates the initial infrastructure for writing ELF output files. It
doesn't yet have any implementation for encoding instructions.
Differential Revision: http://reviews.llvm.org/D15555
llvm-svn: 255869
This is a quick fix to PR25838. The issue comes from the restriction that we
cannot normalize probabilities containing both known and unknown ones. A patch
that removes this restriction is under review now:
http://reviews.llvm.org/D15548
llvm-svn: 255867
Introduce a new class InstrProfSymtab to abstract
the PGO symbol table for the profile and coverage readers.
The symtab is used to look up a function's PGO name
using the function's key. The first user of the class
is the CoverageMapping reader. More will follow.
llvm-svn: 255862
Summary:
Implement eliminateCallFramePseudo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-zero
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but fails to update the live ranges
of EXPR_STACK, causing a MachineInstr verifier failure (so this test
is commented out).
Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.
Differential Revision: http://reviews.llvm.org/D15587
llvm-svn: 255844
It looks like the code this patch deletes is based on a misunderstanding of
what guarantees writev provides. In particular, writev with 1 iovec is
not "more atomic" than a write.
Testing on OS X shows that both write and writev from multiple processes
can be intermixed.
llvm-svn: 255837
This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.
llvm-svn: 255828
Summary: Surface counter overflow when merging profile data. Merging still occurs on overflow but counts saturate to the maximum representable value. Overflow is reported to the user.
Reviewers: davidxl, dnovillo, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15547
llvm-svn: 255825
The access function has a short entry and a short exit; the initialization
block is only run the first time. To improve performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator. The register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given machine function and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
The target independent portion was committed as r255353.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15341
llvm-svn: 255821
Update supportSplitCSR's interface to take the machine function instead of the
calling convention.
Review comments for http://reviews.llvm.org/D15341
llvm-svn: 255818
As of r255720, the loop pass manager will DTRT when passes update the
loop info for removed loops, so they no longer need to reach into
LPPassManager APIs to do this kind of transformation. This change very
nearly removes the need for the LPPassManager to even be passed into
loop passes - the only remaining pass that uses the LPM argument is
LoopUnswitch.
llvm-svn: 255797
Summary:
This patch adds a function called getRegPressureSetScore() to
TargetRegisterInfo. The MachineScheduler uses this when comparing
instruction that increase the register pressure of different sets
to determine which set is safer to increase.
This hook is useful for GPU targets where the number of registers in the
class is not the best metric for determing which presser set is safer to
increase.
Future work may include adding more parameters to this function, like
for example, the current pressure level of the set or the amount that
the pressure will be increased/decreased.
Reviewers: qcolombet, escha, arsenm, atrick, MatzeB
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14806
llvm-svn: 255795
When considering incoming values as part of a reduction phi, ensure the
incoming value is dominated by said phi.
Failing to ensure this property causes miscompiles.
Fixes PR25787.
Many thanks to Mattias Eriksson for reporting, reducing and analyzing the
problem for me.
Differential Revision: http://reviews.llvm.org/D15580
llvm-svn: 255792
The SystemZ linkers provide an optimization to transform a general-
or local-dynamic TLS sequence into an initial-exec sequence if possible.
To do that, the compiler generates a function call to __tls_get_offset,
which is a brasl instruction annotated with *two* relocations:
- a R_390_PLT32DBL to install __tls_get_offset as branch target
- a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that
the TLS optimization should be performed if possible
If the optimization is performed, the brasl is replaced by an ld load
instruction.
However, *both* relocs are processed independently by the linker.
Therefore it is crucial that the R_390_PLT32DBL is processed *first*
(installing the branch target for the brasl) and the R_390_TLS_GDCALL
is processed *second* (replacing the whole brasl with an ld).
If the relocs are swapped, the linker will first replace the brasl
with an ld, and *then* install the __tls_get_offset branch target
offset. Since ld has a different layout than brasl, this may even
result in a completely different (or invalid) instruction; in any
case, the resulting code is corrupted.
Unfortunately, the way the MC common code sorts relocations causes
these two to *always* end up the wrong way around, resulting in
wrong code generation by the linker and crashes.
This patch overrides the sortRelocs routine to detect this particular
pair of relocs and enforce the required order.
llvm-svn: 255787
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used. This is
why the code in adjustSubwordCmp had this assertion:
assert(C.ICmpType == SystemZICMP::Any &&
"Signedness shouldn't matter here.");
assuming that the caller had already detected that fact. However, it
turns out that there are cases, in particular with always-true or
always-false conditions that have not been eliminated when compiling at
-O0, where this is not true.
Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.
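For illustration, a hypothetical IR shape where the signedness of the outer compare is irrelevant:
define i1 @f(i8 %b) {
  %ext = zext i8 %b to i32
  ; %ext is always in [0, 255] and 100 is in range of the inner type,
  ; so icmp slt and icmp ult give the same answer here:
  %cmp = icmp slt i32 %ext, 100
  ret i1 %cmp
}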
llvm-svn: 255786
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.
llvm-svn: 255785
Add a function VLIWPacketizerList::shouldAddToPacket, which will allow
specific implementations to decide if it is profitable to add given
instruction to the current packet.
llvm-svn: 255780
Summary:
This patch introduces two new function attributes
InaccessibleMemOnly: This attribute indicates that the function may only access memory that is not accessible by the program/IR being compiled. This is a weaker form of ReadNone.
inaccessibleMemOrArgMemOnly: This attribute indicates that the function may only access memory that is either not accessible by the program/IR being compiled, or is pointed to by its pointer arguments. This is a weaker form of ArgMemOnly.
Test cases have been updated. This revision uses this (d001932f3a) as reference.
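As a sketch, hypothetical declarations using the new attributes (IR spellings assumed from this patch):
; may only access memory invisible to the IR being compiled:
declare void @update_hw_counter() inaccessiblememonly
; may additionally access memory reachable from its pointer arguments:
declare void @log_buffer(i8* nocapture, i64) inaccessiblemem_or_argmemonly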
Reviewers: jmolloy, hfinkel
Subscribers: reames, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D15499
llvm-svn: 255778
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
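For illustration, a minimal triangle (hypothetical IR) where the predicated store needs no PHI:
define void @f(i1 %c, i32 %a, i32 %b, i32* %p) {
entry:
  %v = add i32 %a, %b        ; defined outside the predicated block, so it
                             ; dominates the join and needs no PHI
  br i1 %c, label %then, label %end
then:
  store i32 %v, i32* %p
  br label %end
end:
  ret void
}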
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
Now with a fix (and fixed tests) for the conformance issue seen in Chromium.
llvm-svn: 255767
ARMv8.2-A adds 16-bit floating point versions of all existing SIMD
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
Note that VFP without SIMD is not a valid combination for any version of
ARMv8-A, but I have ensured that these instructions all depend on both
FeatureNEON and FeatureFullFP16 for consistency.
Differential Revision: http://reviews.llvm.org/D15039
llvm-svn: 255764
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 255762
This folds (ashr (shl a, [56,48,32,24,16]), SarConst)
into (shl, (sext (a), [56,48,32,24,16] - SarConst))
or into (lshr, (sext (a), SarConst - [56,48,32,24,16]))
depending on sign of (SarConst - [56,48,32,24,16])
On x86, the sexts lower to MOVs (MOVSX).
The MOVs have the same code size as the above SHIFTs (only a SHIFT by 1 has smaller code size).
However, the MOVs have two advantages over SHIFTs on x86:
1. MOVs can write to a register that differs from source.
2. MOVs accept memory operands.
This fixes PR24373.
Patch by: evgeny.v.stupachenko@intel.com
Differential Revision: http://reviews.llvm.org/D13161
llvm-svn: 255761
Summary: On Windows, the allocation granularity can be significantly
larger than a page (64K), so with many small objects, just clearing
the FreeMem list rapidly leaks quite a bit of virtual memory space
(if not rss). Fix that by only removing those parts of the FreeMem
blocks that overlap pages for which we are applying memory permissions,
rather than dropping the FreeMem blocks entirely.
Reviewers: lhames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15202
llvm-svn: 255760
Summary: This patch adds a check in visitLandingPad to see if the landingpad's result type is a token type. If so, do not create DAG nodes for its exception pointer and selector value. This patch enables the back end to handle landingpads of token type.
Reviewers: JosephTremoulet, majnemer, rnk
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15405
llvm-svn: 255749
Newer versions of libstdc++ (4.9+), as well as libc++, depend directly on
libpthread from the standard library headers, so libfuzzer needs to declare
a standard library dependency.
llvm-svn: 255745
Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.
I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)
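A minimal sketch of the new pattern (illustrative, not from the commit's tests):
define void @f(i32* %p) {
  %v = load i32, i32* %p
  store i32 %v, i32* %p    ; removable: writes back the value just read,
                           ; with nothing clobbering %p in between
  ret void
}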
Differential Revision: http://reviews.llvm.org/D15397
llvm-svn: 255739
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.
Previously, the only way to specify an atomic float operation was to bitcast the pointer to an i32 pointer, load the value as an i32, then bitcast the result to a float. At its most basic, this patch simply moves this expansion step to the point where we start lowering to the backend.
This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.
Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would make more sense to add those once at least one backend (x86) was ready to actually opt out.
As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.
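For reference, the direct form this enables versus the old bitcast idiom (a sketch):
define float @direct(float* %p) {
  %f = load atomic float, float* %p seq_cst, align 4
  ret float %f
}
define float @old_idiom(float* %p) {
  %ip = bitcast float* %p to i32*
  %i = load atomic i32, i32* %ip seq_cst, align 4
  %f = bitcast i32 %i to float
  ret float %f
}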
Differential Revision: http://reviews.llvm.org/D15471
llvm-svn: 255737
When a pass removes a loop it currently has to reach up into the
LPPassManager's internals to update the state of the iteration over
loops. This reverse dependency results in a pretty awkward interplay
of the LPPassManager and its Passes.
Here, we change this to instead keep track of when a loop has become
"unlooped" in the Loop objects themselves, then the LPPassManager can
check this and manipulate its own state directly. This opens the door
to allow most of the loop passes to work without a backreference to
the LPPassManager.
I've kept the passes calling the LPPassManager::deleteLoopFromQueue API
for now so I could put in an assert to prove that this is NFC, but a later
patch will update the passes to preserve the LoopInfo directly and
stop referencing the LPPassManager completely.
llvm-svn: 255720
It adjusts from RSP-after-prologue to RBP, which is what SEH filters
need to do before they can use llvm.localrecover.
Fixes SEH filter captures, which were broken in r250088.
Issue reported by Alex Crichton.
llvm-svn: 255707
This patch improves on the suggested codegen from PR24475:
https://llvm.org/bugs/show_bug.cgi?id=24475
but only for the fmaxf() case to start, so we can sort out any bugs before
extending to fmin, f64, and vectors.
The fmax / maxnum definitions provide us flexibility for signed zeros, so the
only thing we have to worry about in this replacement sequence is NaN handling.
Note 1: It may be better to implement this as lowerFMAXNUM(), but that exposes
a problem: SelectionDAGBuilder::visitSelect() transforms compare/select
instructions into FMAXNUM nodes if we declare FMAXNUM legal or custom. Perhaps
that should be checking for NaN inputs or global unsafe-math before transforming?
As it stands, that bypasses a big set of optimizations that the x86 backend
already has in PerformSELECTCombine().
Note 2: The v2f32 test reveals another bug; the vector is extended to v4f32, so
we have completely unnecessary operations happening on undef elements of the
vector.
Differential Revision: http://reviews.llvm.org/D15294
llvm-svn: 255700
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.
llvm-svn: 255693
Summary: I'm not sure how things worked before without this.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15492
llvm-svn: 255692
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The previously failing test file reg-usage.ll is now moved to the
test/Transform/LoopVectorize/X86 directory, with a target datalayout and target
triple indicated.)
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instructions, etc. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255691
Add instruction patterns for matching load and store instructions with constant
offsets in addresses. The code is fairly redundant due to the need to replicate
everything between imm, tglobaladdr, and texternalsym, but this appears to be
common tablegen practice. The main alternative appears to be to introduce
matching functions with C++ code, but sticking with purely generated matchers
seems better for now.
Also note that this doesn't yet support offsets from getelementptr, which will
be the most common case; that will depend on a change in target-independent code
in order to set the NoUnsignedWrap flag, which I'll submit separately. Until
then, the testcase uses ptrtoint+add+inttoptr with a nuw on the add.
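The testcase idiom looks roughly like this (assuming wasm32's i32 pointers):
define i32 @load_at_offset(i32* %p) {
  %i = ptrtoint i32* %p to i32
  %j = add nuw i32 %i, 16        ; nuw allows folding the offset into the address
  %q = inttoptr i32 %j to i32*
  %v = load i32, i32* %q
  ret i32 %v
}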
Also implement isLegalAddressingMode with an approximation of this.
Differential Revision: http://reviews.llvm.org/D15538
llvm-svn: 255681
SimplifyCFG allows tail merging with code which terminates in
unreachable which, in turn, makes it possible for an invoke to end up in
a funclet which it was not originally part of.
Using operand bundles on invokes allows us to determine whether or not
an invoke was part of a funclet in the source program.
Furthermore, it allows us to unambiguously answer questions about the
legality of inlining into call sites which the personality may have
trouble with.
Differential Revision: http://reviews.llvm.org/D15517
llvm-svn: 255674
Summary:
We were previously selecting all constant loads to SMRD instructions and
legalizing the SMRDs with non-uniform addresses during the SIFixSGPRCopies
pass. This new solution is simpler and also generates much better code,
because the instruction selector is able to take advantage of all the MUBUF
addressing modes that the legalization pass wasn't able to.
We also no longer need to generate v_add_* instructions when we
have a uniform pointer and a non-uniform offset, as this is now folded into the
MUBUF instruction during instruction selection.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15425
llvm-svn: 255672
A large number of loop utility functions take a `Pass *` and reach
into it to find out which analyses to preserve. There are a number of
problems with this:
- The APIs have access to pretty well any Pass state they want, so
it's hard to tell what they may or may not do.
- Other APIs have copied these and pass around a `Pass *` even though
they don't even use it. Some of these just hand a nullptr to the API
since the callers don't even have a pass available.
- Passes in the new pass manager don't work like the current ones, so
the APIs can't be used as is there.
Instead, we should explicitly thread the analysis results that we
actually care about through these APIs. This is both simpler and more
reusable.
llvm-svn: 255669
On SparcV8, doubles get passed in two 32-bit integer registers. The call
code was already handling endianness correctly, but the incoming
argument code was not -- it got the two halves in opposite order.
Also remove some dead code in LowerFormalArguments_32 to handle
less-than-32bit values, which can't actually happen.
Finally, add some test cases for the 32-bit calling convention, cribbed
from the 64abi.ll test, and run for both big and little-endian.
llvm-svn: 255668
We only want to emit CFI adjustments when actually using DWARF.
This fixes PR25828.
Differential Revision: http://reviews.llvm.org/D15522
llvm-svn: 255664
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.
The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826
but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818
Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)
As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally.
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.
A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.
Differential Revision: http://reviews.llvm.org/D15213
llvm-svn: 255660
Summary:
This change adds support for specifying a weight when merging profile data with the llvm-profdata tool.
Weights are specified by using the --weighted-input=<weight>,<filename> option. Input files not specified
with this option (normal positional list after options) are given a default weight of 1.
Adding support for arbitrary weighting of input profile data allows for relative importance to be placed on the
input data from multiple training runs.
Both sampled and instrumented profiles are supported.
Reviewers: davidxl, dnovillo, bogner, silvas
Subscribers: silvas, davidxl, llvm-commits
Differential Revision: http://reviews.llvm.org/D15306
llvm-svn: 255659
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.
In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.
See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
Reviewers: arsenm, tstellarAMD
Differential Revision: http://reviews.llvm.org/D14990
llvm-svn: 255658
The radeonsi fp64 support can hit these now that some redundant bitcasts
are folded.
Patch by: Michel Dänzer
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 255657
"movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes.
This commit makes LLVM use the latter when optimizing for size.
Differential Revision: http://reviews.llvm.org/D14971
llvm-svn: 255656
Summary:
These are meant to be used instead of the llvm.SI.tid intrinsic which will
be deprecated at some point.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15475
llvm-svn: 255652
Summary:
These are meant to be used instead of the llvm.SI.fs.interp intrinsic which
will be deprecated at some point.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15474
llvm-svn: 255651
The feature flag is essential for the RDPKRU and WRPKRU instructions.
More about these instructions can be found in the SDM rev 56, vol 2, from http://www.intel.com/sdm
Differential Revision: http://reviews.llvm.org/D15491
llvm-svn: 255644
It appears that neither compiler-rt nor the gnu soft-float libraries actually
implement these conversions. Instead of emitting calls to library functions
that don't exist, handle it similarly to the way we handle i8 -> float and
i16 -> float conversions: call the i32 library function, and adjust the type.
Differential Revision: http://reviews.llvm.org/D15151
llvm-svn: 255643
This patch corresponds to review:
http://reviews.llvm.org/D15117
In preparation for supporting IEEE Quad precision floating point,
this patch simply defines a feature to specify the target supports this.
For now, nothing is done with the target feature, we just don't want
warnings from the Clang FE when a user specifies -mfloat128.
Calling convention and other related work will add to this patch in
the near future.
llvm-svn: 255642
This patch improves a temporary fix in r255530 so that we can normalize the
successor list without triggering assertion failures in the tail duplication pass.
llvm-svn: 255638
This patch does two things:
1. mem2reg is now run immediately after globalopt. Now that globalopt
can localize variables more aggressively, it makes sense to lower
them to SSA form earlier rather than later so they can benefit from
the full set of optimization passes.
2. More scalar optimizations are run after the loop optimizations in
LTO mode. The loop optimizations (especially indvars) can clean up
scalar code sufficiently to make it worthwhile running more scalar
passes. I've particularly added SCCP here as it isn't run anywhere
else in the LTO pass pipeline.
Mem2reg is super cheap and shouldn't affect compilation time at all. The
rest of the added passes are in the LTO pipeline only so doesn't affect
the vast majority of compilations, just the link step.
llvm-svn: 255634
Full type legalizer that works with all vector lengths from 2 to 16 (i32, i64, float, double).
This intrinsic, for example
void @llvm.masked.scatter.v2f32(<2 x float> %data, <2 x float*> %ptrs, i32 align, <2 x i1> %mask)
requires type widening for data and type promotion for mask.
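Conceptually (a rough sketch rather than exact legalizer output), the v2f32 operation is widened to the next legal width with the mask promoted to match:
call void @llvm.masked.scatter.v4f32(<4 x float> %data.wide, <4 x float*> %ptrs.wide, i32 4, <4 x i1> %mask.wide)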
Differential Revision: http://reviews.llvm.org/D13633
llvm-svn: 255629
I believe in one place we were always casting to ExtractValueConstantExpr when we were trying to choose between ExtractValueConstantExpr and InsertValueConstantExpr because of this. But since they have identical layouts this didn't cause any observable problems.
llvm-svn: 255624
The post-dominance property is not sufficient to guarantee that a restore point
inside a loop is safe.
E.g.,
while(1) {
Save
Restore
if (...)
break;
use/def CSRs
}
All the uses/defs of CSRs are dominated by Save and post-dominated
by Restore. However, the CSR uses are still reachable after
Restore executes and before Save executes on the next loop iteration.
This fixes PR25824
llvm-svn: 255613
For non-padded structs, we can just proceed and deaggregate them.
We don't want to do this when there is padding in the struct, so as not to
lose information about this padding (the subsequent passes would then
try hard to preserve the padding, which is undesirable).
Also update extractvalue.ll and cast.ll so that they use structs with padding.
Remove the FIXME in the extractvalue-of-load case, as the non-padded case is
handled when processing the load, and we don't want to do it in the padded
case.
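For example (layout assumed for a typical 64-bit target):
%nopad  = type { i32, i32 }   ; no padding: safe to deaggregate
%padded = type { i8, i32 }    ; 3 padding bytes after the i8: left intact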
Patch by: Amaury SECHET <deadalnix@gmail.com>
Differential Revision: http://reviews.llvm.org/D14483
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
This is a very simple implementation of a thread pool using C++11
threads. It accepts any std::function<void()> for asynchronous
execution. Individual tasks can be synchronized using the returned
future, or the client can block on completion of the full queue.
In case LLVM is configured with Threading disabled, it falls back
to sequential execution using std::async with launch:deferred.
This is intended to support parallelism for ThinLTO processing in
linker plugin, but is generic enough for any other uses.
This is a recommit of r255444, trying to work around a bug in the
MSVC 2013 standard library. I think I was hit by:
http://connect.microsoft.com/VisualStudio/feedbackdetail/view/791185/std-packaged-task-t-where-t-is-void-or-a-reference-class-are-not-movable
Recommit of r255589, trying to please g++ as well.
Differential Revision: http://reviews.llvm.org/D15464
From: mehdi_amini <mehdi_amini@91177308-0d34-0410-b5e6-96231b3b80d8>
llvm-svn: 255593
This is a very simple implementation of a thread pool using C++11
threads. It accepts any std::function<void()> for asynchronous
execution. Individual tasks can be synchronized using the returned
future, or the client can block on completion of the full queue.
In case LLVM is configured with Threading disabled, it falls back
to sequential execution using std::async with launch:deferred.
This is intended to support parallelism for ThinLTO processing in
linker plugin, but is generic enough for any other uses.
This is a recommit of r255444, trying to work around a bug in the
MSVC 2013 standard library. I think I was hit by:
http://connect.microsoft.com/VisualStudio/feedbackdetail/view/791185/std-packaged-task-t-where-t-is-void-or-a-reference-class-are-not-movable
Differential Revision: http://reviews.llvm.org/D15464
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255589
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.
rdar://problem/23785506
llvm-svn: 255583
This moves the actual work to do loop rotation into standalone
functions with the analysis results they need passed in as arguments,
leaving the class itself as a relatively simple shim. This will make
the functions easy to reuse when we're ready to port this
transformation to the new pass manager.
llvm-svn: 255574
This just moves some callers after their callees. My next patch will
convert some of these methods to stand alone functions, and that diff
is more obviously NFC if I move these first. That change, in turn,
will make it much easier to port this pass to the new pass manager
once the loop pass manager is in place.
llvm-svn: 255573
This patch converts code that has access to an LLVMContext to not take a
diagnostic handler.
This has a few advantages
* It is easier to use a consistent diagnostic handler in a single program.
* Less clutter since we are not passing a handler around.
It does make it a bit awkward to implement some C APIs that return a
diagnostic string. I will propose new versions of these APIs and
deprecate the current ones.
llvm-svn: 255571
Prior to this patch, we would wrongly stick to the variant with imm8 encoding
even when the relocation could not fit that size.
rdar://problem/23785506
llvm-svn: 255570
Add return type information to call and call_indirect instructions. This
allows them to be disambiguated without knowledge of the callee.
Differential Revision: http://reviews.llvm.org/D15484
llvm-svn: 255565
Implement a new BLOCK scope placement algorithm which better handles
early-return blocks and early exits from nested scopes.
Differential Revision: http://reviews.llvm.org/D15368
llvm-svn: 255564
Part 1 was submitted in http://reviews.llvm.org/D15134.
Changes in this part:
* X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
* X86CallingConv.td: Pass f128 values in XMM registers or on stack.
* X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td:
Add instruction selection patterns for f128.
* X86ISelLowering.cpp:
When the target has MMX registers, configure MVT::f128 in FR128RegClass,
with TypeSoftenFloat action, and custom actions for some opcodes.
Add missed cases of MVT::f128 in places that handle f32, f64, or vector types.
Add TODO comment to support f128 type in inline assembly code.
* SelectionDAGBuilder.cpp:
Fix infinite loop when f128 type can have
VT == TLI.getTypeToTransformTo(Ctx, VT).
* Add unit tests for x86-64 fp128 type.
Differential Revision: http://reviews.llvm.org/D11438
llvm-svn: 255558
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add
support to clang, and extend FMF to the DAG for calls.
Motivating example:
%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)
We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:
%z = tail call fast float @sqrtf(float %y)
because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.
The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368
and added FMF to fcmp:
http://reviews.llvm.org/rL241901
Differential Revision: http://reviews.llvm.org/D14707
llvm-svn: 255555
The WebAssemblyStoreResults pass runs before LiveVariables, so it doesn't
expect to have to keep dead flags up to date; check this with an assert.
llvm-svn: 255551
This will make the dependence graph more accurate if an alias analysis
is provided. If nullptr is specified in its place, the behavior will
remain as it is currently.
llvm-svn: 255540
Make sure to check that the destination type is sized.
A check was present but was incorrectly checking the source type
instead.
Patch by Amaury SECHET!
Differential Revision: http://reviews.llvm.org/D15264
llvm-svn: 255536
The normalization may cause assertion failures on SystemZ and some out-of-tree
tests. The root cause is that unknown probabilities are materialized into known
ones by calling getSuccProbability(), which is then used to add another
successor to the same MBB which results in mixed known and unknown
probabilities. But currently those mixed probabilities cannot be normalized.
I will compose another patch to fix the root issue.
llvm-svn: 255530
This patch adds the missing functionality in parsable
text format support for value profiling.
Differential Revision: http://reviews.llvm.org/D15212
llvm-svn: 255523
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters, but MSVC doesn't support them, which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
When FastISel fails to translate an instruction it hands off code
generation to SelectionDAG. Before it does so, it may have generated
local value instructions to feed phi nodes in successor blocks. These
instructions will then be generated again by SelectionDAG, causing
duplication and less efficient code, including extra spill
instructions.
Patch by Wolfgang Pieb!
Differential Revision: http://reviews.llvm.org/D11768
llvm-svn: 255520
This is the second in a set of patches for soft-float support for ppc32;
it enables soft-float operations.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D13700
llvm-svn: 255516
This patch adds support for variadic arguments for AArch64. Not all of the
MSan unit tests pass yet, nor does the signal_stress_test (currently
set as XFAIL for aarch64).
llvm-svn: 255495
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
llvm-svn: 255489
The .even directive aligns content to an even-numbered address.
In AT&T syntax: .even
In Microsoft syntax: even (without the dot).
Differential Revision: http://reviews.llvm.org/D15413
llvm-svn: 255462
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instructions, etc. This makes the loop vectorizer too conservative
when choosing a VF. In this patch, the induction variables that won't be
vectorized, plus GetElementPtr instructions, are added to the ValuesToIgnore
set so that their register usage is no longer considered.
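For example (a hypothetical sketch), in a loop like the one below the induction variable and the GEP are folded into the vector memory operation after vectorization, so they should not count toward register pressure:
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %gep = getelementptr inbounds float, float* %a, i64 %i
    %v = load float, float* %gep, align 4
    %i.next = add nuw nsw i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit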
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255460
This patch adds some missing calls to MBB::normalizeSuccProbs() in several
locations where it should be called. Those places were found by checking
whether the sum of successors' probabilities is approximately one in the
MachineBlockPlacement pass with some instrumented code (not in this patch).
Differential revision: http://reviews.llvm.org/D15259
llvm-svn: 255455
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instructions, etc. This makes the loop vectorizer too conservative
when choosing a VF. In this patch, the induction variables that won't be
vectorized, plus GetElementPtr instructions, are added to the ValuesToIgnore
set so that their register usage is no longer considered.
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255454
EABI attributes should only be emitted on EABI targets. This prevents the
emission of the optimization goals EABI attribute on Windows ARM.
llvm-svn: 255448
This is a very simple implementation of a thread pool using C++11
threads. It accepts any std::function<void()> for asynchronous
execution. Individual tasks can be synchronized using the returned
future, or the client can block on completion of the full queue.
In case LLVM is configured with threading disabled, it falls back
to sequential execution using std::async with launch::deferred.
This is intended to support parallelism for ThinLTO processing in
the linker plugin, but is generic enough for any other uses.
Differential Revision: http://reviews.llvm.org/D15464
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255444
Summary:
Previously SelectionDAGBuilder asserted that the pointer operands of
memcpy / memset / memmove intrinsics are in address space < 256. This assert
implicitly assumed the X86 backend, where all address spaces < 256 are
equivalent to address space 0 from the code generator's point of view. On some
targets (R600 and NVPTX) several address spaces < 256 have a target-defined
meaning, so this assert made little sense for these targets.
This patch removes this wrong assertion and adds extra checks before lowering
these intrinsics to library calls. If a pointer operand can't be cast to
address space 0 without changing semantics, a fatal error is reported to the
user.
The new behavior should be valid for all targets that give address spaces != 0
a target-specified meaning (NVPTX, R600, X86). NVPTX lowers big or
variable-sized memory intrinsics before SelectionDAG construction. All other
memory intrinsics are inlined (the threshold is set very high for this target).
R600 doesn't support memcpy / memset / memmove library calls (previously the
illegal emission of a call to such library function triggered an error
somewhere in the code generator). X86 now emits inline loads and stores for
address spaces 256 and 257 up to the same threshold that is used for address
space 0 and reports a fatal error otherwise.
I call this a "partial fix" because there are still cases that can't be
lowered. A fatal error is reported in these cases.
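As an illustration (hypothetical IR for a GPU-style target), a memcpy between addrspace(3) pointers used to trip the assert even though that address space has a target-defined meaning:
  declare void @llvm.memcpy.p3i8.p3i8.i32(i8 addrspace(3)* nocapture, i8 addrspace(3)* nocapture readonly, i32, i32, i1)

  define void @copy(i8 addrspace(3)* %dst, i8 addrspace(3)* %src) {
    call void @llvm.memcpy.p3i8.p3i8.i32(i8 addrspace(3)* %dst, i8 addrspace(3)* %src, i32 16, i32 4, i1 false)
    ret void
  }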
Reviewers: arsenm, theraven, compnerd, hfinkel
Subscribers: hfinkel, llvm-commits, alex
Differential Revision: http://reviews.llvm.org/D7241
llvm-svn: 255441
Before the patch, a -fprofile-instr-generate compile would fail
if no integrated-as was specified and the file contained
any static functions (the -S output was also invalid).
This is the second try. The fix in this patch is very localized.
Only the names of profile symbols with internal
linkage are fixed up, while the initializers of the name syms are not
changed. This means there is no format change and no version bump.
llvm-svn: 255434
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261
...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232
while preserving the order of bitcast + extract that it produces, which testing
shows is better handled by the backend.
Note that the existing check for "isVectorTy()" wasn't strong enough in general,
specifically because of x86_mmx: it's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before
proceeding with the transform.
llvm-svn: 255433
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticeable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
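Under the new representation, a simple catch looks roughly like this (a sketch; the catchpad operands are personality-specific and hypothetical here):
  dispatch:
    %cs = catchswitch within none [label %handler] unwind to caller
  handler:
    %cp = catchpad within %cs [i8* null, i32 64, i8* null]
    catchret from %cp to label %continue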
N.B. The state numbering code for the CLR has been updated, but the
veracity of its output cannot be vouched for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and
LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already
available. NFC.
llvm-svn: 255418
Summary: This patch adds support for converting (mul x, 2^N + 1) into
(add (shl x, N), x) and (mul x, 2^N - 1) into (sub (shl x, N), x) when the
multiplication cannot be converted to LEA + SHL or LEA + LEA. LLVM already
supports this on ARM, and it should also be useful on X86. Note that the patch
currently only applies to cases where the constant operand is positive; I am
planning a follow-up patch to support negative cases.
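For example (a sketch at the IR level; the combine itself runs during DAG combine), a multiply by 9 = 2^3 + 1 becomes a shift plus an add:
  %m = mul i32 %x, 9
  ; => becomes:
  %s = shl i32 %x, 3
  %m = add i32 %s, %x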
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255415
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.
It should be safe to convert any pair of bitcasts into a single bitcast;
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases.
Some day we'll get to remove that MMX wart from LLVM IR completely?
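For example (hypothetical types), a pair of bitcasts collapses into one:
  %b1 = bitcast <2 x i32> %v to i64
  %b2 = bitcast i64 %b1 to <4 x i16>
  ; => %b2 = bitcast <2 x i32> %v to <4 x i16>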
Differential Revision: http://reviews.llvm.org/D15468
llvm-svn: 255399
This patch adds hints for highly biased branches on the PPC architecture. Even
in the absence of profiling information, LLVM will mark code reaching
unreachable terminators and other exceptional control flow constructs as
highly unlikely to
be reached.
Patch by Tom Jablin!
llvm-svn: 255398
Many tests are now passing due to the eliminateFrameIndex implementation, and
the list needs to be re-triaged because it unblocks other failures and
some previous failures are different. However, I'm about to churn it more
by implementing more lowering, so I will wait on that.
llvm-svn: 255396
Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.
Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).
Differential Revision: http://reviews.llvm.org/D15344
llvm-svn: 255392
Summary: This patch adds support for converting (mul x, 2^N + 1) into
(add (shl x, N), x) and (mul x, 2^N - 1) into (sub (shl x, N), x) when the
multiplication cannot be converted to LEA + SHL or LEA + LEA. LLVM already
supports this on ARM, and it should also be useful on X86. Note that the patch
currently only applies to cases where the constant operand is positive; I am
planning a follow-up patch to support negative cases.
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255391
DenseMap is the wrong data structure to use for sample records and call
sites. The keys are too large, causing massive core memory growth when
reading profiles.
Before this patch, a 21MB input profile was causing the compiler to grow
to 3GB in memory. By switching to std::map, the compiler now grows to
300MB in memory.
There still are some opportunities for memory footprint reduction. I'll
be looking at those next.
llvm-svn: 255389
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
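For reference, one shape of the absolute-difference pattern that CodeGen can match instead is the following sketch (exact form varies):
  %d = sub nsw <4 x i32> %a, %b
  %n = sub nsw <4 x i32> %b, %a
  %c = icmp sgt <4 x i32> %a, %b
  %r = select <4 x i1> %c, <4 x i32> %d, <4 x i32> %n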
llvm-svn: 255387
I am seeing disappointing clang performance on a large PowerPC64
Linux box. GetRandomNumberSeed() does a buffered read from
/dev/urandom to seed its PRNG. As a result we read an entire page
even though we only need 4 bytes.
With every clang task reading a page worth of /dev/urandom, we
end up spending a large amount of time stuck on a kernel spinlock.
Patch by Anton Blanchard!
llvm-svn: 255386
Before the patch, a -fprofile-instr-generate compile would fail
if no integrated-as was specified and the file contained
any static functions (the -S output was also invalid).
This patch fixes the issue. With the change, the index format
version is bumped up by 1. Backward compatibility is
preserved.
Differential Revision: http://reviews.llvm.org/D15243
llvm-svn: 255365
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analyzePhysRegs() are hard to understand with respect to
subregisters.
This commit changes the results of analyzePhysRegs (the PhysRegInfo
struct) to be clearly understandable, and renames the fields to
avoid silent breakage of third-party code (and to improve the grammar).
It also fixes both users of computeRegisterLiveness() in llvm by
reenabling it and removing the workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node names and have stricter type checking, so prefer them.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
          (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.
This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.
Reviewers: sunfish, jfb
Subscribers: jfb, dschuff, llvm-commits
Differential Revision: http://reviews.llvm.org/D15459
llvm-svn: 255356
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."
llvm-svn: 255354
The access function has a short entry and a short exit; the initialization
block is only run the first time. To improve performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator, which will try to spill and
reload them in the initialization block.
We add CSRsViaCopy; it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given calling convention and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15340
llvm-svn: 255353
GlobalsAA's assumption that passes do not escape globals that have not
previously escaped is not violated by AlignmentFromAssumptions or
SLPVectorizer. Marking them as such allows GlobalsAA to be preserved
until GVN in the LTO pipeline.
http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html
Patch by Vaivaswatha Nagaraj!
llvm-svn: 255348