By default in Python 3, check_output() returns a program's output as
an encoded byte sequence. This means it returns a Py3 `bytes` object,
which cannot be compared to a string since it's a different fundamental
type.
Although it might not be correct from a purist standpoint, from a
practical one we can assume that all output is encoded in the default
locale, in which case using universal_newlines=True will decode it
according to the current locale. Anyway, universal_newlines also
has the nice behavior that it converts \r\n to \n on Windows platforms
so this makes parsing code easier, should we need that. So it seems
like a win/win.
llvm-svn: 252025
Summary:
The goal of this pass is to perform store-to-load forwarding across the
backedge of a loop. E.g.:
for (i)
A[i + 1] = A[i] + B[i]
=>
T = A[0]
for (i)
T = T + B[i]
A[i + 1] = T
The pass relies on loop dependence analysis via LoopAccessAnalisys to
find opportunities of loop-carried dependences with a distance of one
between a store and a load. Since it's using LoopAccessAnalysis, it was
easy to also add support for versioning away may-aliasing intervening
stores that would otherwise prevent this transformation.
This optimization is also performed by Load-PRE in GVN without the
option of multi-versioning. As was discussed with Daniel Berlin in
http://reviews.llvm.org/D9548, this is inferior to a more loop-aware
solution applied here. Hopefully, we will be able to remove some
complexity from GVN/MemorySSA as a consequence.
In the long run, we may want to extend this pass (or create a new one if
there is little overlap) to also eliminate loop-indepedent redundant
loads and store that *require* versioning due to may-aliasing
intervening stores/loads. I have some motivating cases for store
elimination. My plan right now is to wait for MemorySSA to come online
first rather than using memdep for this.
The main motiviation for this pass is the 456.hmmer loop in SPECint2006
where after distributing the original loop and vectorizing the top part,
we are left with the critical path exposed in the bottom loop. Being
able to promote the memory dependence into a register depedence (even
though the HW does perform store-to-load fowarding as well) results in a
major gain (~20%). This gain also transfers over to x86: it's
around 8-10%.
Right now the pass is off by default and can be enabled
with -enable-loop-load-elim. On the LNT testsuite, there are two
performance changes (negative number -> improvement):
1. -28% in Polybench/linear-algebra/solvers/dynprog: the length of the
critical paths is reduced
2. +2% in Polybench/stencils/adi: Unfortunately, I couldn't reproduce this
outside of LNT
The pass is scheduled after the loop vectorizer (which is after loop
distribution). The rational is to try to reuse LAA state, rather than
recomputing it. The order between LV and LLE is not critical because
normally LV does not touch scalar st->ld forwarding cases where
vectorizing would inhibit the CPU's st->ld forwarding to kick in.
LoopLoadElimination requires LAA to provide the full set of dependences
(including forward dependences). LAA is known to omit loop-independent
dependences in certain situations. The big comment before
removeDependencesFromMultipleStores explains why this should not occur
for the cases that we're interested in.
Reviewers: dberlin, hfinkel
Subscribers: junbuml, dberlin, mssimpso, rengolin, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13259
llvm-svn: 252017
Summary: Will be used by the LoopLoadElimination pass.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13258
llvm-svn: 252016
Summary:
The functions use LAI and MemoryDepChecker classes so they need to be
defined after those definitions outside of the Dependence class.
Will be used by the LoopLoadElimination pass.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13257
llvm-svn: 252015
A profile of an LTO link of Chrome revealed that we were spending some
~30-50% of execution time in the function Constant::getRelocationInfo(),
which is called from TargetLoweringObjectFile::getKindForGlobal() and in turn
from TargetMachine::getNameWithPrefix().
It turns out that we only need the result of getKindForGlobal() when
targeting Mach-O, so this change moves the relevant part of the logic to
TargetLoweringObjectFileMachO.
NFCI.
Differential Revision: http://reviews.llvm.org/D14168
llvm-svn: 252014
I am not adding a test case for this since I don't know how portable the __fp16 type is between compilers and I don't want to break the test suite.
<rdar://problem/22375079>
llvm-svn: 252012
The printed name and the parsed assembler names weren't the same.
I'm not sure which name SC prints these as, but I think it's this one.
llvm-svn: 252010
If the requested SGPR was not actually aligned, it was
accepted and rounded down instead of rejected.
Also fix an assert if the range is an invalid size.
llvm-svn: 252009
Summary:
Add support for wasm's select operator, and lower LLVM's select DAG node
to it.
Reviewers: sunfish
Subscribers: dschuff, llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D14295
llvm-svn: 252002
This does not support TPOFF32 relocations to local symbols as the address calculations are separate. Support for this will be a separate patch.
llvm-svn: 251998
Introduce DIPrinter which takes care of rendering DILineInfo and
friends. This allows LLVMSymbolizer class to return a structured data
instead of plain std::strings.
llvm-svn: 251989
This is a case where there is inconsistency among ELF linkers:
* The spec says nothing special about empty sections.
* BFD ld removes them.
* Gold handles them like regular sections.
We were outputting them but sometimes ignoring them. This would create
odd looking outputs where a rw section could be in a ro segment for example.
The bfd way of doing things is also strange for the case where a symbol
points to the empty section.
Now we match gold and what seems to be the intention of the spec.
llvm-svn: 251988
Summary:
We now collect all types of dependences including lexically forward
deps not just "interesting" ones.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13256
llvm-svn: 251985
This patch actually introduces a dependency from unittest2 to
six. This should be ok since both packages are in our own
repo, and we assume a sys.path of the top-level script that
can find the third party packages. So unittest2 should be
able to find six.
llvm-svn: 251983
Make printDILineInfo and friends responsible for just rendering the
contents of the structures, demangling should actually be performed
earlier, when we have the information about the originating
SymbolizableModule at hand.
llvm-svn: 251981
Let the editor also clean up whitespace for that file.
Reviewers: clayborg
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D13816
llvm-svn: 251979
unittest2 was using print statements in a few places, and also
using the `cmp` function (which is removed in Python 3). Again,
we need to stop using unittest2 and using unittest instead, but
this seems like an easier route for now.
llvm-svn: 251978