forked from OSchip/llvm-project
53b946aa63
The current design uses a unique entry for each argument/result attribute, with the name of the entry being something like "arg0". This provides for a somewhat sparse design, but ends up being much more expensive (from a runtime perspective) in practice. The design requires building a string every time we look up the dictionary for a specific arg/result, and also requires N attribute lookups when collecting all of the arg/result attribute dictionaries. This revision restructures the design to instead have one ArrayAttr that contains all of the attribute dictionaries for arguments and another for results. This design reduces the number of attribute name lookups to 1, and allows for O(1) lookup of individual element dictionaries. The major downside is that we can end up with larger memory usage, as the ArrayAttr contains an entry for each element even if that element has no attributes. If the memory usage becomes too problematic, we can experiment with a sparser structure that still provides many of the wins in this revision. This revision dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds. Differential Revision: https://reviews.llvm.org/D102035 |
||
---|---|---|
Utils | ||
BufferDeallocation.cpp | ||
BufferOptimizations.cpp | ||
BufferResultsToOutParams.cpp | ||
BufferUtils.cpp | ||
Bufferize.cpp | ||
CMakeLists.txt | ||
CSE.cpp | ||
Canonicalizer.cpp | ||
Inliner.cpp | ||
LocationSnapshot.cpp | ||
LoopCoalescing.cpp | ||
LoopFusion.cpp | ||
LoopInvariantCodeMotion.cpp | ||
MemRefDataFlowOpt.cpp | ||
NormalizeMemRefs.cpp | ||
OpStats.cpp | ||
ParallelLoopCollapsing.cpp | ||
PassDetail.h | ||
PipelineDataTransfer.cpp | ||
SCCP.cpp | ||
StripDebugInfo.cpp | ||
SymbolDCE.cpp | ||
ViewOpGraph.cpp | ||
ViewRegionGraph.cpp |
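
The trade-off described in the commit message above can be sketched in plain C++. This is only an illustration under stated assumptions: `AttrDict`, `NamedEntryLayout`, and `ArrayLayout` are hypothetical stand-ins built on the standard library, not MLIR types or APIs, and the attribute name used in any example is made up. The first struct models the old layout (one entry per argument, keyed by a name such as "arg0" that must be built on each lookup); the second models the new layout (one array with a slot per argument, including empty ones).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an attribute dictionary (not an MLIR type).
using AttrDict = std::map<std::string, std::string>;

// Old layout (sketch): one top-level entry per argument that has attributes,
// keyed by a generated name such as "arg0".
struct NamedEntryLayout {
  std::map<std::string, AttrDict> entries;

  void setArgAttrs(std::size_t index, AttrDict attrs) {
    entries["arg" + std::to_string(index)] = std::move(attrs);
  }

  // Every query builds a string and performs a keyed lookup.
  // Returns nullptr when the argument has no attributes.
  const AttrDict *getArgAttrs(std::size_t index) const {
    auto it = entries.find("arg" + std::to_string(index));
    return it == entries.end() ? nullptr : &it->second;
  }
};

// New layout (sketch): a single array holding one dictionary per argument,
// so an individual lookup is a plain index. Arguments without attributes
// still occupy a (possibly empty) slot, which is the memory cost noted above.
struct ArrayLayout {
  std::vector<AttrDict> argAttrs;

  explicit ArrayLayout(std::size_t numArgs) : argAttrs(numArgs) {}

  void setArgAttrs(std::size_t index, AttrDict attrs) {
    argAttrs.at(index) = std::move(attrs);
  }

  const AttrDict &getArgAttrs(std::size_t index) const {
    return argAttrs.at(index);
  }
};
```

With three arguments of which only the last carries an attribute, the named-entry sketch stores a single entry, while the array sketch stores three slots but answers each query without constructing a key.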