llvm-project/llvm/lib
Artem Belevich 29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g. v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling the thrust
test suite and manually checking the differences between the PTX
generated by llvm with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens an input/output argument into a flat list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create a vectorization plan,
  returned as an array of flags marking the boundaries of vectorized
  loads/stores. Scalars are represented as 1-element vectors.
* The code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.
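
The three steps above can be illustrated with a small standalone sketch. It is not the code from the patch: `ScalarPiece`, `VecFlag`, `canMerge`, `planVectorization` and `accessWidths` are hypothetical names, and the merge rule (2- or 4-element groups of equally sized, contiguous scalars, naturally aligned, at most 16 bytes) is an assumption chosen for illustration. The input is the already flattened list of scalar sizes and offsets, so the flattening step itself is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one flattened scalar (not an LLVM type).
struct ScalarPiece {
  unsigned Size;   // size in bytes
  unsigned Offset; // byte offset within the parameter
};

// Hypothetical boundary flags; a scalar is treated as a 1-element vector.
enum class VecFlag { Single, First, Inner, Last };

// Assumed merge rule: N equally sized, contiguous scalars starting at index I,
// with the first one aligned to the total access size and at most 16 bytes.
bool canMerge(const std::vector<ScalarPiece> &Pieces, std::size_t I,
              std::size_t N) {
  if (I + N > Pieces.size())
    return false;
  const unsigned Size = Pieces[I].Size;
  const unsigned Bytes = static_cast<unsigned>(N) * Size;
  if (Bytes > 16 || Pieces[I].Offset % Bytes != 0)
    return false;
  for (std::size_t K = 1; K < N; ++K) {
    if (Pieces[I + K].Size != Size ||
        Pieces[I + K].Offset != Pieces[I].Offset + K * Size)
      return false;
  }
  return true;
}

// Builds the plan: one flag per scalar, marking vector boundaries.
std::vector<VecFlag> planVectorization(const std::vector<ScalarPiece> &Pieces) {
  std::vector<VecFlag> Flags(Pieces.size(), VecFlag::Single);
  const std::size_t Candidates[] = {4, 2}; // try the widest group first
  for (std::size_t I = 0; I < Pieces.size();) {
    std::size_t Width = 1;
    for (std::size_t N : Candidates) {
      if (canMerge(Pieces, I, N)) {
        Width = N;
        break;
      }
    }
    if (Width > 1) {
      Flags[I] = VecFlag::First;
      for (std::size_t K = 1; K + 1 < Width; ++K)
        Flags[I + K] = VecFlag::Inner;
      Flags[I + Width - 1] = VecFlag::Last;
    }
    I += Width;
  }
  return Flags;
}

// Tiny state machine over the plan: collect elements until a flag closes the
// current vector, then record the width of the access that would be emitted.
std::vector<std::size_t> accessWidths(const std::vector<VecFlag> &Flags) {
  std::vector<std::size_t> Widths;
  std::size_t Pending = 0;
  for (VecFlag F : Flags) {
    ++Pending;
    if (F == VecFlag::Single || F == VecFlag::Last) {
      Widths.push_back(Pending);
      Pending = 0;
    }
  }
  assert(Pending == 0 && "plan ended in the middle of a vector");
  return Widths;
}
```

Under these assumptions, five contiguous 4-byte scalars (the v5f32 case mentioned above) are planned as one 4-element access followed by one scalar access, so only five values are touched.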

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
..
Analysis [ValueTracking] clang-format a section I'm about to touch; NFC 2017-02-21 02:42:42 +00:00
AsmParser Change debug-info-for-profiling from a TargetOption to a function attribute. 2017-02-01 22:45:09 +00:00
Bitcode Move symbols from the global namespace into (anonymous) namespaces. NFC. 2017-02-11 11:06:55 +00:00
CodeGen DAG: Check if extract_vector_elt is legal or custom 2017-02-21 22:47:27 +00:00
DebugInfo Don't assume little endian in StreamReader / StreamWriter. 2017-02-18 01:35:33 +00:00
Demangle Add support for demangling C++11 thread_local variables. 2017-01-31 15:56:36 +00:00
ExecutionEngine [Orc] Rename ObjectLinkingLayer -> RTDyldObjectLinkingLayer. 2017-02-20 05:45:14 +00:00
Fuzzer [libFuzzer] increase the size of FixedWord from 27 to 64, see PR31950 2017-02-14 23:02:37 +00:00
IR Teach the IR verifier to reject conflicting debug info for function arguments. 2017-02-21 19:03:15 +00:00
IRReader Timer: Track name and description. 2016-11-18 19:43:18 +00:00
LTO [LTO] Add ability to emit assembly to new LTO API 2017-02-15 20:36:36 +00:00
LibDriver LibDriver: Allow resource files to be archive members. 2016-12-15 19:37:46 +00:00
LineEditor
Linker IRMover: Merge flags LinkModuleInlineAsm and IsPerformingImport. 2017-02-03 17:01:14 +00:00
MC SubtargetFeature: Cleanup; NFC 2017-02-21 01:27:29 +00:00
Object Don't modify archive members unless really needed. 2017-02-21 20:40:54 +00:00
ObjectYAML Move symbols from the global namespace into (anonymous) namespaces. NFC. 2017-02-11 11:06:55 +00:00
Option Cleanup dump() functions. 2017-01-28 02:02:38 +00:00
Passes Increases full-unroll threshold. 2017-02-18 03:46:51 +00:00
ProfileData Cleanup dump() functions. 2017-01-28 02:02:38 +00:00
Support Try to fix the buildbot on OSX. 2017-02-21 21:31:28 +00:00
TableGen Use print() instead of dump() in code 2017-01-28 02:47:46 +00:00
Target [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values. 2017-02-21 22:56:05 +00:00
Transforms Make default value for disable-licm-promotion in licm explicit. 2017-02-21 20:53:48 +00:00
XRay [XRAY] [x86_64] Adding a Flight Data filetype reader to the llvm-xray Trace implementation. 2017-02-17 01:47:16 +00:00
CMakeLists.txt [XRay] Define the library for XRay trace logs 2017-01-11 06:39:09 +00:00
LLVMBuild.txt Add a C++ Itanium demangler to llvm. 2016-09-06 19:16:48 +00:00