1. Mentioned that CUDA support works best with trunk.
2. Simplified the example by removing its dependency on the CUDA samples.
3. Explain the --cuda-gpu-arch flag.
llvm-svn: 259307
Previously the code assumed all uses of FI on loads and stores were as
addresses. This checks whether the use is the address or a value and
handles the latter case as it does for non-memory instructions.
llvm-svn: 259306
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.
This patch also fixes frame index elimination for operations which may load or store: it used to assume the base was operand 2 and immediate offset operand 1. That's not true for stores, where they're 4 and 3.
llvm-svn: 259305
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA was incorrectly selected to reading from
the offset mesa uses off of the kernarg pointer.
Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.
Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.
Start emitting errors for the intrinsics not handled with HSA.
llvm-svn: 259297
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.
llvm-svn: 259296
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins, so we can place a block
at that point, rather than continuing to find the first instruction not in
a tree at all.
llvm-svn: 259294
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.
Differential Revision: http://reviews.llvm.org/D16705
llvm-svn: 259284
The doxygen comments are automatically generated based on Sony's intrinsics document.
Differential Revision: http://reviews.llvm.org/D16562
llvm-svn: 259275
This patch adds support for expanding "%h" out to the machine hostname
in the LLVM_PROFILE_FILE environment variable.
Patch by Daniel Waters!
Differential Revision: http://reviews.llvm.org/D16371
llvm-svn: 259272
Switch the evaluation from isIntegerConstantExpr to EvaluateAsInt.
EvaluateAsInt will evaluate more types of expressions than
isIntegerConstantExpr.
Move one case from -Wsign-conversion to -Wconstant-conversion. The case is:
1) Source and target types are signed
2) Source type is wider than the target type
3) The source constant value is positive
4) The conversion will store the value as negative in the target.
llvm-svn: 259271
The list of class properties is saved in
Old ABI: protocol->ext->class_properties (protocol->ext->size will be updated)
New ABI: protocol->class_properties (protocol->size will be updated)
rdar://23891898
llvm-svn: 259268
The list of class properties is saved in
Old ABI: category->class_properties (category->size will be updated as well)
New ABI: category->class_properties (a flag in objc_image_info to indicate
whether or not the list of class properties is present)
rdar://23891898
llvm-svn: 259267
Add an option to llvm-profdata merge for writing out sparse indexed
profiles. These profiles omit InstrProfRecords for functions which are
never executed.
Differential Revision: http://reviews.llvm.org/D16727
llvm-svn: 259258
Loop transformations can sometimes fail because the loop, while in
valid rotated LCSSA form, is not in a canonical CFG form. This is
an extremely simple pass that just merges obviously redundant
blocks, which can be used to fix some known failure cases. In the
future, it may be enhanced with more cases (and have code shared with
SimplifyCFG).
This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll,
so that SimplifyCFG cleans up the loop before Rotate tries to run.
Not currently used in the pass manager, since this pass doesn't do
anything unless you can hook it up in an LPM with other loop passes.
It'll be added once Chandler cleans up things to allow this.
Tested in a custom pipeline out of tree to confirm it works in
practice (in addition to the included trivial test).
llvm-svn: 259256