This miscompile came about because we applied a transform that is only
appropriate for xor operators to a case where addition was present.
This fixes PR26407.
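As an illustrative sketch of this class of bug (hypothetical, not the exact
transform from this patch): cancellation identities that are sound for xor
are unsound for add.

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t A = 5, B = 3;
    // Sound for xor: xor is its own inverse, so B cancels out.
    assert(((A ^ B) ^ B) == A);
    // Unsound for add: the second B does not cancel the first, so
    // rewriting the add as if it were an xor miscompiles.
    assert(((A + B) + B) != A); // equals A + 2*B, not A
    return 0;
  }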
llvm-svn: 259375
A masked load with a zero mask means there's no load.
A masked load with an allOnes mask means it's a normal vector load.
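A toy model of the lane-wise semantics that justify both folds (illustrative
C++, not the in-tree code; a masked-off lane produces the pass-through value):

  #include <array>
  #include <cstddef>

  // Lane-wise model of a masked load: masked-off lanes take the
  // pass-through value. An all-zero mask therefore performs no load
  // at all, and an all-ones mask is an ordinary vector load.
  template <typename T, std::size_t N>
  std::array<T, N> maskedLoad(const T *Ptr, const std::array<bool, N> &Mask,
                              const std::array<T, N> &PassThru) {
    std::array<T, N> Result;
    for (std::size_t I = 0; I != N; ++I)
      Result[I] = Mask[I] ? Ptr[I] : PassThru[I];
    return Result;
  }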
Differential Revision: http://reviews.llvm.org/D16691
llvm-svn: 259369
Summary:
Simplify callee-save register save/restore code generation by
remembering the size of the callee-save area when it is computed so we
don't have to scan the prologue/epilogue instructions again later to
reconstruct it.
This is intended to simplify follow-on changes that reduce the number of
registers saved/restored.
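A minimal sketch of the caching idea, with hypothetical names (the in-tree
change hangs this state off the target's machine function info):

  // Record the callee-save area size once, when it is computed,
  // instead of re-deriving it by scanning the prologue/epilogue
  // instructions later.
  class FunctionInfo {
    unsigned CalleeSavedStackSize = 0;

  public:
    void setCalleeSavedStackSize(unsigned Size) { CalleeSavedStackSize = Size; }
    unsigned getCalleeSavedStackSize() const { return CalleeSavedStackSize; }
  };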
Reviewers: mcrosier, jmolloy, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16732
llvm-svn: 259365
In the future, we will vectorize recurrences other than reductions. This patch
renames a few variables and updates their associated comments to enable them to
be reused for non-reduction PHI nodes.
This change was requested in the review for D16197.
llvm-svn: 259364
Summary:
The bugs were:
* teq and similar take 4-bit unsigned immediates on microMIPS.
* teqi and similar have side-effects like teq does.
* shll_s.w and shra_r.w take 5-bit unsigned immediates.
* The various DSP ext* instructions take a 5-bit immediate.
* repl.qh takes an 8-bit unsigned immediate.
* repl.ph takes a 10-bit unsigned immediate.
* rddsp/wrdsp take a 10-bit unsigned immediate.
* teqi and similar take signed 16-bit immediates (10-bit for microMIPS).
* Out-of-range immediate macros for or/xor take a simm32/simm64 depending
on architecture. I'll fix the simm64 case properly when I reach simm32.
lui is a bit more lenient than GAS and accepts signed immediates in addition
to unsigned. This is because MipsMCExpr can produce signed values when
constant folding and it currently lacks a way of knowing it should fold to
an unsigned value.
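For reference, a sketch of the range predicates implied above, using
llvm::isInt/isUInt from Support/MathExtras.h (the free functions here are
illustrative; the real checks live in the AsmParser's operand classes):

  #include "llvm/Support/MathExtras.h"
  #include <cstdint>

  bool isValidReplQhImm(int64_t Imm) { return llvm::isUInt<8>(Imm); }
  bool isValidRddspImm(int64_t Imm) { return llvm::isUInt<10>(Imm); }
  bool isValidTeqiImm(int64_t Imm, bool IsMicroMips) {
    // Signed 16-bit immediate, 10-bit for microMIPS.
    return IsMicroMips ? llvm::isInt<10>(Imm) : llvm::isInt<16>(Imm);
  }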
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15446
llvm-svn: 259360
Changed emitting the offset of a macinfo entry into the compile unit DIE to use the "addSectionLabel" method rather than explicitly calculating the size/offset of the macro entry.
Differential Revision: http://reviews.llvm.org/D16292
llvm-svn: 259358
Minor patch to trace back through target shuffles to the source of the inserted element in a (V)INSERTPS shuffle.
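A generic sketch of the trace-back idea (a toy data structure, not the X86
lowering code): follow each shuffle's mask entry from the output lane to its
source until reaching a leaf vector.

  #include <utility>
  #include <vector>

  struct Shuffle {
    const Shuffle *Ops[2] = {nullptr, nullptr}; // inputs; unused for a leaf
    std::vector<int> Mask;                      // empty for a leaf vector
  };

  // Map an output lane of S back to the (node, lane) it came from.
  // Undef lanes (-1 in LLVM) are omitted for simplicity.
  std::pair<const Shuffle *, int> traceLane(const Shuffle *S, int Lane) {
    while (!S->Mask.empty()) {
      int NumLanes = (int)S->Mask.size();
      int M = S->Mask[Lane];    // 0..2*NumLanes-1, shufflevector-style
      S = S->Ops[M / NumLanes]; // first or second input
      Lane = M % NumLanes;
    }
    return {S, Lane};
  }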
Differential Revision: http://reviews.llvm.org/D16652
llvm-svn: 259343
Those commits created an artificial edge from a cleanup to a synthesized
catchswitch in order to get the MSVC personality routine to execute
cleanups which don't cleanupret and are not wrapped by a catchswitch.
This worked well enough, but it is not a complete solution in situations
where the cleanup contains an infinite loop.
However, the real deal-breaker for this approach is a degenerate case in
which the cleanup is post-dominated by unreachable *and* throws an
exception. This ends poorly because the catchswitch will inadvertently
catch the exception.
Because of this, we should go back to our previous behavior of not
executing certain cleanups (matching the behavior of the Itanium ABI
implementation in clang, GCC, and ICC).
N.B. I think this could be salvaged by making the catchpad rethrow the
exception and properly transforming throwing calls in the cleanup into
invokes.
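A hypothetical C++ shape of that degenerate case (illustrative; the problem
manifests in the EH IR the frontend produces):

  // The destructor is a cleanup that always throws, so it never
  // falls through to a cleanupret and is post-dominated by
  // unreachable. With the synthesized catchswitch, the exception
  // thrown here would be caught by a handler the source never wrote.
  struct Cleanup {
    ~Cleanup() noexcept(false) { throw 1; }
  };

  void mayThrow();

  void f() {
    Cleanup C;
    mayThrow(); // if this throws, the cleanup runs during unwind
  }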
llvm-svn: 259338
With poorly chosen custom parameters, the line table encoding logic would
sometimes end up generating a special opcode bigger than 255, which is wrong.
The set of default parameters that LLVM uses isn't subject to this bug.
When the line table parameters are carefully chosen, it's impossible to hit
the corner case that this patch fixes. The standard, however, doesn't
require that these parameters be carefully chosen, and even if it did, we
shouldn't generate a broken encoding.
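For reference, the constraint involved (DWARF4 section 6.2.5.1): a special
opcode is computed from the line and address advances and must fit in a
single ubyte. A sketch:

  #include <cstdint>

  // Returns the special opcode encoding this (LineDelta, AddrDelta)
  // pair, or -1 if none exists and standard opcodes must be used.
  int specialOpcode(int LineDelta, unsigned AddrDelta, int LineBase,
                    int LineRange, int OpcodeBase) {
    if (LineDelta < LineBase || LineDelta >= LineBase + LineRange)
      return -1;
    int64_t Opcode = (LineDelta - LineBase) +
                     (int64_t)LineRange * AddrDelta + OpcodeBase;
    return Opcode <= 255 ? (int)Opcode : -1; // must fit in a ubyte
  }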
Add a unittest for this specific encoding bug, and while at it, create some unit
tests for the encoding logic using different sets of parameters.
llvm-svn: 259334
Previously, the code assumed that all uses of a frame index (FI) on loads
and stores were as addresses. This change checks whether the use is the
address or a value and handles the latter case as it does for non-memory
instructions.
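A sketch of the distinction (a hypothetical helper; the real code consults
the instruction's operand layout):

  #include "llvm/CodeGen/MachineInstr.h"

  using namespace llvm;

  // An FI operand of a load/store is only "the address" when it sits
  // in the address operand slot; an FI used as the stored value must
  // be handled like any other non-memory use.
  bool isFIUsedAsAddress(const MachineInstr &MI, unsigned OpIdx,
                         unsigned AddrOpIdx) {
    return (MI.mayLoad() || MI.mayStore()) && OpIdx == AddrOpIdx;
  }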
llvm-svn: 259306
The previous code was incorrect (can't getReg a frameindex). We could instead optimize it to reduce tree height, but I'm not sure that's worthwhile yet because we then try to eliminate the frameindex.
This patch also fixes frame index elimination for operations that may load or store: it used to assume the base was operand 2 and the immediate offset was operand 1. That's not true for stores, where they're 4 and 3.
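A sketch of the operand-index selection described above (indices taken from
this message; the helper name is hypothetical):

  #include "llvm/CodeGen/MachineInstr.h"

  using namespace llvm;

  void getBaseAndOffsetIdx(const MachineInstr &MI, unsigned &BaseIdx,
                           unsigned &OffsetIdx) {
    // Loads: base is operand 2, immediate offset is operand 1.
    // Stores: base is operand 4, immediate offset is operand 3.
    BaseIdx = MI.mayStore() ? 4 : 2;
    OffsetIdx = MI.mayStore() ? 3 : 1;
  }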
llvm-svn: 259305
The AMDGPUPromoteAlloca pass was emitting the read.local.size
calls, which with HSA were incorrectly selected to read from
the offset mesa uses off of the kernarg pointer.
Error on intrinsics which aren't supported by HSA, and start
emitting the correct IR to read the workgroup size
out of the dispatch pointer.
Also initialize the pass so it can be tested with opt, and
start moving towards not depending on the subtarget as an
argument.
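For context, the prefix of the HSA kernel dispatch packet that the dispatch
pointer addresses (field layout per the HSA specification; the struct name
here is illustrative):

  #include <cstdint>

  // The workgroup size is now read out of these fields of the
  // dispatch packet instead of the mesa-specific kernarg offsets.
  struct KernelDispatchPacketPrefix {
    uint16_t header;           // byte offset 0
    uint16_t setup;            // byte offset 2
    uint16_t workgroup_size_x; // byte offset 4
    uint16_t workgroup_size_y; // byte offset 6
    uint16_t workgroup_size_z; // byte offset 8
  };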
llvm-svn: 259297
Only the dispatch.ptr intrinsic is supposed to be used now to get
the workgroup size, and the read.local.size intrinsics do not
work correctly.
llvm-svn: 259296
Refine the test for whether an instruction is in an expression tree so that
it detects when one tree ends and another begins. This lets us place a block
at that point, rather than continuing on to the first instruction that is
not in any tree.
llvm-svn: 259294