In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.
The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".
Reviewers: grosser
Subscribers: gareevroman, pollydev
Tags: #polly
Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>
Differential Revision: https://reviews.llvm.org/D30815
llvm-svn: 297587
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
Differential Revision: https://reviews.llvm.org/D30611
llvm-svn: 297586
There were a couple of problems with this function on Windows. Different
separators and differences in how tilde expressions are resolved for
starters, but in addition there was no clear indication of what the
function's inputs or outputs were supposed to be, and there were no tests
to demonstrate its use.
To more easily paper over the differences between Windows paths,
non-Windows paths, and tilde expressions, I've ported this function to use
LLVM-based directory iteration (in fact, I would like to eliminate all of
LLDB's directory iteration code entirely since LLVM's is cleaner / more
efficient (i.e. it invokes fewer stat calls)). and llvm's portable path
manipulation library.
Since file and directory completion assumes you are referring to files and
directories on your local machine, it's safe to assume the path syntax
properties of the host in doing so, so LLVM's APIs are perfect for this.
I've also added a fairly robust set of unit tests. Since you can't really
predict what users will be on your machine, or what their home directories
will be, I added an interface called TildeExpressionResolver, and in the
unit test I've mocked up a fake implementation that acts like a unix
password database. This allows us to configure some fake users and home
directories in the test, so we can exercise all of those hard-to-test
codepaths that normally otherwise depend on the host.
Differential Revision: https://reviews.llvm.org/D30789
llvm-svn: 297585
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.
The case statements corresponding to CRC instructions are incorrect and should
be removed.
Also adding a testcase while on this.
Reviewers: t.p.northover, javed.absar, apazos, rengolin
Reviewed By: rengolin
Subscribers: evandro, aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D30274
llvm-svn: 297582
Unroller's specialized scalarizeInstruction() is mostly duplicating Vectorizer's
variant. OTOH Vectorizer's scalarizeInstruction() already supports the special
case of VF==1 except for avoiding mask-bit extraction in that case. This patch
removes Unroller's specialized version in favor of a unified method.
The only functional difference between the two variants seems to be setting
memcheck metadata for loads and stores only in Vectorizer's variant, which is a
bug in Unroller. To keep this patch an NFC the unified method doesn't set
memcheck metadata for VF==1.
Differential Revision: https://reviews.llvm.org/D30715
llvm-svn: 297580
If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.
Reviewers: grosser
Subscribers: nemanjai
Tags: #polly
Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com>
Differential Revision: https://reviews.llvm.org/D30864
llvm-svn: 297578
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.
I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.
llvm-svn: 297574
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.
This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?
Differential Revision: https://reviews.llvm.org/D29841
llvm-svn: 297568
Summary:
This patch implements a initial support of lit test for builtins.
Unit/arm/call_apsr.S is updated to support thumb1.
It also fixes a bug in arm/aeabi_uldivmod_test.c
gcc_personality_test is XFAILED as the framework cannot handle it so far.
cpu_model_test is also XFAILED for now as it is expected to return non-zero.
Reviewers: rengolin, compnerd, jroelofs, erik.pilkington, arphaman
Reviewed By: jroelofs
Subscribers: jroelofs, aemerson, srhines, nemanjai, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D30802
llvm-svn: 297566
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.
This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.
Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")
Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>
llvm-svn: 297556
Clang doesn't produce gcov compatible coverage files. This
causes lcov to break because it uses gcov by default. This
patch switches lcov to use llvm-cov as the gcov-tool.
Unfortunatly llvm-cov doesn't provide a gcov like interface by
default so it won't work with lcov. However `llvm-cov gcov` does.
For this reason we generate 'llvm-cov-wrapper' script that always
passes the gcov flag.
llvm-svn: 297553
Summary:
Some coroutine diagnostics need to point to the location of the first coroutine keyword in the function, like when diagnosing a `return` inside a coroutine. Previously we did this by storing each *valid* coroutine statement in a list and select the first one to use in diagnostics. However if every coroutine statement is invalid we would have no location to point to.
This patch fixes the storage of the first coroutine statement location, ensuring that it gets stored even when the resulting AST node would be invalid.
This patch also removes the `CoroutineStmts` list in `FunctionScopeInfo` because it was unused.
Reviewers: rsmith, GorNishanov, aaron.ballman
Reviewed By: GorNishanov
Subscribers: mehdi_amini, cfe-commits
Differential Revision: https://reviews.llvm.org/D30776
llvm-svn: 297547
When CMAKE_INSTALL_MANDIR isn't defined it ends up attempting to install
the man pages under "/man1" and we really don't want to accidentally install
stuff at the filesystem root.
llvm-svn: 297545
Summary:
Ths "cases" support was not quite finished, is unused, and is really just debug counters.
(well, almost, debug counters are slightly more powerful, in that they can skip things at the start, too).
Note, opt-bisect itself could also be implemented as a wrapper around
debug counters, but not sure it's worth it ATM.
I'll shove it on a todo list if we think it is.
Reviewers: MatzeB, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30856
llvm-svn: 297542
Summary:
Create only one OpaqueValue for await_ready/await_suspend/await_resume.
Store OpaqueValue used in the CoroutineSuspendExpr node, so that CodeGen does not have to hunt looking for it.
Reviewers: rsmith, EricWF, aaron.ballman
Reviewed By: EricWF
Subscribers: mehdi_amini, cfe-commits
Differential Revision: https://reviews.llvm.org/D30775
llvm-svn: 297541
r297310 began inserting red zones around allocations under ASan, which
perturbs the alignment of subsequent allocations. Deliberately specify
this in two places where it matters.
Fixes failures when these tests are run under ASan and UBSan together.
Reviewed by Duncan Exon Smith.
rdar://problem/30980047
llvm-svn: 297540
Summary:
This change solves the same problem as D30726, except that this only
throws out the bathwater.
AST was not correctly tracking and deleting UnknownInstructions via
handles. The existing code only tracks "pointers" in its
`ASTCallbackVH`, so an UnknownInstruction (that isn't also def'ing a
pointer used by another memory instruction) never gets a
`ASTCallbackVH`.
There are two other ways to solve this problem:
- Use the `PointerRec` scheme for both known and unknown instructions.
- Use a `CallbackVH` that erases the offending Instruction from the
UnknownInstruction list.
Both of the above changes seemed to be significantly (and unnecessarily
IMO) more complex than this.
Reviewers: chandlerc, dberlin, hfinkel, reames
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30849
llvm-svn: 297539
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.
The CandReason enum is properly sorted, so just remove artificial
ranking.
Differential Revision: https://reviews.llvm.org/D30557
llvm-svn: 297536