The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.
The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.
The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.
Patch by Brendon Cahoon.
llvm-svn: 328547
The phi renaming code in the pipeliner uses the wrong value when
rewriting phi uses, which results in an undefined value. In this
case, the original phi is no longer needed due to the order of
instruction in the pipelined loop. The pipeliner was assuming, in
this case, the the phi loop definition should be used to
rewrite the uses. However, the pipeliner needs to check to make
sure that the loop definition has already been scheduled. If not,
then the phi initial value needs to be used instead.
Patch by Brendon Cahoon.
llvm-svn: 328545
If the link clause is used on the declare target directive, the object
should be linked on target or target data directives, not during the
codegen. Patch adds support for this clause.
llvm-svn: 328544
The pipeliner was generating too many phis in the epilog blocks, which
caused incorrect code generation when rewriting an instruction that uses
the phi.
In this case, there 3 prolog and epilog stages. An existing phi was
scheduled at stage 1. When generating the code for the 2nd epilog an
extra new phi was generated.
To fix this, we need to update the code that calculates the maximum
number of phis that can be generated, which is based upon the current
prolog stage and the stage of the original phi. In this case, when the
prolog stage is 1 and the original phi stage is 1, the maximum number
of phis to generate is 2.
Patch by Brendon Cahoon.
llvm-svn: 328543
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.
The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.
The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.
The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.
Patch by Brendon Cahoon.
llvm-svn: 328542
The pipeliner is asserting because the serialization step that
occurs at the end is deleting an instruction. The assert
occurs later on because there is a use without a definition.
The problem occurs when an instruction defines a value used
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet. The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.
There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.
This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.
Patch by Brendon Cahoon.
llvm-svn: 328540
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sinked down
in the GEP chain. This patch will produce the following output:
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
llvm-svn: 328539
The pipeliner changes dependences between base+offset instructions
(loads and stores) so that the instructions have more flexibility
to be scheduled with respect to each other. This occurs when the
pipeliner is able to compute that the instructions will not alias
if their order is changed. The prevous code enforced the alias
property by checking if the base register is the same, and that the
offset values are either both positive or negative.
This patch improves the alias check by using the API
areMemAccessesTriviallyDisjoint instead. This enables more cases,
especially if the offset is a negative value. The pipeliner uses
the function by creating a new instruction with the offset used
in the next iteration.
Patch by Brendon Cahoon.
llvm-svn: 328538
A schedule may require that a phi from the original loop is used in
multiple iterations in the scheduled loop. When this occurs, we generate
multiple phis in the pipelined loop to save the value across iterations.
When we generate the new phis and update the register names in the
pipelined loop, the pipeliner attempts to reuse a previously generated
phi, when possible. The calculation for the name of the new phi needs
to account for the version/iteration of the original phi. Also, in the
epilog, the code only needs to check backwards for a previous iteration
until reaching the first prolog block.
Patch by Brendon Cahoon.
llvm-svn: 328537
The code in orderDepdences that looks at the order dependences between
instructions was processing all the successor and predecessor order
dependences. However, we really only want to check for an order dependence
for instructions scheduled in the same cycle.
Also, fixed how the pipeliner handles output dependences. An output
dependence is also a potential loop carried dependence. The pipeliner
didn't handle this case properly so an invalid schedule could be created
that allowed an output dependence to be scheduled in the next iteration
at the same cycle.
Patch by Brendon Cahoon.
llvm-svn: 328516
When the definition of a phi is used by a phi in the next iteration,
the pipeliner was assuming that the definition is processed first.
Because of the assumption, an incorrect phi name was used. This patch
has a check to see if the phi definition has been processed already.
Patch by Brendon Cahoon.
llvm-svn: 328510
The software pipeliner attempts to delete dead instructions after
generating the pipelined loop. The code looks for uses of each
instruction. Physical registers should be treated differently because
the use chains do not exist. The code that checks for dead
instructions should assume that definitions of physical registers
are used if the operand doesn't contain the dead flag.
Patch by Brendon Cahoon.
llvm-svn: 328509
The pipeliner needs to be conservative when updating the memoperands
of instructions in the epilog. Previously, the pipeliner was changing
the offset of the memoperand based upon the scheduling stage. However,
that is incorrect when control flow branches around the kernel code.
The bug enabled a load and store to the same stack offset to be swapped.
This patch fixes the bug by updating the size of the memoperands to be
UINT_MAX. This conservative value means that dependences will be created
between other loads and stores.
Patch by Brendon Cahoon.
llvm-svn: 328508
The first issue was that the test was capturing the "before" disassembly
before launching, and the "after" after. This is a problem because some
of the disassembly will change after we know the load address (e.g. PCs
in call instructions). I fix this by capturing both disassemblies with
the process running.
The second issue was that the refactor in r328488 accidentaly changed
the meaning of the test, as it was no longer disassembling the function
which contained the breakpoint.
While inside, I also modernize the test to use
lldbutil.run_to_source_breakpoint and prevent debug-info replication.
llvm-svn: 328504
Summary:
We previously emulated multi-staged builds using two dockerfiles,
native support from Docker allows us to merge them into one,
simplifying our scripts.
For more details about multi-stage builds, see:
https://docs.docker.com/develop/develop-images/multistage-build/
Reviewers: mehdi_amini, klimek, sammccall
Reviewed By: sammccall
Subscribers: llvm-commits, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44787
llvm-svn: 328503
This replaces a large chunk of code that was looking for compound
patterns that include these sub-patterns. Existing tests ensure that
all of the previous examples are still folded as expected.
We still need to loosen the FMF check.
llvm-svn: 328502
Summary:
This patch adds support for incremental document syncing, as described
in the LSP spec. The protocol specifies ranges in terms of Position (a
line and a character), and our drafts are stored as plain strings. So I
see two things that may not be super efficient for very large files:
- Converting a Position to an offset (the positionToOffset function)
requires searching for end of lines until we reach the desired line.
- When we update a range, we construct a new string, which implies
copying the whole document.
However, for the typical size of a C++ document and the frequency of
update (at which a user types), it may not be an issue. This patch aims
at getting the basic feature in, and we can always improve it later if
we find it's too slow.
Signed-off-by: Simon Marchi <simon.marchi@ericsson.com>
Reviewers: malaperle, ilya-biryukov
Reviewed By: ilya-biryukov
Subscribers: MaskRay, klimek, mgorny, ilya-biryukov, jkorous-apple, ioeric, cfe-commits
Differential Revision: https://reviews.llvm.org/D44272
llvm-svn: 328500
Declaring "_Pragma("clang optimize off")" before the body of a
function with a lambda leads to the lambda functions in the body
not being affected.
Differential Revision: https://reviews.llvm.org/D43821
llvm-svn: 328494
Document that flag -resource-pressure can be used to enable/disable the resource
pressure view. This change should have been part of r328305.
llvm-svn: 328492
Implement TTI interface for targets to indicate that the LSR should give
priority to post-incrementing addressing modes.
Combination of patches by Sebastian Pop and Brendon Cahoon.
Differential Revision: https://reviews.llvm.org/D44758
llvm-svn: 328490
- close_fds is not compatible with stdin/out redirection on windows. I
just remove it, as this is not required for correct operation.
- the command string was assuming a posix shell. I rewrite the Popen
invocation to avoid the need for passing the arguments through a shell.
llvm-svn: 328489
Summary:
TestExprsChar.py
Char is unsigned char by default in PowerPC.
TestDisassembleBreakpoint.py
Modify disassemble testcase to consider multiple architectures.
TestThreadJump.py
Jumping directly to the return line on PowerPC architecture dos not
means returning the value that is seen on the code. The last test fails,
because it needs the execution of some assembly in the beginning of the
function. Avoiding this test for this architecture.
TestEhFrameUnwind.py
Implement func for ppc64le test case.
TestWatchLocation.py
TestStepOverWatchpoint.py
PowerPC currently supports only one H/W watchpoint.
TestDisassembleRawData.py
Add PowerPC opcode and instruction for disassemble testcase.
Reviewers: labath
Reviewed By: labath
Subscribers: davide, labath, alexandreyy, lldb-commits, luporl, lbianc
Differential Revision: https://reviews.llvm.org/D44472
Patch by Alexandre Yukio Yamashita <alexandre.yamashita@eldorado.org.br>.
llvm-svn: 328488
The goal of this patch is to address most of PR36874. To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
easy to check the latency and rthroughput as well.
The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".
We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.
At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view. That means, we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.
Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.
Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.
Differential Revision: https://reviews.llvm.org/D44839
llvm-svn: 328487
Summary: PPC64's auxvec has a special key that must be ignored.
Reviewers: clayborg, labath
Reviewed By: clayborg, labath
Subscribers: alexandreyy, lbianc
Differential Revision: https://reviews.llvm.org/D43771
Patch by Leandro Lupori <leandro.lupori@gmail.com>.
llvm-svn: 328486
Summary:
First attempt at landing D42145 was reverted because it caused test
failures on some android devices. It turned out this was because these
devices had vdso modules with differing physical and virtual addresses.
This was not caught earlier because all of the modules in our tests
either lack physical addresses or have them identical to virtual ones.
In the discussion on the patch, we came to the conclusion that in the
scenario where we are merely setting a load address of a module (for
example from a dynamic loader plugin), we should always use virtual
addresses (i.e., preserve status quo). This patch adds a test to make
sure we don't regress in that direction.
Reviewers: owenpshaw
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D44738
llvm-svn: 328485
Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on
fact that exit count of outer loops cannot rely on exiting blocks of
inner loops, which is true in current implementation of backedge taken count
calculation but is wrong in general. As result, when we only forget the loop that
we have just unrolled, we may still have cached data for its outer loops (in particular,
exit counts) which keeps references on blocks of inner loop that could have been
changed or even deleted.
The attached test demonstrates a situaton when after unrolling of innermost loop
the outermost loop contains a dangling pointer on non-existant block. The problem
shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV
smarter about exit count calculation. I am not sure if the bug exists without this patch,
it appears that now it is accidentally correct just because in practice exact backedge
taken count for outer loops with complex control flow inside is never calculated.
But when SCEV learns to do so, this problem shows up.
This patch replaces existing logic of SCEV loop invalidation with a correct one, which
happens to be invalidation of outermost loop (which also leads to invalidation of all
loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers
on removed blocks, or just outdated information that has changed after unrolling.
Differential Revision: https://reviews.llvm.org/D44818
Reviewed By: samparker
llvm-svn: 328483
This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up
patch at D44876 fixes this, but let's revert back to green for now until that's
ready to land.
(Also reverts r328443.)
> Both GCC and MSVC only look at the low byte of a boolean when it is
> passed.
llvm-svn: 328482
CanBeMin is currently used which will report true for any unknown
values, but often a check is performed outside the loop which covers
this situation:
for (int i = 0; i < N; ++i)
...
if (N > 0)
for (int i = 0; i < N; ++i)
...
So I've add 'LoopGuardedAgainstMin' which reports whether N is
greater than the minimum value which then allows loop with a variable
loop count to be optimised. I've also moved the increasing bound
checking into its own function and replaced SumCanReachMax is another
isLoopEntryGuardedByCond function.
llvm-svn: 328480
Currently, we might have a bug with scripts like below:
.foo : ALIGN(8)
{
*(.foo)
} > ram
because do not expand the memory region when doing ALIGN.
This might result in file range overlaps. The patch fixes the issue.
Differential revision: https://reviews.llvm.org/D44730
llvm-svn: 328479
The NB comments for filesystem changed permissions and added
a new enum `perm_options` which control how the permissions
are applied.
This implements than NB resolution
llvm-svn: 328476
As I move towards implementing std::filesystem, there is a need to
make the existing tests run against both the std and experimental versions.
Additionally, it's helpful to allow running the tests against other
implementations of filesystem.
This patch converts the test to easily target either. First, it
adds a filesystem_include.hpp header which is soley responsible
for selecting and including the correct implementation. Second,
it converts existing tests to use this header instead of including
filesystem directly.
llvm-svn: 328475