This will make it easier to switch the default of Polly's invariant load
hoisting strategy and also makes it very clear that these test cases
indeed require invariant code hoisting to work.
llvm-svn: 278667
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: http://reviews.llvm.org/D22187
llvm-svn: 278666
Summary:
The following problem was occuring:
- broadcaster B had two listeners: L1 and L2 (thread T1)
- (T1) B has started to broadcast an event, it has locked a shared_ptr to L1 (in
ListenerIterator())
- on another thread T2 the penultimate reference to L1 was destroyed (the transient object in B is
now the last reference)
- (T2) the last reference to L2 was destroyed as well
- (T1) B has finished broadcasting the event to L1 and destroyed the last shared_ptr
- (T1) this triggered the destructor, which called into B->RemoveListener()
- (T1) all pointers in the m_listeners list were now stale, so RemoveListener emptied the list
- (T1) Eventually control returned to the ListenerIterator() for doing broadcasting, which was
still in the middle of iterating through the list
- (T1) Only now, it was holding onto a dangling iterator. BOOM.
I fix this issue by making sure nothing can interfere with the
iterate-and-remove-expired-pointers loop, by moving this logic into a single function, which
first locks (or clears) the whole list and then returns the list of valid and locked Listeners
for further processing. Instead of std::list I use an llvm::SmallVector which should hopefully
offset the fact that we create a copy of the list for the common case where we have only a few
listeners (no heap allocations).
A slight difference in behaviour is that now RemoveListener does not remove an element from the
list -- it only sets it's mask to 0, which means it will be removed during the next iteration of
GetListeners(). This is purely an implementation detail and it should not be externally
noticable.
I was not able to reproduce this bug reliably without inserting sleep statements into the code,
so I do not add a test for it. Instead, I add some unit tests for the functions that I do modify.
Reviewers: clayborg, jingham
Subscribers: tberghammer, lldb-commits
Differential Revision: https://reviews.llvm.org/D23406
llvm-svn: 278664
The commit started passing a nullptr port into GDBRemoteCommunication::StartDebugserverProcess.
The function was mostly handling the null value correctly, but it one case it did not check it's
value before assigning to it. Fix that.
llvm-svn: 278662
This adds two new utility functions findLoopControlBlock and findLoopPreheader
to MachineLoop and MachineLoopInfo. These functions are refactored and taken
from the Hexagon target as they are target independent; thus this is intendend to
be a non-functional change.
Differential Revision: https://reviews.llvm.org/D22959
llvm-svn: 278661
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 5, i32 7
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
llvm-svn: 278660
Summary:
The assembler currently does not check the branch target for CBZ/CBNZ
instructions, which only permit branching forwards with a positive offset. This
adds validation for the branch target to ensure negative PC-relative offsets are
not encoded into the instruction, whether specified as a literal or as an
assembler symbol.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D23312
llvm-svn: 278659
If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
llvm-svn: 278658
We processed unnamed bitfields after our logic for non-vector field
elements in records larger than 128 bits. The vector logic would
determine that the bit-field disqualifies the record from occupying a
register despite the unnamed bit-field not participating in the record
size nor its alignment.
N.B. This behavior matches GCC and ICC.
llvm-svn: 278656
An __m512 vector type wrapped in a structure should be passed in a
vector register.
Our prior implementation was based on a draft version of the psABI.
This fixes PR28975.
N.B. The update to the ABI was made here:
https://github.com/hjl-tools/x86-psABI/commit/30f9c9
llvm-svn: 278655
We are trying to prove that one group of operands is a subset of
another. We did this by populating two Sets and determining that every
element within one was inside the other.
However, this is unnecessary. We can simply construct a single set and
test if each operand is within it.
llvm-svn: 278641
tuple-like decomposition declaration. This significantly simplifies the
semantics of BindingDecls for AST consumers (they can now always be evalated
at the point of use).
llvm-svn: 278640
Although libc++ only requires C++11 to build, there are other
reasons to turn on a newer dialect in the build. For example
IDE's may not highlight any C++14/C++17 in the headers when
configured for C++11. This patch add's a private option for
changing this.
llvm-svn: 278638
1. Use shuffle to insert element i1 into vector. The previous implementation was incorrect ( dest_bit OR src_bit , it doesn't clear the bit if src_bit=0 )
2. Improve shuffle i1 vector, use CVT2MASK if supported instead TRUNCATE.
Differential Revision: http://reviews.llvm.org/D23347
llvm-svn: 278623