These intrinsics are special because they directly take a memory operand (AVX2
adds the register counterparts). Typically, other non-memop intrinsics take
registers and then it's left to isel to fold memory operands.
In order to LICM intrinsics directly reading memory, we require that no stores
are in the loop (LICM) or that the folded load accesses constant memory
(MachineLICM). When neither is the case we fail to hoist a loop-invariant
broadcast.
We can work around this limitation if we expose the load as a regular load and
then just implement the broadcast using the vector initializer syntax. This
exposes the load to LICM and other optimizations.
At the IR level this is translated into a series of insertelements. The
sequence is already recognized as a broadcast so there is no impact on the
quality of codegen.
_mm256_broadcast_pd and _mm256_broadcast_ps are not updated by this patch
because right now we lack the DAG-combiner smartness to recover the broadcast
instructions. This will be tackled in a follow-on.
There will be completing changes on the LLVM side to remove the LLVM
intrinsics and to auto-upgrade bitcode files.
Fixes <rdar://problem/16494520>
llvm-svn: 209846
Added new SocketPacketPump class to decouple gdb remote packet
reading from packet expectations code. This allowed for cleaner
implementation of the separate $O output streams (non-deterministic
packaging of inferior stdout/stderr) from all the rest of the packets.
Added a packet expectation matcher that can match expected accumulated
output with a timeout. Use a dictionary with "type":"output_match".
See lldbgdbserverutils.MatchRemoteOutputEntry for details.
Added a gdb remote test to verify that $Hc (continue thread selection)
plus signal delivery ($C{signo}) works. Having trouble getting this
to pass with debugserver on MacOSX 10.9. Tried different variants,
including $vCont;C{signo}:{thread-id};c. In some cases, I get the
test exe's signal handler to run ($vCont variant first time), in others I don't
($vCont second and further times). $C{signo} doesn't hit the signal
handler code at all in the test exe but delivers a stop. Further
$Hc and $C{signo} deliver the stop marking the wrong thread. For now I'm
marking the test as XFAIL on dsym/debugserver. Will revisit this on
lldb-dev.
Updated the text exe for these tests to support thread:print-ids (each
thread announces its thread id) and provide a SIGUSR1 thread handler
that prints out the thread id on which it was signaled.
llvm-svn: 209845
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
llvm-svn: 209843
Summary:
This adds documentation for -Rpass, -Rpass-missed and -Rpass-analysis.
It also adds release notes for 3.5.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3730
llvm-svn: 209841
Summary:
These two flags are in the same family as -Rpass, but are used in
different situations.
-Rpass-missed is used by optimizers to inform the user when they tried
to apply an optimization but couldn't (or wouldn't).
-Rpass-analysis is used by optimizers to report analysis results back
to the user (e.g., why the transformation could not be applied).
Depends on D3682.
Reviewers: rsmith
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D3683
llvm-svn: 209839
without this case we would end on an infinite recursion: the remainder is zero,
so Numerator - Remainder is equal to Numerator and so we would recursively ask
for the division of Numerator by Denominator.
llvm-svn: 209838
when ScalarEvolution::getElementSize returns nullptr it is safe to early return
in ScalarEvolution::findArrayDimensions such that we avoid later problems when
we try to divide the terms by ElementSize.
llvm-svn: 209837
You can expect the sanitizers to be built under any of the following conditions:
1) CMAKE_C_COMPILER is GCC built to cross-compile to ARM
2) CMAKE_C_COMPILER is Clang built to cross-compile to ARM (ARM is default target)
3) CMAKE_C_COMPILER is Clang and CMAKE_C_FLAGS contains -target and --sysroot
Differential Revision: http://reviews.llvm.org/D3794
llvm-svn: 209835
This makes it slightly harder to misuse Twines. It is still possible to
refer to destroyed temporaries with the regular constructors, though.
Patch by Marco Alesiani!
llvm-svn: 209832
With -Weverything, the backend remarks are enabled. This was
causing spurious diagnostics for remarks that we don't yet
handle (cf http://reviews.llvm.org/D3683).
This will stop being a problem once http://reviews.llvm.org/D3683
is committed.
llvm-svn: 209823
This seems to match what gcc does for ppc and what every other llvm
backend does.
This is a fixed version of r209638. The difference is to avoid any change
in behavior for functions. The logic for using constant pools for function
addresseses is spread over a few places and we have to keep them in sync.
llvm-svn: 209821
The new storage (MetaMap) is based on direct shadow (instead of a hashmap + per-block lists).
This solves a number of problems:
- eliminates quadratic behaviour in SyncTab::GetAndLock (https://code.google.com/p/thread-sanitizer/issues/detail?id=26)
- eliminates contention in SyncTab
- eliminates contention in internal allocator during allocation of sync objects
- removes a bunch of ad-hoc code in java interface
- reduces java shadow from 2x to 1/2x
- allows to memorize heap block meta info for Java and Go
- allows to cleanup sync object meta info for Go
- which in turn enabled deadlock detector for Go
llvm-svn: 209810
This fix is possibly temporary while we determine whether -x cuda should be considered along with -std=c++11 when setting language options.
llvm-svn: 209808