Summary:
When propagating mass through irregular loops, the mass flowing through
each loop header may not be equal. This was causing wrong frequencies
to be computed for irregular loop headers.
Fixed by keeping track of masses flowing through each of the headers in
an irregular loop. To do this, we now keep track of per-header backedge
weights. After the loop mass is distributed through the loop, the
backedge weights are used to re-distribute the loop mass to the loop
headers.
Since each backedge will have a mass proportional to the different
branch weights, the loop headers will end up with a more approximate
weight distribution (as opposed to the current distribution that assumes
that every loop header is the same).
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10348
llvm-svn: 239843
While completely undefined registers are easy to catch and get their
<undef> flag early in ProcessImplicitDefs/RegisterCoalescer reading from
a partially defined register where just the subreg happens to be
undefined is harder to catch so we only add the undef flag in the
virtual register rewriting step.
No testcase as I cannot reproduce the problem on any of the in-tree targets at
the moment.
This fixes rdar://21387089
Differential Revision: http://reviews.llvm.org/D10470
llvm-svn: 239838
LaneMasks as given by getSubRegIndexLaneMask() have a limited number of
of bits, so for targets with more than 31 disjunct subregister there may
be cases where:
getSubReg(Reg,A) does not overlap getSubReg(Reg,B)
but we still have
(getSubRegIndexLaneMask(A) & getSubRegIndexLaneMask(B)) != 0.
I had hoped to keep this an implementation detail of the tablegen but as
my next commit shows we can avoid unnecessary imp-defs operands if we
know that the lane masks in use are precise.
This is in preparation to http://reviews.llvm.org/D10470.
llvm-svn: 239837
Old names, new names, and what they really mean:
- IsWin64 -> IsWin64CC: This is true on non-Windows x86_64 platforms
when the ms_abi calling convention is used.
- IsWinEH -> IsWin64Prologue: True when the target is Win64, regardless
of calling convention. Changes the prologue to obey the constraints of
the Win64 unwinder.
- NeedsWinEH -> NeedsWinCFI: We're using the win64 prologue *and* the we
want .xdata unwind tables. Analogous to NeedsDwarfCFI.
NFC
llvm-svn: 239836
A reduction is a special kind of recurrence. In the loop vectorizer we currently
identify basic reductions. Future patches will extend this to identifying basic
recurrences.
llvm-svn: 239835
This commit reports an error when a machine function from a MIR file that contains
LLVM IR can't find a function with the same name in the loaded LLVM IR module.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10468
llvm-svn: 239831
This is an updated version of the patch that was checked in at:
http://reviews.llvm.org/rL237046
but subsequently reverted because it exposed a bug in the DAG Combiner:
http://reviews.llvm.org/D9893
This time, there's an enablement flag ("EnableFMFInDAG") around the code in
SelectionDAGBuilder where we copy the set of FP optimization flags from IR
instructions to DAG nodes. So, in theory, there should be no functional change
from this patch as-is, but it will allow testing with the added functionality
to proceed via "-enable-fmf-dag" passed to llc.
This patch adds the minimum plumbing necessary to use IR-level
fast-math-flags (FMF) in the backend without actually using
them for anything yet. This is a follow-on to:
http://reviews.llvm.org/rL235997
Differential Revision: http://reviews.llvm.org/D10403
llvm-svn: 239828
The mftb instruction was incorrectly marked as deprecated in the PPC
Backend. Instead, it should not be treated as deprecated, but rather be
implemented using the mfspr instruction. A similar patch was put into GCC last
year. Details can be found at:
https://sourceware.org/ml/binutils/2014-11/msg00383.html.
This change will replace instances of the mftb instruction with the mfspr
instruction for all CPUs except 601 and pwr3. This will also be the default
behaviour.
Additional details can be found in:
https://llvm.org/bugs/show_bug.cgi?id=23680
Phabricator review: http://reviews.llvm.org/D10419
llvm-svn: 239827
Reapply r239539. Don't assume the collected number of
stores is the same vector size. Just take the first N
stores to fill the vector.
llvm-svn: 239825
In order to support asynchronous notifications for non-stop mode this patch adds a packet read thread. This is done by implementing AppendBytesToCache() from the communications class, which continually reads packets into a packet queue. To initialize this thread StartReadThread() must be called by the client, so since llgs and platform tools use the GBDRemoteCommunicatos code they must also call this function as well as ProcessGDBRemote.
When the read thread detects an async notify packet it broadcasts this event, where the matching listener will be added in the next non-stop patch.
Packets are now accessed by calling ReadPacket() which pops a packet from the queue, instead of using WaitForPacketWithTimeoutMicroSecondsNoLock()
Reviewers: vharron, clayborg
Subscribers: lldb-commits, labath, ted, domipheus, deepak2427
Differential Revision: http://reviews.llvm.org/D10085
llvm-svn: 239824
In r239421, the mangling of long double on PowerPC Linux targets
was changed to use "g" instead of "e". This same change also needs
to be done for SystemZ (all targets, since we support only Linux
on SystemZ anyway).
This is because an old ABI variant set "long double" to a 64-bit
type equivalent to "double", and the "e" mangling code is still
used to refer to that old ABI for compatibility reasons.
llvm-svn: 239822
Any combination of +-inf/+-inf is NaN so it's already ignored with
nnan and we can skip checking for ninf. Also rephrase logic in comments
a bit.
llvm-svn: 239821
The cmake check for whether libatomic could be used had been
unconditionally setting the result to false. Which was somewhat
fortunate, because the prerequisite check for whether it was *needed*
was always claiming it was, even if it was not.
However, this made platforms where libatomic is actually necessary
fail to link.
Differential Revision: http://reviews.llvm.org/D10453
llvm-svn: 239819
Summary:
If the driver is only given -msoft-float/-mfloat-abi=soft or -msingle-float,
we should refrain from propagating -mfpxx, unless it was explicitly given on the
command line.
Reviewers: atanasyan, dsanders
Reviewed By: atanasyan, dsanders
Subscribers: cfe-commits, mpf
Differential Revision: http://reviews.llvm.org/D10387
llvm-svn: 239818
Summary:
Relocs that can be converted from absolute to PC-relative now do so if IsPCRel
is true. Relocs that require PC-relative now call llvm_unreachable() if IsPCRel
is false and similarly those that require absolute assert that IsPCRel is false.
Note that while it looks like some relocs (e.g. R_MIPS_26) can be converted into
the MIPS32r6/MIPS64r6 relocs (R_MIPS_PC*_S2), it isn't actually valid to do so.
Placeholders have been left in the testcase for unsupported relocs and relocs
that cannot be generated at the moment.
Reviewers: vkalintiris
Reviewed By: vkalintiris
Subscribers: llvm-commits, rafael
Differential Revision: http://reviews.llvm.org/D10184
llvm-svn: 239817
Summary:
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10381
llvm-svn: 239815
Summary:
This affects other tools so the previous C++ API has been retained as a
deprecated function for the moment. Clang has been updated with a trivial
patch (not covered by the pre-commit review) to avoid breaking -Werror builds.
Other in-tree tools will be fixed with similar patches.
This continues the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
The first time this was committed it accidentally fixed an inconsistency in
triples in llvm-mc and this caused a failure. This inconsistency was fixed in
r239808.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D10366
llvm-svn: 239812
Previously the last iteration for simd loop-based OpenMP constructs were generated as a separate code. This feature is not required and codegen is simplified.
llvm-svn: 239810
insertions. It is unlikely to be the intention to delete parts of newly
inserted code. To do so, changed sorting Replacements at the same offset
to have decreasing length.
llvm-svn: 239809
Summary:
GetTarget() may modify TripleName without also updating TheTriple.
This can lead to situations where the MCObjectStreamer has a different triple
to the rest of LLVM.
This inconsistency caused sparc-little-endian.s to pass on Windows because most
of LLVM had sparcel-pc-win32 while MCObjectStreamer had "". I believe the same
kind of thing was also true of Darwin.
Reviewers: rengolin
Reviewed By: rengolin
Subscribers: llvm-commits, rengolin, rafael
Differential Revision: http://reviews.llvm.org/D10450
llvm-svn: 239808
We were propagating the coverage map into the body of an if statement,
but not into the condition thereafter. This is fine as long as the two
locations are in the same virtual file, but they won't be when the
"if" part of the statement is from a macro and the condition is not.
llvm-svn: 239803
When we multiply two 64-bit vectors, we extract lower and upper part and use the PMULUDQ instruction.
When one of the operands is a constant, the upper part may be zero, we know this at compile time.
Example: %a = mul <4 x i64> %b, <4 x i64> < i64 5, i64 5, i64 5, i64 5>.
I'm checking the value of the upper part and prevent redundant "multiply", "shift" and "add" operations.
llvm-svn: 239802
These are really immediate DUPs, and suffer from the same problem
with long instructions with a high/2 variant (e.g. smull).
By extending a MOVI (or DUP, before this patch), we can avoid an ext
on the other operand of the long instruction, e.g. turning:
ext.16b v0, v0, v0, #8
movi.4h v1, #0x53
smull.4s v0, v0, v1
into:
movi.8h v1, #0x53
smull2.4s v0, v0, v1
While there, add a now-necessary combine to fold (VT NVCAST (VT x)).
llvm-svn: 239799
This change is hopefully NFC. The only tricky part is that I changed the context instruction being used to the branch rather than the comparison. I believe both to be correct, but the branch is strictly more powerful. With the moved code, using the branch instruction is required for the basic block comparison test to return the same result. The previous code was able to directly access both the branch and the comparison where the revised code is not.
Differential Revision: http://reviews.llvm.org/D9652
llvm-svn: 239797
`LLVM_ENABLE_MODULES` builds sometimes fail because `Intrinsics.td`
needs to regenerate `Instrinsics.h` before anyone can include anything
from the LLVM_IR module. Represent the dependency explicitly to prevent
that.
llvm-svn: 239796
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.
Differential Revision: http://reviews.llvm.org/D9132
llvm-svn: 239795