I think I fixed all instances of this in the codebase
(r258202, 258200, 258190). Also, the suppression didn't
have an effect on bots using make anyways, and it looks
like many bots still use configure/make bots.
llvm-svn: 258210
As vector shuffles can only reference two inputs many (V)INSERTPS patterns end up being split over two targets shuffles.
This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.
Differential Revision: http://reviews.llvm.org/D16072
llvm-svn: 258205
r100895 landed an llvm-only change to add minix support to googletest.
It did that by putting "defined()" in a macro, which has undefined
behavior. Slightly reshuffle things to remove that undefined behavior.
Also mention in README.LLVM that minix support is a local change.
llvm-svn: 258190
they're needed.
Prior to this patch objects were loaded (via RuntimeDyld::loadObject) when they
were added to the ObjectLinkingLayer, but were not relocated and finalized until
a symbol address was requested. In the interim, another object could be loaded
and finalized with the same memory manager, causing relocation/finalization of
the first object to fail (as the first finalization call may have marked the
allocated memory for the first object read-only).
By deferring the loadObject call (and subsequent memory allocations) until an
object file is needed we can avoid prematurely finalizing memory.
llvm-svn: 258185
In some cases, the max backedge taken count can be more conservative
than the exact backedge taken count (for instance, because
ScalarEvolution::getRange is not control-flow sensitive whereas
computeExitLimitFromICmp can be). In these cases,
computeExitLimitFromCond (specifically the bit that deals with `and` and
`or` instructions) can create an ExitLimit instance with a
`SCEVCouldNotCompute` max backedge count expression, but a computable
exact backedge count expression. This violates an implicit SCEV
assumption: a computable exact BE count should imply a computable max BE
count.
This change
- Makes the above implicit invariant explicit by adding an assert to
ExitLimit's constructor
- Changes `computeExitLimitFromCond` to be more robust around
conservative max backedge counts
llvm-svn: 258184
According the build bots, clang is using the Registry class somewhere as well. Will reapply with appropriate clang changes at a later point.
llvm-svn: 258159
The Registry class constructs a linked list of nodes whose storage is inside static variables and nodes are added via static initializers. The trick is that those static initializers are in both the LLVM code base, and some random plugin that might get loaded in at runtime. The existing code tries to use C++ templates and their ODR rules to get a single definition of the registry for each type, but, experimentally, this doesn't quite work as designed. (Well, the entire structure doesn't. It might not actually be an ODR problem.)
Previously, when I tried moving the GCStrategy class (along with it's registry) from CodeGen to IR, I ran into a problem where asking the GCStrategyRegistry a question would return inconsistent results depending on whether you asked from CodeGen (where the static initializers still were) or Transforms. My best guess is that this is a result of either a) an order of initialization error, or b) we ended up with two copies of the registry being created. I remember at the time having convinced myself it was probably (b), but I don't have any of my notes around from that investigation any more.
See http://reviews.llvm.org/rL226311 for the original patch in question.
This patch tries to remove the possibility of (b) above. (a) was already fixed in change 258109.
Differential Revision: http://reviews.llvm.org/D16170
llvm-svn: 258157
This patch creates the profile data variable before lowering the profile intrinsics.
Reviewers: davidxl, silvas
Differential Revision: http://reviews.llvm.org/D16015
llvm-svn: 258156
Our loop construct is not a way to identify cycles in the CFG. This wasn't immediately obvious from the header, so clarify that fact.
The motivation for this was that I just fixed a out of tree bug due to a mistaken assumption (on my part) on what a Loop actually was. While it was fresh in my mind, I wanted to document the key point.
llvm-svn: 258154
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555
As with D15937, the intent of the patch is to preserve the current behavior of the transform
except that we use the pow call's 'fast' attribute as a trigger rather than a function-level
attribute.
The TODO comment notes a potential follow-on patch that would propagate FMF to the new
instructions.
Differential Revision: http://reviews.llvm.org/D16122
llvm-svn: 258153
Summary:
add_version_info_from_vcs was setting SVN_REVISION to the last fetched
svn revision when using git svn instead of the svn revision
corresponding to HEAD. This leads to conflicts with the definition of
SVN_REVISION in SVNVersion.inc generated by GetSVN.cmake when HEAD is
not the most recently fetched svn revision.
Use 'git svn info' to determine SVN_REVISION when git svn is being used
instead (as is done in GetSVN.cmake).
Reviewers: beanz
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16299
llvm-svn: 258148
Summary:
GEPOperator: provide getResultElementType alongside getSourceElementType.
This is made possible by adding a result element type field to GetElementPtrConstantExpr, which GetElementPtrInst already has.
GEP: replace get(Pointer)ElementType uses with get{Source,Result}ElementType.
Reviewers: mjacob, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16275
llvm-svn: 258145
Teach the register stackifier to rematerialize constants that have multiple
uses instead of leaving them in registers. In the WebAssembly encoding, it's
the same code size to materialize most constants as it is to read a value
from a register.
llvm-svn: 258142
The value size was always 1 or 0, so we don't need to store it.
In a no asserts build this takes the testcase of pr26208 from 11 to 10
seconds.
llvm-svn: 258141
According to x86 spec "xlat m8" is a legal instruction and it is equivalent to "xlatb".
Differential Revision: http://reviews.llvm.org/D15150
llvm-svn: 258135
The following are legal according to X86 spec:
ins mem, DX
outs DX, mem
lods mem
stos mem
scas mem
cmps mem, mem
movs mem, mem
Differential Revision: http://reviews.llvm.org/D14827
llvm-svn: 258132
This commit changes the default on our lowering of vectors-of-pointers from splitting in RS4GC to reporting them in the final stack map. All of the changes to do so are already in place and tested. Assuming no problems are unearthed in the next week, we will be deleting the old code entirely next Monday.
llvm-svn: 258111
Combine a bunch of small files into a single, still rather small, file. The primary purpose of this is to get all of the static initializers into a single file so as to have a well defined order of initialization.
llvm-svn: 258109
Summary:
This is a companion patch for http://reviews.llvm.org/D16124.
Internalized symbols increase the size of strongly-connected components in
SCC-based module splitting and thus reduce the amount of parallelism. This
patch records the original linkage of non-local symbols prior to
internalization and then restores it just before splitting/CodeGen. This is
also useful for cases where the linker requires symbols to remain external, for
instance, so they can be placed according to linker script rules.
It's currently under its own flag (-restore-globals) but should eventually
share a common flag with D16124.
Reviewers: joker.eph, pcc
Subscribers: slarin, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16229
llvm-svn: 258100
This breaks the tests that were meant for testing
64-bit inline immediates, so move those to shl where
they won't be broken up.
This should be repeated for the other related bit ops.
llvm-svn: 258095
Although glibc defines it, this is currently of no use for my primary
use-case (dumping DT_* keys correctly). Its semantic is not described
anywhere I can find, so better leave it out for now.
Thanks to Rafael for pointing out in his post-commit review!
llvm-svn: 258089
Summary:
Currently llvm::SplitModule as the first step globalizes all local objects, which might not be desirable in some scenarios.
This change adds a new flag to llvm::SplitModule that uses SCC approach to search for a balanced partition without the need to externalize symbols.
Such partition might not be possible or fully balanced for a given number of partitions, and is a function of the module properties (global/local dependencies within the module).
Joint development Tobias Edler von Koch (tobias@codeaurora.org) and Sergei Larin (slarin@codeaurora.org)
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D16124
llvm-svn: 258083
AVX2 can only broadcast from the zero'th element of a vector, but if the broadcastable element is the zero'th element of a 128-bit subvector its advantageous to extract the subvector, broadcast from that and avoid the loading of shuffle mask data that would be needed for VPERMPS/VPERMD. The only exception being when the source type is 4f64 or 4i64 which can directly use the immediate shuffle VPERMPD/VPERMQ directly.
Differential Revision: http://reviews.llvm.org/D16050
llvm-svn: 258081
Summary:
When SimplifySetCC sees a setcc node that compares the result of a
value extension operation with a constant, it tries to simplify the
setcc node by eliminating the extension and shrinking the constant.
If shrinking the inputs to setcc is deemed not desirable by the target
(e.g. the target does not want a setcc comparing i1 values), then it
is still possible to optimize this sequence in some cases.
This patch adds the following combines to SimplifySetCC when shrinking setcc
inputs is not desirable:
(setcc ([sz]ext (setcc x, y, cc)), 0, setne) -> (setcc (x, y, cc))
(setcc ([sz]ext (setcc x, y, cc)), 0, seteq) -> (setcc (x, Y, !cc))
There are no tests for this yet, but once AMDGPU correctly implements
TargetLowering::isTypeDesirableForOp(), this new combine will be
exercised by the existing CodeGen/AMDGPU/setcc-opt.ll test.
Reviewers: resistor, arsenm
Subscribers: jroelofs, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15034
llvm-svn: 258067
When the shift immediate is zero, PKHTB is an alias for PKHBT, but the order of
the input operands needs to be swapped.
Differential Revision: http://reviews.llvm.org/D16288
llvm-svn: 258044
The symtab is logically referenced beyond the call to the create
method. This changes makes sure its lifetime matches that of
the reader.
llvm-svn: 258036
Previously these were Darwin-only. Since the switch to direct binary emission
of stubs, trampolines and resolver blocks, these should work on other *nix
platforms too.
These tests can be enabled on Windows once known issues with ORC's handling of
Windows symbol mangling (see e.g. https://llvm.org/PR25940) have been fixed.
llvm-svn: 258031
The feature flag is for VPERMB,VPERMI2B,VPERMT2B and VPMULTISHIFTQB instructions.
More about the instruction can be found in:
hattps://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf
Differential Revision: http://reviews.llvm.org/D16190
llvm-svn: 258012
Adds the corresponding CodeGenInstruction number to each AsmWriterInst. Then write all the operand uniqueing loops using the AsmWriterInst array and indices. Then use the CodeGenInstruction index to fill out the OpCodeInfo array.
llvm-svn: 258005
MIPS 32-bit ABI uses REL relocation record format to save dynamic
relocations. The patch teaches llvm-readobj to show dynamic relocations
in this format.
Differential Revision: http://reviews.llvm.org/D16114
llvm-svn: 258001
Added support for the extraction of the upper 128-bit subvectors for lower/upper half undef shuffles if it would reduce the number of extractions/insertions or avoid loads of AVX2 permps/permd shuffle masks.
Minor follow up to D15477.
llvm-svn: 258000
%RBP can't be handled explicitly. We generate the following code:
pushq %rbp
movq %rsp, %rbp
...
movq %rbx, (%rbp) ## 8-byte Spill
where %rbp will be overwritten by the spilled value.
The fix is to let PEI handle %RBP.
PR26136
llvm-svn: 257997
Entry block count was not counted and is corrected. Also
introduce a new metric that is MaxInternalBlockCount which
show command shows (as before).
llvm-svn: 257987
FIXME: Add more targets to use emutls into clang/test/Driver/emulated-tls.cpp.
FIXME: Add cygwin tests into llvm/test/CodeGen/X86. Working in progress.
llvm-svn: 257984
Summary:
Later in DWARF emission we check that DebugLocEntries have
non-overlapping pieces, so we should create any such entries
by merging here.
Fixes PR26163.
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D16249
llvm-svn: 257979