During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop. This patch tries to base such derived
induction variables of the preceeding loop's induction variable.
This helps twolf on arm and seems to help scimark2 on x86.
Reapply with a fix for the case of a value derived from a pointer.
radar://15970709
llvm-svn: 201496
During LSR of one loop we can run into a situation where we have to expand the
start of a recurrence of a loop induction variable in this loop. This start
value is a value derived of the induction variable of a preceeding loop. SCEV
has cannonicalized this value to a different recurrence than the recurrence of
the preceeding loop's induction variable (the type and/or step direction) has
changed). When we come to instantiate this SCEV we created a second induction
variable in this preceeding loop. This patch tries to base such derived
induction variables of the preceeding loop's induction variable.
This helps twolf on arm and seems to help scimark2 on x86.
radar://15970709
llvm-svn: 201465
This should be a small build time improvement in general and fixes
the build on OS X with -DBUILD_SHARED_LIBS=ON.
The issue is that not all users are including GenericDomTreeConstruction.h,
causing undefined references when ld64 managed to hide the
linkonce_odr symbols.
llvm-svn: 201440
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.
The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.
With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.
This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in
X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
between non-uniform and uniform constant operands.
Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.
llvm-svn: 201272
The ID type for the stackmap and patchpoint intrinsics are in both cases i64.
This fixes an zero extend in the SelectionDAGBuilder that still used i32. This
also updates the target independent instructions STACKMAP and PATCHPOINT to use
the correct type.
llvm-svn: 201262
required for all sections in a module. This can be useful when targets or
code-models place strict requirements on how sections must be laid out
in memory.
If RTDyldMemoryManger::needsToReserveAllocationSpace() is overridden to return
true then the JIT will call the following method on the memory manager, which
can be used to preallocate the necessary memory.
void RTDyldMemoryManager::reserveAllocationSpace(uintptr_t CodeSize,
uintptr_t DataSizeRO,
uintptr_t DataSizeRW)
Patch by Vaidas Gasiunas. Thanks very much Viadas!
llvm-svn: 201259
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
This function adds an extra path argument to lto_module_create_from_memory.
The path argument will be passed to makeBuffer to make sure the MemoryBuffer
has a name and the created module has a module identifier.
This is mainly for emitting warning messages from the linker. When we emit
warning message on a module, we can use the module identifier.
rdar://15985737
llvm-svn: 201114
A const ObjectFile needs to be able to provide its name. For an IRObjectFile,
that means being able to call the mangler. Since each IRObjectFile can have
a different mangling, it is natural for them to contain a Mangler which is
therefore also const.
llvm-svn: 201113
Similarly to the vshrn instructions, these are simple zext/sext + trunc
operations. Using normal LLVM IR should allow for better code, and more sharing
with the AArch64 backend.
llvm-svn: 201093
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.
llvm-svn: 201085
Some of the more complex directive and macro handling for GAS compatibility
requires lookahead. Add a single token lookahead in the MCAsmLexer.
llvm-svn: 201058
These methods normally call each other and it is really annoying if the
arguments are in different order. The more common rule was that the arguments
specific to call are first (GV, Encoding, Suffix) and the auxiliary objects
(Mang, TM) come after. This patch changes the exceptions.
llvm-svn: 201044
According to the AAPCS, when a CPRC is allocated to the stack, all other
VFP registers should be marked as unavailable.
I have also modified the rules for allocating non-CPRCs to the stack, to make
it more explicit that all GPRs must be made unavailable. I cannot think of a
case where the old version would produce incorrect answers, so there is no test
for this.
llvm-svn: 200970
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
llvm-svn: 200957
In a previous commit (r199818) we added a const_cast to an existing
subtarget info instead of creating a new one so that we could reuse
it when creating the TargetAsmParser for parsing inline assembly.
This cast was necessary because we needed to reuse the existing STI
to avoid generating incorrect code when the inline asm contained
mode-switching directives (e.g. .code 16).
The root cause of the failure was that there was an implicit sharing
of the STI between the parser and the MCCodeEmitter. To fix a
different but related issue, we now explicitly pass the STI to the
MCCodeEmitter (see commits r200345-r200351).
The const_cast is no longer necessary and we can now create a fresh
STI for the inline asm parser to use.
Differential Revision: http://llvm-reviews.chandlerc.com/D2709
llvm-svn: 200929
build but spectacularly changed behavior of the C++98 build. =]
This shows my one problem with not having unittests -- basic API
expectations aren't well exercised by the integration tests because they
*happen* to not come up, even though they might later. I'll probably add
a basic unittest to complement the integration testing later, but
I wanted to revive the bots.
llvm-svn: 200905
The primary motivation for this pass is to separate the call graph
analysis used by the new pass manager's CGSCC pass management from the
existing call graph analysis pass. That analysis pass is (somewhat
unfortunately) over-constrained by the existing CallGraphSCCPassManager
requirements. Those requirements make it *really* hard to cleanly layer
the needed functionality for the new pass manager on top of the existing
analysis.
However, there are also a bunch of things that the pass manager would
specifically benefit from doing differently from the existing call graph
analysis, and this new implementation tries to address several of them:
- Be lazy about scanning function definitions. The existing pass eagerly
scans the entire module to build the initial graph. This new pass is
significantly more lazy, and I plan to push this even further to
maximize locality during CGSCC walks.
- Don't use a single synthetic node to partition functions with an
indirect call from functions whose address is taken. This node creates
a huge choke-point which would preclude good parallelization across
the fanout of the SCC graph when we got to the point of looking at
such changes to LLVM.
- Use a memory dense and lightweight representation of the call graph
rather than value handles and tracking call instructions. This will
require explicit update calls instead of some updates working
transparently, but should end up being significantly more efficient.
The explicit update calls ended up being needed in many cases for the
existing call graph so we don't really lose anything.
- Doesn't explicitly model SCCs and thus doesn't provide an "identity"
for an SCC which is stable across updates. This is essential for the
new pass manager to work correctly.
- Only form the graph necessary for traversing all of the functions in
an SCC friendly order. This is a much simpler graph structure and
should be more memory dense. It does limit the ways in which it is
appropriate to use this analysis. I wish I had a better name than
"call graph". I've commented extensively this aspect.
This is still very much a WIP, in fact it is really just the initial
bits. But it is about the fourth version of the initial bits that I've
implemented with each of the others running into really frustrating
problms. This looks like it will actually work and I'd like to split the
actual complexity across commits for the sake of my reviewers. =] The
rest of the implementation along with lots of wiring will follow
somewhat more rapidly now that there is a good path forward.
Naturally, this doesn't impact any of the existing optimizer. This code
is specific to the new pass manager.
A bunch of thanks are deserved for the various folks that have helped
with the design of this, especially Nick Lewycky who actually sat with
me to go through the fundamentals of the final version here.
llvm-svn: 200903
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
I think this was just over-eagerness on my part. The analysis results
need to often be non-const because they need to (in some cases at least)
be updated by the transformation pass in order to remain correct. It
also makes lazy analyses (a common case) needlessly annoying to write in
order to make their entire state mutable.
llvm-svn: 200881
Summary:
The check performed in the comparator is invalid, as some STL
implementations enforce strict weak ordering by calling the comparator with the
same value. This check was also in a wrong place: the assertion would only fire
when -help was used. The new check is performed each time the category is
registered (we are not going to have thousands of them, so it's fine to do it in
O(N^2)).
Reviewers: jordan_rose
Reviewed By: jordan_rose
CC: cfe-commits, alexmc
Differential Revision: http://llvm-reviews.chandlerc.com/D2699
llvm-svn: 200853