lazily built.
Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.
llvm-svn: 238124
Stop creating symbols we don't need in `DwarfStringPool`. The consumers
only call `DwarfStringPoolEntryRef::getSymbol()` when DWARF is
relocatable, so this just stops creating the unused symbols when it's
not. This drops memory usage from 851 MB to 845 MB, around 0.7%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238122
Mint a new function, `AsmPrinter::emitDwarfStringOffset()`, which takes
a `DwarfStringPoolEntryRef`. When DWARF is relocatable across sections,
this defers to `emitSectionOffset()` and emits the `MCSymbol`;
otherwise, just emit the offset directly, without using any intermediate
symbols.
`EmitLabelDifference()` is already optimized to emit absolute label
differences cheaply when possible, so there aren't any major memory
savings here (853 MB down to 851 MB, or 0.2%). However, it prepares for
making the `MCSymbol`s in the `DwarfStringPool` optional.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238119
Expose the `DwarfStringPool` entry in a header, and store a pointer to
it directly in `DIEString`. Instead of choosing at creation time how to
emit it, use the `dwarf::Form` to determine that at emission time.
Besides avoiding the other `DIEValue`, this shaves two pointers off of
`DIEString`; the data is now a single pointer. This is a nice cleanup
on its own -- and drops memory usage from 861 MB down to 853 MB, around
0.9% -- but it's also preparation for passing `DIEValue`s by value.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238117
Extract out `DwarfStringPoolEntry` and `DwarfStringPoolRef` from
`DwarfStringPool` so that downstream users can start using
`DwarfStringPool::getEntry()` directly. This will allow users to delay
the decision between emitting a symbol or an offset until later.
llvm-svn: 238116
Change `DwarfStringPool` to calculate byte offsets on-the-fly, and
update `DwarfUnit::getLocalString()` to use a `DIEInteger` instead of a
`DIEDelta` when Dwarf doesn't use relocations (i.e., Mach-O). This
eliminates another call to `EmitLabelDifference()`, and drops memory
usage from 865 MB down to 861 MB, around 0.5%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238114
Move `DwarfStringPool`'s `getEntry()` to the header (and make it a
member function) in preparation for calculating symbol offsets
on-the-fly.
llvm-svn: 238112
Using getCanonicalArchName() is the right way to parse ARM arch names.
Mapping ARMTargetParser IDs to Triple Arch IDs is temporary, until they
are merged into a TargetDescription class.
This was the last LLVM FIXME to move things to ARMTargetParser. Now on
to Clang and beyond.
llvm-svn: 238110
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
llvm-svn: 238108
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
llvm-svn: 238097
Remove all virtual functions from `DIEValue`, dropping the vtable
pointer from its layout. Instead, create "impl" functions on the
subclasses, and use the `DIEValue::Type` to implement the dynamic
dispatch.
This is necessary -- obviously not sufficient -- for passing `DIEValue`s
around by value. However, this change stands on its own: we make tons
of these. I measured a drop in memory usage from 888 MB down to 860 MB,
or around 3.2%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238084
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
Normally an ELF .o has two string tables, one for symbols, one for section
names.
With the scheme of naming sections like ".text.foo" where foo is a symbol,
there is a big potential saving in using a single one.
Building llvm+clang+lld with master and with this patch the results were:
master: 193,267,008 bytes
patch: 186,107,952 bytes
master non unique section names: 183,260,192 bytes
patch non unique section names: 183,118,632 bytes
So using non usique saves 10,006,816 bytes, and the patch saves 7,159,056 while
still using distinct names for the sections.
llvm-svn: 238073
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.
We need to first make sure that we won't multiple by INT_SMAX + 1.
Test case `add_of_mul` contributed by Sanjoy Das.
This fixes PR23635.
Differential Revision: http://reviews.llvm.org/D9629
llvm-svn: 238066
The usual CodeGenPrepare trickery, on a target-specific intrinsic.
Without this, the expansion of atomics will usually have the zext
be hoisted out of the loop, defeating the various patterns we have
to catch this precise case.
Differential Revision: http://reviews.llvm.org/D9930
llvm-svn: 238054
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238051
Before, getCanonicalArchName was relying on parseArch() to validate the arch
name, which was a problem when other methods, that also needed to call it,
were duplicating the steps.
But to dissociate getCanonicalArchName from parseArch, we needed to make
getCanonicalArchName more robust in detecting valid arch names. It's still
not perfect, but will do for the time being, until we merge Triple with
TargetParser into a TargetDescription mega class.
llvm-svn: 238047
The 'off' field of 'struct bpf_insn' is in cpu-endianness,
since the rest is emitted as little endian, make sure
that 'off' field is little endian as well.
llvm-svn: 238038
This allows us to match armv6m to default to thumb, but will also be used by
Clang's driver and remove the current incomplete copy in it.
llvm-svn: 238036
The problem was that I slipped a change required for shrink-wrapping, namely I
used getFirstTerminator instead of the getLastNonDebugInstr that was here before
the refactoring, whereas the surrounding code is not yet patched for that.
Original message:
[X86] Refactor the prologue emission to prepare for shrink-wrapping.
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 238035
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.
This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.
The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.
No functionality changed.
llvm-svn: 238033
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching. These will
not be used by typical applications, so built-in support is not
required. They will only be available via inline assembly.
Assembly/disassembly tests are included in the patch.
llvm-svn: 238032
MachO and COFF quite reasonably only define the size for common symbols.
We used to try to figure out the "size" by computing the gap from one symbol to
the next.
This would not be correct in general, since a part of a section can belong to no
visible symbol (padding, private globals).
It was also really expensive, since we would walk every symbol to find the size
of one.
If a caller really wants this, it can sort all the symbols once and get all the
gaps ("size") in O(n log n) instead of O(n^2).
On MachO this also has the advantage of centralizing all the checks for an
invalid n_sect.
llvm-svn: 238028
The list of subtarget features for the 7em triple contains 't2xtpk',
which actually disables that subtarget feature. Correct that to
'+t2xtpk' and test that the instructions enabled by that feature do
actually work.
Differential Revision: http://reviews.llvm.org/D9936
llvm-svn: 238022
Revert "[X86] Refactor the prologue emission to prepare for shrink-wrapping."
This reverts commit 6b3b93fc8b68a2c806aa992ee4bd3d7f61898d4b.
This reverts commit ab0b15dff8539826283a59c2dd700a18a9680e0f.
llvm-svn: 238011
This change to VirtRegRewriter::addMBBLiveIns adds live-in registers for each
MachineBasicBlock's LiveIns set without isLiveIn checks as they are being added
because doing so is expensive. After all live-in registers are added, the LiveIn
vectors are sorted and uniqued.
llvm-svn: 238008
Shave a pointer off of `MCSymbolName` by storing `StringMapEntry<bool>*`
instead of `StringRef`. This brings `sizeof(MCSymbol)` down to 64 on
64-bit platforms, a nice round number. My profile showed memory
dropping from 914 MB down to 908 MB, roughly 0.7%. Other than memory
usage, no functionality change here.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238005
Previously `SDDbgValue`s used the general allocator that lives for all
of `SelectionDAG`. Instead, give them their own allocator, and reset it
whenever `SDDbgInfo::clear()` is called, plugging a spiritual leak.
This drops `SelectionDAGBuilder::visitIntrinsicCall()` off of my heap
profile (was at around 2% of `llc` for codegen of `-flto -g`). Thanks
to Pete Cooper for spotting the problem and suggesting the fix.
llvm-svn: 237998
Cleanup how `SDDbgValue` is initialized, and rearrange the fields to
save two pointers in the struct layout. No real functionality change
though (and I doubt the memory savings would show up in a profile).
llvm-svn: 237997
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
llvm-svn: 237995
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.
Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.
I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.
No functionality changed.
llvm-svn: 237993
a single location.
This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.
No functionality changed.
llvm-svn: 237990
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader. When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best. Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context sensative queries for hoising of loads and stores. This is effectively a partial revert of 237593.
llvm-svn: 237985
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 237977
Unfortunately, I can't reduce a small test case for this (although compiling
mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the
problem is fairly clear (at least once you know you're looking for one). If the
TLS instruction being replaced was at the end of the block, we'd increment the
iterator past it (so it would then point to MBB.end()), and then we'd increment
it again as part of the for statement, thus overrunning the end of the list.
Don't do that.
llvm-svn: 237974
Summary:
x = &a[i];
y = &a[i + j];
=>
y = x + j;
along with some refactoring work such as extracting method
findClosestMatchingDominator.
Depends on D9786 which provides the ScalarEvolution::getGEPExpr interface.
Test Plan: nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9802
llvm-svn: 237971
Prior to this patch, we could update the operand of another MI in the same
bundle.
Longer version:
Before InlineSpiller rematerializes a vreg, it iterates over operands of each MI
in a bundle, collecting all (MI, OpNo) pairs that reference that vreg.
Then if it does rematerialize, it goes through the pair list and replaces the
operands with the new (rematerialized) vreg. The problem is, it tries to
replace all of these operands in the main MI ! This works fine for single MIs.
However, if we are processing a bundle of MIs and the list contains multiple
pairs - the rematerialization will either crash trying to access a non-existing
operand of the main MI, or silently corrupt one of the existing ones. It will
also ignore other MIs in the bundle.
The obvious fix is to use the MI pointers saved in collected (MI, OpNo) pairs.
This must have been the original intent of the pair list but somehow these
pointers got lost.
Patch by Dmitri Shtilman <dshtilman@icloud.com>!
Differential revision: http://reviews.llvm.org/D9904
<rdar://problem/21002163>
llvm-svn: 237964
Summary:
This supersedes http://reviews.llvm.org/D4010, hopefully properly
dealing with the JIT case and also adds an actual test case.
DwarfContext was basically already usable for the JIT (and back when
we were overwriting ELF files it actually worked out of the box by
accident), but in order to resolve relocations correctly it needs
to know the load address of the section.
Rather than trying to get this out of the ObjectFile or requiring
the user to create a new ObjectFile just to get some debug info,
this adds the capability to pass in that info directly.
As part of this I separated out part of the LoadedObjectInfo struct
from RuntimeDyld, since it is now required at a higher layer.
Reviewers: lhames, echristo
Reviewed By: echristo
Subscribers: vtjnash, friss, rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D6961
llvm-svn: 237961
The raw non-instruction/constant form of this is still relying on being
able to access the pointee type from a pointer type - those will be
cleaned up later. For now, just focus on the cases where the pointee
type is easily accessible.
llvm-svn: 237958
This commit is a 2nd attempt at committing the initial MIR serialization patch.
The first commit (r237708) made the incremental buildbots unstable and was
reverted in r237730. The original commit didn't add a terminating null
character to the LLVM IR source which was passed to LLParser, and this
sometimes caused the test 'llvmIR.mir' to fail with a parsing error because
the LLVM IR source didn't have a null character immediately after the end
and thus LLLexer encountered some garbage characters that ultimately caused
the error.
This commit also includes the other test fixes I committed in
r237712 (llc path fix) and r237723 (remove target triple) which
also got reverted in r237730.
--Original Commit Message--
MIR Serialization: print and parse LLVM IR using MIR format.
This commit is the initial commit for the MIR serialization project.
It creates a new library under CodeGen called 'MIR'. This new
library adds a new machine function pass that prints out the LLVM IR
using the MIR format. This pass is then added as a last pass when a
'stop-after' option is used in llc. The new library adds the initial
functionality for parsing of MIR files as well. This commit also
extends the llc tool so that it can recognize and parse MIR input files.
Reviewers: Duncan P. N. Exon Smith, Matthias Braun, Philip Reames
Differential Revision: http://reviews.llvm.org/D9616
llvm-svn: 237954
My recent patch to add support for ISA 2.07 vector pack/unpack
instructions didn't properly check for availability of the vpkudum
instruction when recognizing it as a special vector shuffle case.
This causes us to leave the vector shuffle in place (rather than
converting it to a vector permute) so that it can be recognized later
as a vpkudum, but that pattern is invalid for processors prior to
POWER8. Thus LLVM crashes with an "unable to select" message. We
observed this since one of our buildbots is configured to generate
code for a POWER7.
This patch fixes the problem by checking for availability of the
vpkudum instruction during custom lowering of vector shuffles.
I've added a test case variant for the vpkudum pattern when the
instruction isn't available.
llvm-svn: 237952
so DWARF skeleton CUs can be expression in IR. A skeleton CU is a
(typically empty) DW_TAG_compile_unit that has a DW_AT_(GNU)_dwo_name and
a DW_AT_(GNU)_dwo_id attribute. It is used to refer to external debug info.
This is a prerequisite for clang module debugging as discussed in
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-November/040076.html.
In order to refer to external types stored in split DWARF (dwo) objects,
such as clang modules, we need to emit skeleton CUs, which identify the
dwarf object (i.e., the clang module) by filename (the SplitDebugFilename)
and a hash value, the dwo_id.
This patch only contains the IR changes. The idea is that a CUs with a
non-zero dwo_id field will be emitted together with a DW_AT_GNU_dwo_name
and DW_AT_GNU_dwo_id attribute.
http://reviews.llvm.org/D9488
rdar://problem/20091852
llvm-svn: 237949
On X86 (and similar OOO cores) unrolling is very limited, and even if the
runtime unrolling is otherwise profitable, the expense of a division to compute
the trip count could greatly outweigh the benefits. On the A2, we unroll a lot,
and the benefits of unrolling are more significant (seeing a 5x or 6x speedup
is not uncommon), so we're more able to tolerate the expense, on average, of a
division to compute the trip count.
llvm-svn: 237947
The commit null terminates the string value in the `yaml::BlockScalarNode`
class.
This change is motivated by the initial MIR serialization commit (r237708)
that I reverted in r237730 because the LLVM IR source from the block
scalar node wasn't terminated by a null character and thus the buildbots
failed on one testcase sometimes. This change enables me to recommit
the reverted commit.
llvm-svn: 237942
The existing code for method StreamingMemoryObject.fetchToPos does not respect
the corresonding call to setKnownObjectSize(). As a result, it allows the
StreamingMemoryObject to read bytes past the object size.
This patch provides a test case, and code to fix the problem.
Patch by Karl Schimpf
Differential Revision: http://reviews.llvm.org/D8931
llvm-svn: 237939
http://reviews.llvm.org/D9891
Following up on the VSX single precision loads and stores added earlier, this
adds support for elementary arithmetic operations on single precision values
in VSX registers. These instructions utilize the new VSSRC register class.
Instructions added:
xsaddsp
xsdivsp
xsmulsp
xsresp
xsrsqrtesp
xssqrtsp
xssubsp
llvm-svn: 237937
This starts merging MCSection and MCSectionData.
There are a few issues with the current split between MCSection and
MCSectionData.
* It optimizes the the not as important case. We want the production
of .o files to be really fast, but the split puts the information used
for .o emission in a separate data structure.
* The ELF/COFF/MachO hierarchy is not represented in MCSectionData,
leading to some ad-hoc ways to represent the various flags.
* It makes it harder to remember where each item is.
The attached patch starts merging the two by moving the alignment from
MCSectionData to MCSection.
Most of the patch is actually just dropping 'const', since
MCSectionData is mutable, but MCSection was not.
llvm-svn: 237936