v2: TargetLoweringBase:: -> TargetLowering::
Use Ops array
v3: Explicitly use value 0 for ?DIV
Remove redundant newline
Differential revision: http://reviews.llvm.org/D7803
reviewer: ab
llvm-svn: 238336
Counter symbols created for linkonce functions are not discarded by ELF
linkers unless the symbols are placed in the same comdat section as its
associated function.
llvm-svn: 238335
manager arguments...
I have no idea why, but compilers seem to hate this and its late, so I'm
not going to debug it. :: sigh :: This is why we can't have nice things.
llvm-svn: 238306
when invoking run methods.
This technique was suggested by Dinesh Dwivedi who also wrote the
original patch. During the code review, they explained to me that this
isn't a fully general technique as we need to know the signatures of the
method candidates. Since this is really a narrower utility, I switched
the names and structure to be more clearly a specialized run method
invoke helper and commented it accordingly. I still think this is
a pretty big win.
Very sorry to Dinesh for the extreme delay in landing this patch. I've
been far to busy poking at other things.
Original review: http://reviews.llvm.org/D3543
llvm-svn: 238305
This broke the llvm-mips-linux builder and several of our out-of-tree builders.
Initial investigations show that the commit probably isn't the problem but
reverting anyway while I investigate.
llvm-svn: 238302
With this patch the x86 backend is now shrink-wrapping capable
and this functionality can be tested by using the
-enable-shrink-wrap switch.
The next step is to make more test and enable shrink-wrapping by
default for x86.
Related to <rdar://problem/20821487>
llvm-svn: 238293
- Clean documentation comment
- Change the API to accept an iterator so you can actually pass
MachineBasicBlock::end() now.
- Add more "const".
llvm-svn: 238288
model the dense vector instruction bonuses.
Previously, this code really didn't effectively compute the density of
inlined vector instructions and apply the intended inliner bonus. It
would try to compute it repeatedly while analyzing the function and
didn't handle the case where future vector instructions would tip the
scales back towards the bonus.
Instead, speculatively apply all possible bonuses to the threshold
initially. Once we *know* that a certain bonus can not be applied,
subtract it. This should delay early bailout enough to get much more
consistent results without actually causing us to analyze huge swaths of
code. I expect some (hopefully mild) compile time hit here, and some
swings in performance, but this was definitely the intended behavior of
these bonuses.
This also dramatically simplifies the computation of the bonuses to not
interact with each other in confusing ways. The previous code didn't do
a good job of this and the values for bonuses may be surprising but are
at least now clearly written in the code.
Finally, fix code to be in line with comments and use zero as the
bailout condition.
Patch by Easwaran Raman, with some comment tweaks by me to try and
further clarify what is going on with this code.
http://reviews.llvm.org/D8267
llvm-svn: 238276
Long ago, the poll insertion code assumed that the insertion site was a terminator. As a result, the entry selection code would split a basic block to ensure it could pass a terminator. The insertion code was updated quite a while ago - possibly before it ever landed upstream - but the now redundant work was never removed.
While I'm at it, remove a comment which doesn't apply to the upstreamed code.
NFC intended.
llvm-svn: 238254
While working on another change, I noticed that the naming in this function was mildly deceptive. While fixing that, I took the oppurtunity to modernize some of the code. NFC intended.
llvm-svn: 238252
remove ExecutionEngine's dependence on CodeGen. NFC.
This is a follow-up to r238080.
Differential Revision: http://reviews.llvm.org/D9830
llvm-svn: 238244
This gets gas and llc -filetype=obj to agree on the order of prefixes.
For llvm-mc we need to fix the asm parser to know that it makes a difference
on which line the "lock" is in.
Part of pr23594.
llvm-svn: 238232
v2: Use C++ comments and end with periods
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 238228
This is to fix problems introduced by r232481. For HSAIL,
this function does essentially nothing desirable, and
injects unwanted / incorrect stuff before the function.
The only thing it really needs to do is call EmitFunctionEntryLabel
in this case.
llvm-svn: 238222
If there is an InstAlias defined for an instruction that had a custom
converter (AsmMatchConverter), then when the alias is matched,
the custom converter will be used rather than the converter generated
by the InstAlias.
This patch adds the UseInstAsmMatchConverter field to the InstAlias
class, which allows you to override this behavior and force the
converter generated by the InstAlias to be used.
This is required for some future improvemnts to the R600 assembler.
Differential Revision: http://reviews.llvm.org/D9083
llvm-svn: 238210
Previously, subtarget features were a bitfield with the underlying type being uint64_t.
Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset.
No functional change.
The first several times this was committed (e.g. r229831, r233055), it caused several buildbot failures.
Apparently the reason for most failures was both clang and gcc's inability to deal with large numbers (> 10K) of bitset constructor calls in tablegen-generated initializers of instruction info tables.
This should now be fixed.
llvm-svn: 238192
Summary:
Following on from r209907 which made personality encodings indirect, do the
same for TType encodings. This fixes the case where a try/catch block needs
to generate references to, for example, std::exception in the
.gcc_except_table.
This commit uses DW_EH_PE_sdata8 for N64 as far as is possible at the moment.
However, it is possible to end up with DW_EH_PE_sdata4 when a TargetMachine is
not available. There's no risk of issues with inconsistency here since the
tables are self describing but it does mean there is a small chance of the
PC-relative offset being out of range for particularly large programs.
Reviewers: petarj
Reviewed By: petarj
Subscribers: srhines, joerg, tberghammer, llvm-commits
Differential Revision: http://reviews.llvm.org/D9669
llvm-svn: 238190
The old code had a bug if the description was between 75 and 85 characters or so as it substracted PSLen from Desc.size() instead of MAX_LINE_LEN in the compare. It also calculated odd values for PosE on the last split and just let StringRef::slice take care of it being larger than the description string.
llvm-svn: 238187
I think the fact that it was explicitly excluding 0 kept this from being a tautology. The exclusion of 0 for the old math was also a bug that's easily hit if the description gets split into multiple lines.
llvm-svn: 238186
Summary:
In case of functions that have a pointer argument and only pass it to
each other, the function attributes pass deduces that the pointer should
get the readnone attribute, but fails to remove a readonly attribute
that may already have been present.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9995
llvm-svn: 238152
Part of D9474, this patch extends AVX2 v16i16 types to 2 x 8i32 vectors and uses i32 shift variable shifts before packing back to i16.
Adds AVX2 tests for v8i16 and v16i16
llvm-svn: 238149
This lets us drop a parameter the opName parameter to the VINTRP
multiclass and makes it possible to create multiple VINTRP defs
with the same asm mnemonic.
llvm-svn: 238146
in POWER8:
vadduqm
vaddeuqm
vaddcuq
vaddecuq
vsubuqm
vsubeuqm
vsubcuq
vsubecuq
In addition to adding the instructions themselves, it also adds support for the
v1i128 type for intrinsics (Intrinsics.td, Function.cpp, and
IntrinsicEmitter.cpp).
http://reviews.llvm.org/D9081
llvm-svn: 238144
Use clang-tidy to simplify boolean conditional return statements. Patch by
Richard Thomson <legalize@xmission.com>!
Differential Revision: http://reviews.llvm.org/D9972
llvm-svn: 238132
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source.
The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied.
This modifies the pattern to leave src1 and src2 in their original order.
Differential Revision: http://reviews.llvm.org/D9908
llvm-svn: 238131
lazily built.
Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.
llvm-svn: 238124
Stop creating symbols we don't need in `DwarfStringPool`. The consumers
only call `DwarfStringPoolEntryRef::getSymbol()` when DWARF is
relocatable, so this just stops creating the unused symbols when it's
not. This drops memory usage from 851 MB to 845 MB, around 0.7%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238122
Mint a new function, `AsmPrinter::emitDwarfStringOffset()`, which takes
a `DwarfStringPoolEntryRef`. When DWARF is relocatable across sections,
this defers to `emitSectionOffset()` and emits the `MCSymbol`;
otherwise, just emit the offset directly, without using any intermediate
symbols.
`EmitLabelDifference()` is already optimized to emit absolute label
differences cheaply when possible, so there aren't any major memory
savings here (853 MB down to 851 MB, or 0.2%). However, it prepares for
making the `MCSymbol`s in the `DwarfStringPool` optional.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238119
Expose the `DwarfStringPool` entry in a header, and store a pointer to
it directly in `DIEString`. Instead of choosing at creation time how to
emit it, use the `dwarf::Form` to determine that at emission time.
Besides avoiding the other `DIEValue`, this shaves two pointers off of
`DIEString`; the data is now a single pointer. This is a nice cleanup
on its own -- and drops memory usage from 861 MB down to 853 MB, around
0.9% -- but it's also preparation for passing `DIEValue`s by value.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238117
Extract out `DwarfStringPoolEntry` and `DwarfStringPoolRef` from
`DwarfStringPool` so that downstream users can start using
`DwarfStringPool::getEntry()` directly. This will allow users to delay
the decision between emitting a symbol or an offset until later.
llvm-svn: 238116
Change `DwarfStringPool` to calculate byte offsets on-the-fly, and
update `DwarfUnit::getLocalString()` to use a `DIEInteger` instead of a
`DIEDelta` when Dwarf doesn't use relocations (i.e., Mach-O). This
eliminates another call to `EmitLabelDifference()`, and drops memory
usage from 865 MB down to 861 MB, around 0.5%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238114
This test was relying on the numbering of preceding .set directives, but
an upcoming commit is going to remove some of them. Make the CHECKs
more nuanced.
llvm-svn: 238113
Move `DwarfStringPool`'s `getEntry()` to the header (and make it a
member function) in preparation for calculating symbol offsets
on-the-fly.
llvm-svn: 238112
Android's API-9 SDK is missing log2 builtins. A previous commit added
support for building against this API revision but this requires log2l
to be present. (And it doesn't seem to be defined, despite being in
the headers.)
Author: pasaulais (Pierre-Andre Saulais)
Differential Revision: http://reviews.llvm.org/D9884
llvm-svn: 238111
Using getCanonicalArchName() is the right way to parse ARM arch names.
Mapping ARMTargetParser IDs to Triple Arch IDs is temporary, until they
are merged into a TargetDescription class.
This was the last LLVM FIXME to move things to ARMTargetParser. Now on
to Clang and beyond.
llvm-svn: 238110
On GPU targets, materializing constants is cheap and stores are
expensive, so only doing this for zero vectors was silly.
Most of the new testcases aren't optimally merged, and are for
later improvements.
llvm-svn: 238108
In preparation for adding support for decoding DF_FLAGS_1 to
llvm-readobj.
Differential Revision: http://reviews.llvm.org/D9955
Reviewed by: echristo
llvm-svn: 238107
When the compare feeding a branch was in a different BB from the branch, we'd
try to "regenerate" the compare in the block with the branch, possibly trying
to make use of values not available there. Copy a page from AArch64's play book
here to fix the problem (at least in terms of correctness).
Fixes PR23640.
llvm-svn: 238097
Remove all virtual functions from `DIEValue`, dropping the vtable
pointer from its layout. Instead, create "impl" functions on the
subclasses, and use the `DIEValue::Type` to implement the dynamic
dispatch.
This is necessary -- obviously not sufficient -- for passing `DIEValue`s
around by value. However, this change stands on its own: we make tons
of these. I measured a drop in memory usage from 888 MB down to 860 MB,
or around 3.2%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238084
This is part of the work to remove TargetMachine::resetTargetOptions.
In this patch, instead of updating global variable NoFramePointerElim in
resetTargetOptions, its use in DisableFramePointerElim is replaced with a call
to TargetFrameLowering::noFramePointerElim. This function determines on a
per-function basis if frame pointer elimination should be disabled.
There is no change in functionality except that cl:opt option "disable-fp-elim"
can now override function attribute "no-frame-pointer-elim".
llvm-svn: 238080
Normally an ELF .o has two string tables, one for symbols, one for section
names.
With the scheme of naming sections like ".text.foo" where foo is a symbol,
there is a big potential saving in using a single one.
Building llvm+clang+lld with master and with this patch the results were:
master: 193,267,008 bytes
patch: 186,107,952 bytes
master non unique section names: 183,260,192 bytes
patch non unique section names: 183,118,632 bytes
So using non usique saves 10,006,816 bytes, and the patch saves 7,159,056 while
still using distinct names for the sections.
llvm-svn: 238073
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.
We need to first make sure that we won't multiple by INT_SMAX + 1.
Test case `add_of_mul` contributed by Sanjoy Das.
This fixes PR23635.
Differential Revision: http://reviews.llvm.org/D9629
llvm-svn: 238066
The usual CodeGenPrepare trickery, on a target-specific intrinsic.
Without this, the expansion of atomics will usually have the zext
be hoisted out of the loop, defeating the various patterns we have
to catch this precise case.
Differential Revision: http://reviews.llvm.org/D9930
llvm-svn: 238054
This patch adds a class for processing many recip codegen possibilities.
The TargetRecip class is intended to handle both command-line options to llc as well
as options passed in from a front-end such as clang with the -mrecip option.
The x86 backend is updated to use the new functionality.
Only -mcpu=btver2 with -ffast-math should see a functional change from this patch.
All other CPUs continue to *not* use reciprocal estimates by default with -ffast-math.
Differential Revision: http://reviews.llvm.org/D8982
llvm-svn: 238051
Before, getCanonicalArchName was relying on parseArch() to validate the arch
name, which was a problem when other methods, that also needed to call it,
were duplicating the steps.
But to dissociate getCanonicalArchName from parseArch, we needed to make
getCanonicalArchName more robust in detecting valid arch names. It's still
not perfect, but will do for the time being, until we merge Triple with
TargetParser into a TargetDescription mega class.
llvm-svn: 238047
The 'off' field of 'struct bpf_insn' is in cpu-endianness,
since the rest is emitted as little endian, make sure
that 'off' field is little endian as well.
llvm-svn: 238038
This allows us to match armv6m to default to thumb, but will also be used by
Clang's driver and remove the current incomplete copy in it.
llvm-svn: 238036
The problem was that I slipped a change required for shrink-wrapping, namely I
used getFirstTerminator instead of the getLastNonDebugInstr that was here before
the refactoring, whereas the surrounding code is not yet patched for that.
Original message:
[X86] Refactor the prologue emission to prepare for shrink-wrapping.
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 238035
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.
This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.
The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.
No functionality changed.
llvm-svn: 238033
This patch adds support for the ISA 2.07 additions involving the
branch history rolling buffer and event-based branching. These will
not be used by typical applications, so built-in support is not
required. They will only be available via inline assembly.
Assembly/disassembly tests are included in the patch.
llvm-svn: 238032
MachO and COFF quite reasonably only define the size for common symbols.
We used to try to figure out the "size" by computing the gap from one symbol to
the next.
This would not be correct in general, since a part of a section can belong to no
visible symbol (padding, private globals).
It was also really expensive, since we would walk every symbol to find the size
of one.
If a caller really wants this, it can sort all the symbols once and get all the
gaps ("size") in O(n log n) instead of O(n^2).
On MachO this also has the advantage of centralizing all the checks for an
invalid n_sect.
llvm-svn: 238028
The list of subtarget features for the 7em triple contains 't2xtpk',
which actually disables that subtarget feature. Correct that to
'+t2xtpk' and test that the instructions enabled by that feature do
actually work.
Differential Revision: http://reviews.llvm.org/D9936
llvm-svn: 238022
Revert "[X86] Refactor the prologue emission to prepare for shrink-wrapping."
This reverts commit 6b3b93fc8b68a2c806aa992ee4bd3d7f61898d4b.
This reverts commit ab0b15dff8539826283a59c2dd700a18a9680e0f.
llvm-svn: 238011
This change to VirtRegRewriter::addMBBLiveIns adds live-in registers for each
MachineBasicBlock's LiveIns set without isLiveIn checks as they are being added
because doing so is expensive. After all live-in registers are added, the LiveIn
vectors are sorted and uniqued.
llvm-svn: 238008
Shave a pointer off of `MCSymbolName` by storing `StringMapEntry<bool>*`
instead of `StringRef`. This brings `sizeof(MCSymbol)` down to 64 on
64-bit platforms, a nice round number. My profile showed memory
dropping from 914 MB down to 908 MB, roughly 0.7%. Other than memory
usage, no functionality change here.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238005
Save a pointer for each `MCSymbol`, bringing `llc` memory usage down
from 920 MB to 914 MB, around ~0.6%.
(I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc`;
see r236629 for details.)
llvm-svn: 238003
llvm/include/llvm/DebugInfo/DIContext.h:144:11: error: overriding ‘virtual llvm::LoadedObjectInfo::~LoadedObjectInfo() noexcept (true)’
It seems the destructor in the base class may not be "default".
llvm-svn: 238000
Previously `SDDbgValue`s used the general allocator that lives for all
of `SelectionDAG`. Instead, give them their own allocator, and reset it
whenever `SDDbgInfo::clear()` is called, plugging a spiritual leak.
This drops `SelectionDAGBuilder::visitIntrinsicCall()` off of my heap
profile (was at around 2% of `llc` for codegen of `-flto -g`). Thanks
to Pete Cooper for spotting the problem and suggesting the fix.
llvm-svn: 237998
Cleanup how `SDDbgValue` is initialized, and rearrange the fields to
save two pointers in the struct layout. No real functionality change
though (and I doubt the memory savings would show up in a profile).
llvm-svn: 237997
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
llvm-svn: 237995
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.
Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.
I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.
No functionality changed.
llvm-svn: 237993
a single location.
This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.
No functionality changed.
llvm-svn: 237990
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader. When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best. Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context sensative queries for hoising of loads and stores. This is effectively a partial revert of 237593.
llvm-svn: 237985
- Add a late pass to expand pseudo instructions (tail call and EH returns).
Instead of doing it in the prologue emission.
- Factor some static methods in X86FrameLowering to ease code sharing.
NFC.
Related to <rdar://problem/20821487>
llvm-svn: 237977
As noted in the original review, this is unused in tree & is used by
Julia... that's problematic. This API coudl easily be deleted/modified
by accident without any validation that it remains correct.
llvm-svn: 237976
Unfortunately, I can't reduce a small test case for this (although compiling
mpfr-3.1.2 with -O2 -mcpu=a2 would fairly reliably trigger a crash), but the
problem is fairly clear (at least once you know you're looking for one). If the
TLS instruction being replaced was at the end of the block, we'd increment the
iterator past it (so it would then point to MBB.end()), and then we'd increment
it again as part of the for statement, thus overrunning the end of the list.
Don't do that.
llvm-svn: 237974