Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255391
DenseMap is the wrong data structure to use for sample records and call
sites. The keys are too large, causing massive core memory growth when
reading profiles.
Before this patch, a 21Mb input profile was causing the compiler to grow
to 3Gb in memory. By switching to std::map, the compiler now grows to
300Mb in memory.
There still are some opportunities for memory footprint reduction. I'll
be looking at those next.
llvm-svn: 255389
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
I am seeing disappointing clang performance on a large PowerPC64
Linux box. GetRandomNumberSeed() does a buffered read from
/dev/urandom to seed its PRNG. As a result we read an entire page
even though we only need 4 bytes.
With every clang task reading a page worth of /dev/urandom we
end up spending a large amount of time stuck on kernel spinlock.
Patch by Anton Blanchard!
llvm-svn: 255386
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).
This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is
preserved with this change.
Differential Revision: http://reviews.llvm.org/D15243
llvm-svn: 255365
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
(i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.
This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.
Reviewers: sunfish, jfb
Subscribers: jfb, dschuff, llvm-commits
Differential Revision: http://reviews.llvm.org/D15459
llvm-svn: 255356
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."
llvm-svn: 255354
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given calling convention and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15340
llvm-svn: 255353
Quoting from the comment added to the code:
// Objective-C on i386 uses artificial absolute symbols to
// perform some link time checks. Those symbols have a fixed 0
// address that might conflict with real symbols in the object
// file. As I cannot see a way for absolute symbols to find
// their way into the debug information, let's just ignore those.
llvm-svn: 255350
GlobalsAA's assumptions that passes do not escape globals not previously
escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking
them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline.
http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html
Patch by Vaivaswatha Nagaraj!
llvm-svn: 255348
AsmWriterEmitter will generate a getRegisterName function with an alternate
register name index as its second argument if the target makes use of them. The
enum of these values is generated in RegisterInfoEmitter. The getRegisterName
generator would assume the namespace could always be found by reading index 1
of the list of AltNameIndices, but this will fail if this list is sorted such
that the NoRegAltName is at index 1. Because this list is sorted by record name
(in CodeGenTarget::ReadRegAltNameIndices), you only run in to problems if your
MyTargetRegisterInfo.td defines a single RegAltNameIndex that sorts lexically
before NoRegAltName.
For example, if a target has something like
def AnAltNameIndex : RegAltNameIndex
and defines RegAltNameIndices for some registers then, prior to this change,
AsmWriterEmitter would generate references to
::AnAltNameIndex and ::NoRegAltName
Patch by Alex Bradbury!
llvm-svn: 255344
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.
llvm-svn: 255336
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.
llvm-svn: 255334
Before this patch, each function's on-disk VP data is 'pointed'
to by the Value field of per-function ProfileData structue, and
read relies on this field (relocated with ValueDataDelta field)
to read the value data. However this means the Value field needs
to be updated during runtime before dumping, which creates undesirable
data races.
With this patch, the reading of VP data no longer depends on Value
field. There is no format change. ValueDataDelta header field becomes
obsolute but will be kept for compatibility reason (will be removed
next time the raw format change is needed).
llvm-svn: 255329
reduce memory usage.
Previously, LazyValueInfoCache inserted overdefined lattice values into
both ValueCache and OverDefinedCache. This wasn't necessary and was
causing LazyValueInfo to use an excessive amount of memory in some cases.
This patch changes LazyValueInfoCache to insert overdefined values only
into OverDefinedCache. The memory usage decreases by 70 to 75% when one
of the files in llvm is compiled.
rdar://problem/11388615
Differential revision: http://reviews.llvm.org/D15391
llvm-svn: 255320
Access to aligned globals gives us a chance to peephole optimize nonzero
offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't
overflow the available displacement. For example:
addis 3, 2, b4v@toc@ha
addi 4, 3, b4v@toc@l
lbz 5, b4v@toc@l(3) ; This is the result of the current peephole
lbz 6, 1(4) ; optimizer
lbz 7, 2(4)
lbz 8, 3(4)
If b4v is 4-byte aligned, we can skip using register 4 because we know
that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate:
addis 3, 2, b4v@toc@ha
lbz 4, b4v@toc@l(3)
lbz 5, b4v@toc@l+1(3)
lbz 6, b4v@toc@l+2(3)
lbz 7, b4v@toc@l+3(3)
Saving a register and an addition.
Larger alignments allow larger structures/arrays to be optimized.
llvm-svn: 255319
Previously in the conversion cost table there are no entries for integer-integer
conversions on SSE2. This will result in imprecise costs for certain vectorized
operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers
are counted from the result of running llc on the new test case in this patch.
Differential revision: http://reviews.llvm.org/D15132
llvm-svn: 255315
This was causing bad code gen and assembly that won't assemble, as
mixed altivec and vsx code would end up with a vsx high register
assigned to an altivec instruction, which won't work. Constraining the
classes allows the optimization to proceed.
llvm-svn: 255299
This is the first step in supporting PGO data generation via CMake. I've marked the option as advanced and experimental until it is fleshed out further.
llvm-svn: 255298
Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15371
llvm-svn: 255295
PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code.
llvm-svn: 255289
Avoid O(N^2) behaviour when checking for bad bitcasts in `ConstantExpr`s
buried inside of aggregate initializers to `GlobalVariable`s. I've:
- centralized the "visited" set for recursing through `ConstantExpr`s so
that expressions are only visited once per Verifier run,
- removed the duplicate logic for the stack visit, and
- avoided recursing into other `GlobalValue`s.
This recovers roughly a 100x time difference in clang compiles of a
particular input file (filled with large cross-referencing tables) that
depends on whether `-disable-llvm-verifier` is on. This slowdown was
caused by r187506, which introduced these checks.
Now, avoiding `-disable-llvm-verifier` only causes a 2x slowdown for
this case.
(Interestingly, dumping the textual IR for this file starts at least
50GB of global variable initializers (I don't know the total, since I
killed the dump)...)
llvm-svn: 255269
Summary:
Adds support for in-memory round-trip of sample profile data along with basic
round trip unit tests. This will also make it easier to include unit tests for
future changes to sample profiling.
Reviewers: davidxl, dnovillo, silvas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15211
llvm-svn: 255264
This is a redo of r255137 (reverted at r255227) which was a redo of
r255124 (reverted at r255126) with a fixed check for a scalar source
type and an added test for the failure that caused the revert.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255261
Added some missing spaces between the module identifier and the start of
the debug message. Also added a ":" after the module identifier to make
this look a little nicer.
llvm-svn: 255259
Ensure we release the files even when they don't hold a function index
summary section, by restructuring the control flow a little bit.
llvm-svn: 255256
A linker normally has two stages: symbol resolution and "moving stuff".
In lib/Linker there is the complication of lazy linking some globals,
but it was still far more mixed than it needed to.
This splits the linker into a lower level IRMover and the linker proper.
The IRMover just takes a list of globals to move and a callback that
lets the user control what is lazy linked.
The main motivation is that now tools/gold (and soon lld) can use their
own symbol resolution to instruct IRMover what to do.
llvm-svn: 255254
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction. That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.
http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!
llvm-svn: 255247
This patch corresponds to review:
http://reviews.llvm.org/D15286
LLVM IR frequently contains bitcast operations between floating point and
integer values of the same width. Doing this through memory operations is
quite expensive on PPC. This patch allows the use of direct register moves
between FPRs and GPRs for lowering bitcasts.
llvm-svn: 255246
Introduced DIMacro and DIMacroFile debug info metadata in the LLVM IR to support macros.
Differential Revision: http://reviews.llvm.org/D14687
llvm-svn: 255245
Summary:
LAA uses the PredicatedScalarEvolution interface, so it can produce
forward/backward dependences having SCEVs that are AddRecExprs only after being
transformed by PredicatedScalarEvolution.
Use PredicatedScalarEvolution to get the expected expressions.
Reviewers: anemet
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15382
llvm-svn: 255241
SystemZ needs to do its scheduling after branch relaxation, which can
only happen after block placement, and therefore the standard
PostRAScheduler point in the pass sequence is too early.
TargetMachine::targetSchedulesPostRAScheduling() is a new method that
signals on returning true that target will insert the final scheduling
pass on its own.
Reviewed by Hal Finkel
llvm-svn: 255234
- This simplifies the CallSite class, arg_begin / arg_end are now
simple wrapper getters.
- In several places, we were creating CallSite instances solely to call
arg_begin and arg_end. With this change, that's no longer required.
llvm-svn: 255226
Patch turns on OpenMP support in clang by default after fixing OpenMP buildbots.
Differential Revision: http://reviews.llvm.org/D13802
llvm-svn: 255222
ISD::FCOPYSIGN permits its operands to have differing types, and DAGCombiner
uses this. Add some def : Pat rules to expand this out into an explicit
conversion and a normal copysign operation.
llvm-svn: 255220
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15257
llvm-svn: 255204
Detecting additional dead-defs without a dead flag that are only visible
through liveness information should be part of the register operand
collection not intertwined with the register pressure update logic.
llvm-svn: 255192
Target-specific instructions may have uninteresting physreg clobbers,
for target-specific reasons. The peephole pass doesn't need to concern
itself with such defs, as long as they're implicit and marked as dead.
llvm-svn: 255182
without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.
llvm-svn: 255175
Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generates the same
profdata filename which can conflict in current test runs. This patch
renames them to have different names.
llvm-svn: 255158
The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>)
overload has had a typo in it since it was written, where it will
create a Vector instead of an Array. This obviously doesn't work at
all, but it turns out that until r254991 there weren't actually any
callers of this overload. Fix the typo and add some test coverage.
llvm-svn: 255157
This is a preliminary change towards deduplicating type units based on
their signatures. Next change will skip emission of types when their
signature has already been seen.
llvm-svn: 255154
`CloneAndPruneIntoFromInst` can DCE instructions after cloning them into
the new function, and so an AssertingVH is too strong. This change
switches CloneCodeInfo to use a std::vector<WeakVH>.
llvm-svn: 255148