I could not come up a way to test this -- I think this bug is latent
today, and will not actually result in a miscompile.
In `getPreStartForExtend`, SCEV constructs `PreStart` as a sum of all of
`SA`'s operands except `Op`. It also uses `SA`'s no-wrap flags, and
this is problematic because removing an element from an add expression
can make it signed-wrap. E.g. if `SA` was `(127 + 1 + -1)`, then it
could safely be `<nsw>` (since `sext(127) + sext(1) + sext(-1)` ==
`sext(127 + 1 + -1)`), but `(127 + 1)` (== `PreStart` if `Op` is `-1`)
is not `<nsw>`.
Transferring `<nuw>` from `SA` to `PreStart` is safe, as far as I can
tell.
llvm-svn: 251097
Also adds a 'trivial' ELF file. This was generated by assembling
and linking a file with the symbol main which contains a single
return instruction.
llvm-svn: 251096
This patch converts the remaining references to literal
strings for names of profile runtime entites (such as
profile runtime hook, runtime hook use function, profile
init method, register function etc).
Also added documentation for all the new interfaces.
llvm-svn: 251093
In r251064 I removed a logically unreachable call to `redoLoop`, and
now there aren't any callers of this API at all. Remove the needless
complexity.
llvm-svn: 251067
The insertLoop() API is only used to add new loops, and has confusing
ownership semantics. Simplify it by replacing it with addLoop().
llvm-svn: 251064
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.
Reviewers: hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13718
llvm-svn: 251061
This is a clean up patch that defines instr prof section and variable
name prefixes in a common header with access helper functions.
clang FE change will be done as a follow up once this patch is in.
Differential Revision: http://reviews.llvm.org/D13919
llvm-svn: 251058
As an invariant, BasicBlocks cannot be empty when passed to a transform.
This is not the case for MachineBasicBlocks and the Sink pass was ported
from the MachineSink pass which would explain the check's existence.
llvm-svn: 251057
Summary:
An unsigned comparision is equivalent to is corresponding signed version
if both the operands being compared are positive. Teach SCEV to use
this fact when profitable.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13687
llvm-svn: 251051
Summary:
- A s< (A + C)<nsw> if C > 0
- A s<= (A + C)<nsw> if C >= 0
- (A + C)<nsw> s< A if C < 0
- (A + C)<nsw> s<= A if C <= 0
Right now `C` needs to be a constant, but we can later generalize it to
be a non-constant if needed.
Reviewers: atrick, hfinkel, reames, nlewycky
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13686
llvm-svn: 251050
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.
Reviewers: atrick, hfinkel, nlewycky
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13613
llvm-svn: 251048
* Don't instrument promotable dynamic allocas:
We already have a test that checks that promotable dynamic allocas are
ignored, as well as static promotable allocas. Make sure this test will
still pass if/when we enable dynamic alloca instrumentation by default.
* Handle lifetime intrinsics before handling dynamic allocas:
lifetime intrinsics may refer to dynamic allocas, so we need to emit
instrumentation before these dynamic allocas would be replaced.
Differential Revision: http://reviews.llvm.org/D12704
llvm-svn: 251045
It turned out not to improve any of our benchmarks but occasionally led
to increased register pressure and spilling.
Only enabling for the Cyclone CPU as the results on the cortex CPUs
give mixed results.
Differential Revision: http://reviews.llvm.org/D13708
llvm-svn: 251038
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.
I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).
Differential Revision: http://reviews.llvm.org/D13740
llvm-svn: 251028
PR24686 identifies a problem where a relocation expression is invalid
when not all of the symbols in the expression can be locally
resolved. This causes the compiler to request a PC-relative half16ds
relocation, which is nonsensical for PowerPC. This patch recognizes
this situation and ensures we fail the assembly cleanly.
Test case provided by Anton Blanchard.
llvm-svn: 251027
Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global.
When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing.
If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured.
llvm-svn: 251017
If the final indices of two GEPs can be proven to not be equal, and
the GEP is of a SequentialType (not a StructType), then the two GEPs
do not alias.
llvm-svn: 251016
isKnownNonEqual(A, B) returns true if it can be determined that A != B.
At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value:
A + B != A [where B != 0, addition is nsw or nuw]
and that contradictory known bits imply two values are not equal.
This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!)
llvm-svn: 251012
Clang runtime failure was reported.
Assertion failed: (isExtended() && "Type is not extended!"), function getTypeForEVT
I'll need to add a proper handling for PointerType in masked load/store intrinsics.
llvm-svn: 250995
Summary: This will be used in a future change to ScalarEvolution.
Reviewers: hfinkel, reames, nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13612
llvm-svn: 250975
Summary:
This makes attribute accessors on `CallInst` and `InvokeInst` do the
(conservatively) right thing. This essentially involves, in some
cases, *not* falling back querying the attributes on the called
`llvm::Function` when operand bundles are present.
Attributes locally present on the `CallInst` or `InvokeInst` will still
override operand bundle semantics. The LangRef has been amended to
reflect this. Note: this change does not do anything prevent
`-function-attrs` from inferring `CallSite` local attributes after
inspecting the called function -- that will be done as a separate
change.
I've used `-adce` and `-early-cse` to test these changes. There is
nothing special about these passes (and they did not require any
changes) except that they seemed be the easiest way to write the tests.
This change does not add deal with `argmemonly`. That's a later change
because alias analysis requires a related fix before `argmemonly` can be
tested.
Reviewers: reames, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13961
llvm-svn: 250973
These were the cause of a verifier error when building 7zip with
-verify-machineinstrs. Running 'make check' with the verifier
triggered the same error on the test here so i've updated the test
to run the verifier on one of its runs instead of adding a new one.
While looking at this code, there was a stale comment that these
instructions were only used for disassembly. This probably used to
be the case, but they are now used in the 'ARM load / store optimization pass' too.
This reapplies r242300 which was reverted in r242428 due to bot failures.
Ultimately those failures were spurious and completely unrelated to this commit. I reverted this
at the time because it was thought to be at fault.
llvm-svn: 250969
There may be other use operands that also need their kill flags cleared.
This happens in a few tests when SIFoldOperands is moved after
PeepholeOptimizer.
PeepholeOptimizer rewrites cases that look like:
%vreg0 = ...
%vreg1 = COPY %vreg0
use %vreg1<kill>
%vreg2 = COPY %vreg0
use %vreg2<kill>
to use the earlier source to
%vreg0 = ...
use %vreg0
use %vreg0
Currently SIFoldOperands sees the copied registers, so there is
only one use. So far I haven't managed to come up with a test
that currently has multiple uses of a foldable VGPR -> VGPR copy.
llvm-svn: 250960
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.
We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers copys are inserted
to the correct operand register class.
For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).
The only possibly tricky case to maybe worry about is if when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand. However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.
llvm-svn: 250951
This will be used in future commits for AMDGPU to promote
operations on i64 vectors into operations on 32-bit vector
components.
This will be used / tested in future AMDGPU commits.
llvm-svn: 250945
Summary: ELF's STT_File symbols may overlap with regular globals in
other files, so we should ignore them here in order to avoid having
bogus entries in the symbol table that confuse us when resolving relocations.
Reviewers: lhames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13888
llvm-svn: 250942
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.
This fixes PR25267.
llvm-svn: 250922
in the size field in the archive header for the member is not a number. To do this we
have all of the needed methods return ErrorOr to push them up until we get out of lib.
Then the tools and can handle the error in whatever way is appropriate for that tool.
So the solution is to plumb all the ErrorOr stuff through everything that touches archives.
This include its iterators as one can create an Archive object but the first or any other
Child object may fail to be created due to a bad size field in its header.
Thanks to Lang Hames on the changes making child_iterator contain an
ErrorOr<Child> instead of a Child and the needed changes to ErrorOr.h to add
operator overloading for * and -> .
We don’t want to use llvm_unreachable() as it calls abort() and is produces a “crash”
and using report_fatal_error() to move the error checking will cause the program to
stop, neither of which are really correct in library code. There are still some uses of
these that should be cleaned up in this library code for other than the size field.
Also corrected the code where the size gets us to the “at the end of the archive”
which is OK but past the end of the archive will return object_error::parse_failed now.
The test cases use archives with text files so one can see the non-digit character,
in this case a ‘%’, in the size field.
llvm-svn: 250906
Summary:
Previously, we were inserting an InlineAsm statement for each line of the
inline assembly. This works for GAS but it triggers prologue/epilogue
emission when IAS is in use. This caused:
.set noreorder
.cpload $25
to be emitted as:
.set push
.set reorder
.set noreorder
.set pop
.set push
.set reorder
.cpload $25
.set pop
which led to assembler errors and caused the test to fail.
The whitespace-after-comma changes included in this patch are necessary to
match the output when IAS is in use.
Reviewers: vkalintiris
Subscribers: rkotler, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13653
llvm-svn: 250895
"external" AA wrapper pass.
This is a generic hook that can be used to thread custom code into the
primary AAResultsWrapperPass for the legacy pass manager in order to
allow it to merge external AA results into the AA results it is
building. It does this by threading in a raw callback and so it is
*very* powerful and should serve almost any use case I have come up with
for extending the set of alias analyses used. The only thing not well
supported here is using a *different order* of alias analyses. That form
of extension *is* supportable with the new pass manager, and I can make
the callback structure here more elaborate to support it in the legacy
pass manager if this is a critical use case that people are already
depending on, but the only use cases I have heard of thus far should be
reasonably satisfied by this simpler extension mechanism.
It is hard to test this using normal facilities (the built-in AAs don't
use this for obvious reasons) so I've written a fairly extensive set of
custom passes in the alias analysis unit test that should be an
excellent test case because it models the out-of-tree users: it adds
a totally custom AA to the system. This should also serve as
a reasonably good example and guide for out-of-tree users to follow in
order to rig up their existing alias analyses.
No support in opt for commandline control is provided here however. I'm
really unhappy with the kind of contortions that would be required to
support that. It would fully re-introduce the analysis group
self-recursion kind of patterns. =/
I've heard from out-of-tree users that this will unblock their use cases
with extending AAs on top of the new infrastructure and let us retain
the new analysis-group-free-world.
Differential Revision: http://reviews.llvm.org/D13418
llvm-svn: 250894
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.
Differential Revision: http://reviews.llvm.org/D13855
llvm-svn: 250893
Summary:
The forwards compatibility strategy employed by MIPS is to consider registers
to be infinitely sign-extended. Then on ISA's with a wider register, the result
of existing instructions are sign-extended to register width and zero-extended
counterparts are added. copy_u.w on MSA32 and copy_u.w on MSA64 violate this
strategy and we have therefore corrected the MSA specs to fix this.
We still keep track of sign/zero-extension during legalization but we now
match copy_s.[wd] where required.
No change required to clang since __builtin_msa_copy_u_[wd] will map to
copy_s.[wd] where appropriate for the target.
Reviewers: vkalintiris
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13472
llvm-svn: 250887
A mem-to-mem instruction (that both loads and stores), which store to an
FI, cannot pass the verifier since it thinks it is loading from the FI.
For the mem-to-mem instruction, do a looser check in visitMachineOperand()
and only check liveness at the reg-slot while analyzing a frame index operand.
Needed to make CodeGen/SystemZ/xor-01.ll pass with -verify-machineinstrs,
which now runs with this flag.
Reviewed by Evan Cheng and Quentin Colombet.
llvm-svn: 250885
Do not tail duplicate blocks where the successor has a phi node,
and the corresponding value in that phi node uses a subregister.
http://reviews.llvm.org/D13922
llvm-svn: 250877
C/C++ code can declare an extern function, which will show up as an import in WebAssembly's output. It's expected that the linker will resolve these, and mark unresolved imports as call_import (I have a patch which does this in wasmate).
llvm-svn: 250875
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.
http://reviews.llvm.org/D13914
llvm-svn: 250873
It is now possible to infer intrinsic modref behaviour purely from intrinsic attributes.
This change will allow to completely remove GET_INTRINSIC_MODREF_BEHAVIOR table.
Differential Revision: http://reviews.llvm.org/D13907
llvm-svn: 250860
default: llvm_unreachable("This action is not supported yet!");
-- so I'm adding one to the third switch block, too.
This is a follow-up fix for http://reviews.llvm.org/D13862
llvm-svn: 250830
Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:
- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if DIVREM libcall is defined and a remainder from the same division is
computed elsewhere, expand to a DIVREM libcall
- else, expand to a DIV libcall
This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.
The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.
The useful effect is that ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.
Reviewers: t.p.northover, rengolin
Subscribers: t.p.northover, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D13862
llvm-svn: 250826
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.
The motivation is to "prepare terrain" for D13862: when we set DIV and REM
to be legalized to libcalls, instead of the DIVREM, we otherwise lose the
ability to combine them together. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.
Reviewers: RKSimon, eli.friedman, rengolin
Subscribers: john.brawn, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13733
llvm-svn: 250825
Summary: In r231241, TargetLibraryInfoWrapperPass was added to
`getAnalysisUsage` for `AddressSanitizer`, but the corresponding
`INITIALIZE_PASS_DEPENDENCY` was not added.
Reviewers: dvyukov, chandlerc, kcc
Subscribers: kcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D13629
llvm-svn: 250813
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.
This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.
llvm-svn: 250794
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well. This change documents this fact and adds a
test case.
There is no functional change here -- only documentation of existing
functionality.
llvm-svn: 250784
There are two things out of the ordinary in this commit. First, I made
a loop obviously "infinite" in HexagonInstrInfo.cpp. After checking if
an instruction was at the beginning of a basic block (in which case,
`break`), the loop decremented and checked the iterator for `nullptr` as
the loop condition. This has never been possible (the prev pointers are
always been circular, so even with the weird ilist/iplist
implementation, this isn't been possible), so I removed the condition.
Second, in HexagonAsmPrinter.cpp there was another case of comparing a
`MachineBasicBlock::instr_iterator` against `MachineBasicBlock::end()`
(which returns `MachineBasicBlock::iterator`). While not incorrect,
it's fragile. I switched this to `::instr_end()`.
All that said, no functionality change intended here.
llvm-svn: 250778
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:
1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.
Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.
Differential revision: http://reviews.llvm.org/D10717
llvm-svn: 250754
As usual, this is a polymorphic hierarchy without polymorphic ownership,
so simply make the dtor protected non-virtual, protected default copy
ctor/assign, and make derived classes final. The derived classes will
pick up correct default public copy ops (and dtor) implicitly.
(wish I could add -Wdeprecated to the build, but last time I tried it
triggered on some system headers I still need to look into/figure out)
llvm-svn: 250747
Allow LLVM to optimize the sequence like the following:
%inc = add nsw i32 %i, 1
%cmp = icmp slt %n, %inc
into:
%cmp = icmp sle i32 %n, %i
The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.
llvm-svn: 250746
Besides the usual, I finally added an overload to
`BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead
of `BasicBlock::iterator`. Someone can go back and remove this overload
later (after updating the callers I'm going to skip going forward), but
the most common call seems to be
`BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's
better to add `->getIterator()` to every one than have the overload.
It's pretty hard to get the usage wrong.
llvm-svn: 250745
This was originally checked in at r250527, but reverted at r250570 because of PR25222.
There were at least 2 problems:
1. The cost check was checking for an instruction with an exact cost of TCC_Expensive;
that should have been >=.
2. The cause of the clang stage 1 failures was illegally sinking 'call' instructions;
we can't sink instructions that may have side effects / are not safe to execute speculatively.
Fixed those conditions in sinkSelectOperand() and added test cases.
Original commit message:
This is a follow-up to the discussion in D12882.
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
llvm-svn: 250743
Convert two halfword loads into a single 32-bit word load with bitfield extract
instructions. For example :
ldrh w0, [x2]
ldrh w1, [x2, #2]
becomes
ldr w0, [x2]
ubfx w1, w0, #16, #16
and w0, w0, #ffff
llvm-svn: 250719
- Isolate the check for the existence of a stack frame into hasFP.
- Implement getFrameIndexReference for DWARF address computation.
- Use getFrameIndexReference for offset computation in eliminateFrameIndex.
- Preserve debug information for dynamically allocated stack objects.
- Prefer FP to access local objects at -O0.
- Add experimental code to skip allocframe when not strictly necessary
(disabled by default).
llvm-svn: 250718
Emit the CFI instructions after all code transformation have been done.
This will avoid any interference between CFI instructions and packetization.
llvm-svn: 250714
memory, rather than representing the stubs in IR. Update the CompileOnDemand
layer to use this functionality.
Directly emitting stubs is much cheaper than building them in IR and codegen'ing
them (see below). It also plays well with remote JITing - stubs can be emitted
directly in the target process, rather than having to send them over the wire.
The downsides are:
(1) Care must be taken when resolving symbols, as stub symbols are held in a
separate symbol table. This is only a problem for layer writers and other
people using this API directly. The CompileOnDemand layer hides this detail.
(2) Aliases of function stubs can't be symbolic any more (since there's no
symbol definition in IR), but must be converted into a constant pointer
expression. This means that modules containing aliases of stubs cannot be
cached. In practice this is unlikely to be a problem: There's no benefit to
caching such a module anyway.
On balance I think the extra performance is more than worth the trade-offs: In a
simple stress test with 10000 dummy functions requiring stubs and a single
executed "hello world" main function, directly emitting stubs reduced user time
for JITing / executing by over 90% (1.5s for IR stubs vs 0.1s for direct
emission).
llvm-svn: 250712
The mapping of these two intrinsics in ARMInstrInfo.td had a small
omission which lead to their operands not being validated/transformed
before being lowered into usat and ssat instructions. This can cause
incorrect instructions to be emitted.
I've also added tests for the remaining two saturating arithmatic
intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing
codegen tests.
llvm-svn: 250697
We were keeping a reference to an object in a DenseMap then mutating it. At the end of the function we were attempting to clone that reference into other keys in the DenseMap, but DenseMap may well decide to resize its hashtable which would invalidate the reference!
It took an extremely complex testcase to catch this - many thanks to Zhendong Su for catching it in PR25225.
This fixes PR25225.
llvm-svn: 250692
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.
Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.
Differential Revision: http://reviews.llvm.org/D13850
llvm-svn: 250686
1. Key constant values (version, magic) and data structures related to raw and
indexed profile format are moved into one centralized file: InstrProf.h.
2. Utility function such as MD5Hash computation is also moved to the common
header to allow sharing with other components in the future.
3. A header data structure is introduced for Indexed format so that the reader
and writer can always be in sync.
4. Added some comments to document different places where multiple definition
of the data structure must be kept in sync (reader/writer, runtime, lowering
etc). No functional change is intended.
Differential Revision: http://reviews.llvm.org/D13758
llvm-svn: 250638
Add FastISel support for SSE4A scalar float / double non-temporal stores
Follow up to D13698
Differential Revision: http://reviews.llvm.org/D13773
llvm-svn: 250610
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling
Differential Revision: http://reviews.llvm.org/D13348
llvm-svn: 250609
This property was already used in the code path when no liveness
intervals are present. Unfortunately the code path that uses liveness
intervals tried to query a cached live interval for an allocatable
physreg, those are usually not computed so a conservative default was
used.
This doesn't affect any of the lit testcases. This is a foreclosure to
upcoming changes which should be NFC but without this patch this tidbit
wouldn't be NFC.
llvm-svn: 250596
This should not change behaviour because as far as I can see all code
reading the pressure changes has no effect if the PressureInc is 0.
Removing these entries however does avoid unnecessary computation, and
results in a more stable debug output. I want the stable debug output to
check that some upcoming changes are indeed NFC and identical even at
the debug output level.
llvm-svn: 250595
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.
We could also just look at debug info an use named locals instead, but
vregs have to work properly anyways so there!
Reviewers: binji, sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13839
llvm-svn: 250594
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13747
llvm-svn: 250588
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.
The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.
llvm-svn: 250583
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134
void foo(int *a, int i) {
a[i] = a[i+1] + a[i+2];
}
It seems better to produce this (14 bytes):
movslq %esi, %rsi
movl 0x4(%rdi,%rsi,4), %eax
addl 0x8(%rdi,%rsi,4), %eax
movl %eax, (%rdi,%rsi,4)
Rather than this (22 bytes):
leal 0x1(%rsi), %eax
cltq
leal 0x2(%rsi), %ecx
movslq %ecx, %rcx
movl (%rdi,%rcx,4), %ecx
addl (%rdi,%rax,4), %ecx
movslq %esi, %rax
movl %ecx, (%rdi,%rax,4)
The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine,
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that:
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".
I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.
A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.
Differential Revision: http://reviews.llvm.org/D13757
llvm-svn: 250560
Crashing is bad, m'kay? Fixing a 4 year old bug of my own creation.
Adding the testcase now which I should have added then which would have
long since caught this.
The problem is that printMessage() will display the diagnostic but not
set HadError to true, resulting in the assembler continuing on its way
and trying to create relocations for things that may not allow them or
otherwise get itself into trouble. Using the Error() helper function
here rather than calling printMessage() directly resolves this.
rdar://23133240
llvm-svn: 250557
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.
Reviewers: majnemer, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D13798
llvm-svn: 250552
The number of samples collected at the head of a function only make
sense for top-level functions (i.e., those actually called as opposed to
being inlined inside another).
Head samples essentially count the time spent inside the function's
prologue. This clearly doesn't make sense for inlined functions, so we
were always emitting 0 in those.
llvm-svn: 250539
Summary: The syntax has changed a bit recently.
Reviewers: binji
Subscribers: llvm-commits, jfb, sunfish, dschuff
Differential Revision: http://reviews.llvm.org/D13821
llvm-svn: 250535
Summary:
When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop
trying to propagate the cleanup's parent's color to the catchendpad, since
what's needed is the cleanup's grandparent's color and the catchendpad
will get that color from the catchpad linkage already. We already had
this exclusion for invokes, but were missing it for
cleanupendpad/cleanupret.
Also add a missing line that tags cleanupendpads' states in the
EHPadStateMap, without with lowering invokes that target cleanupendpads
which unwind to other handlers (and so don't have the -1 state) will fail.
This fixes the reduced IR repro in PR25163.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13797
llvm-svn: 250534
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
llvm-svn: 250527
Summary: Make the relooper an analysis pass, to convert CFG to AST.
Reviewers: sunfish
Subscribers: jfb, dschuff
Differential Revision: http://reviews.llvm.org/D12744
llvm-svn: 250524
Summary: This patch replaces usage of deprecated SHGetFolderPathW with SHGetKnownFolderPath. The usage of SHGetKnownFolderPath is wrapped to allow queries for other "known" folders in the near future.
Reviewers: aaron.ballman, gbedwell
Subscribers: chapuni, llvm-commits
Differential Revision: http://reviews.llvm.org/D13753
llvm-svn: 250501
This patch adds the underlying infrastructure for an AVR backend to be included into LLVM. It is the first of a series of patches aimed at moving the out-of-tree AVR backend into the tree.
It consists of adding a new`Triple` target 'avr'.
llvm-svn: 250492
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created. Once the `gc.statepoint` is in place, these should be removed.
llvm-svn: 250491
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC. The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.
The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while. Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.
Reviewers: swaroop.sridhar, reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13372
llvm-svn: 250489
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).
This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.
Reviewers: atrick, hfinkel, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13717
llvm-svn: 250483
Summary:
This NFC splitting is intended to make a later diff easier to follow.
It just tail duplicates `cloneIVUser` into `cloneArithmeticIVUser` and
`cloneBitwiseIVUser`.
Reviewers: atrick, hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13716
llvm-svn: 250481
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.
This, along with wasmate changes, allows me to do the following:
echo "int add(int a, int b) { return a + b; }" > add.c
./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"
As you'd expect, the d8 shell prints out the right value.
Reviewers: sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13712
llvm-svn: 250480
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
llvm-svn: 250456
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.
This patch adds the missing EFLAGS definition.
llvm-svn: 250438
Turns out this approach is buggy. In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.
Consider:
if (i u< L) leave()
if ((i + 1) u< L) leave()
print(a[i] + a[i+1])
If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow. This gives us:
if (i u< L) leave()
if ((i +nuw 1) u< L) leave()
print(a[i] + a[i+1])
If we now do the transform this patch proposed, we end up with:
if ((i +nuw 1) u< L) leave_appropriately()
print(a[i] + a[i+1])
That would be a miscompile when i==-1. The problem here is that the control dependent nuw bits got used to prove something about the first condition. That's obviously invalid.
This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...
llvm-svn: 250430
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.
Don't let it forget about the displacement.
This fixes https://llvm.org/bugs/show_bug.cgi?id=25171
A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.
This fixes https://llvm.org/bugs/show_bug.cgi?id=25144
Reviewers: pete
Subscribers: llvm-commits, majnemer, rsmith
Differential Revision: http://reviews.llvm.org/D13749
llvm-svn: 250429
This adjusts all integers in the reader/writer to reflect the types
stored on profile files. They should all be unsigned 32-bit or 64-bit
values. Changed all associated internal types to be uint32_t or
uint64_t.
The only place that needed some adjustments is in the sample profile
transformation. Altough the weight read from the profile are 64-bit
values, the internal API for branch weights only accepts 32-bit values.
The pass now saturates weights that overflow uint32_t.
llvm-svn: 250427
Summary:
This macro is needed to prevent test/CodeGen/Mips/2008-08-01-AsmInline.ll from
failing after the integrated assembler is enabled by default.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13654
llvm-svn: 250414
Summary:
The -mcpu=mips16 option caused the Integrated Assembler to crash because
it couldn't figure out the architecture revision number to write to the
.MIPS.abiflags section. This CPU definition has been removed because, like
microMIPS, MIPS16 is an ASE to a base architecture.
Reviewers: vkalintiris
Subscribers: rkotler, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13656
llvm-svn: 250407
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.
Differential Revision: http://reviews.llvm.org/D13644
llvm-svn: 250390
(e.g. bss sections).
MachO and ELF have been silently letting this pass, but COFFObjectFile contains
an assertion to catch this kind of (ab)use of the getSectionContents, and this
was causing the JIT to crash on COFF objects with BSS sections. This patch
should fix that.
llvm-svn: 250371
Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250370
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
%.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!
I will re-commit this if the bot does not recover.
llvm-svn: 250366
A PDB can be thought of as a very simple file system. It is
occasionally illuminating to see the contents of the underlying files.
Differential Revision: http://reviews.llvm.org/D13674
llvm-svn: 250356
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250349
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250345
If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.
The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.) I plan on extending this transform over the next few days to handle alternate sequences.
Differential Revision: http://reviews.llvm.org/D13070
llvm-svn: 250343
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250342
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
Differential Revision: http://reviews.llvm.org/D13253
llvm-svn: 250338
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
llvm-svn: 250324
This adds documentation for the binary profile encoding and moves the
documentation for the text encoding into the header file
SampleProfReader.h.
llvm-svn: 250309
Summary:
Caching SDLoc(N), instead of recreating it in every single
function call, keeps the code denser, and allows to unwrap long lines.
Reviewers: sunfish, atrick, sdmitrouk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13726
llvm-svn: 250305
Summary: The two implementations had more code in common than not.
Reviewers: sunfish, MatzeB, sdmitrouk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13724
llvm-svn: 250302
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Differential Revision: http://reviews.llvm.org/D13698
llvm-svn: 250285
Binary encoded profiles used to encode all function names inline at
every reference. This is clearly suboptimal in terms of space. This
patch fixes this by adding a name table to the header of the file.
llvm-svn: 250241
We forgot to append the terminatepad's arguments which resulted in us
treating the old terminatepad as an argument to the new terminatepad
causing us to crash immediately. Instead, add the old terminatepad's
arguments to the new terminatepad.
This fixes PR25155.
llvm-svn: 250234
ArchiveMemberHeader, suggestion by Rafael Espíndola.
Also The clang-x86-win2008-selfhost bot still does not like the
malformed-machos 00000031.a test, so removing it for now. All
the other bots are fine with it however.
llvm-svn: 250222
Summary:
Emit the handler and clause locations immediately after the standard
xdata.
Clauses are emitted in the same order and format used to communiate them
to the CLR Execution Engine.
Add a lit test to verify correct table generation on a small but
interesting example function.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13451
llvm-svn: 250219
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one. Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.
I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators). Otherwise I would have just
added the conversion.
Even with that, no there should be functionality change here.
llvm-svn: 250218
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.
This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770. This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`. `Function::front()` started to assert, since the function
was empty. Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before. (I added the missing check for
`Function::isDeclaration()`.)
Otherwise, no functionality change intended.
llvm-svn: 250211
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250204
On Linux, the profile runtime can use __start_SECTNAME and __stop_SECTNAME
symbols defined by the linker to locate the start and end location of
a named section (with C name). This eliminates the need for instrumented
binary to call __llvm_profile_register_function during start-up time.
llvm-svn: 250199
Summary:
Add an iterator that can walk across blocks and which visits the state
transitions rather than state ranges, with explicit transitions to -1
indicating the presence of top-level calls that may throw and cause the
current function to unwind to caller. This will simplify code that needs
to identify nested try regions.
Refactor SEH and C++EH table generation to use the new
InvokeStateChangeIterator, and remove the InvokeLabelIterator they were
using.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13623
llvm-svn: 250179
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.
The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.
llvm-svn: 250160
Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again.
Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off.
llvm-svn: 250157
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.
llvm-svn: 250156
Now LLVMBitWriter compiles without implicit ilist iterator conversions.
In these cases, the cleanest thing was to switch to range-based for
loops. Since there wasn't much noise I converted sub-loops and parent
loops as a drive-by.
llvm-svn: 250144
In a later commit, `SplitBinaryAdd` will be used outside `IsConstDiff`,
so lift that out. And lift out `IsConstDiff` as
`computeConstantDifference` to keep things clean and to avoid playing
C++ access specifier games.
NFC.
llvm-svn: 250143
Continuing the work from last week to remove implicit ilist iterator
conversions. First related commit was probably r249767, with some more
motivation in r249925. This edition gets LLVMTransformUtils compiling
without the implicit conversions.
No functional change intended.
llvm-svn: 250142
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.
For a simple case that looks like
t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...
t4 = store t0:1, t0:1
t5 = store t4, t1:0
t6 = store t5, t2:0
t7 = store t6, t3:0
We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.
llvm-svn: 250138
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Reviewers: rsmith, rtrieu
Subscribers: llvm-commits, reames, morisset
Differential Revision: http://reviews.llvm.org/D13680
llvm-svn: 250135
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.
Differential revision: http://reviews.llvm.org/D13354
llvm-svn: 250119
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).
This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.
After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().
Differential Revision: http://reviews.llvm.org/D13665
llvm-svn: 250118
that caused aborts. This was because of the characters of the ‘Size’ field in
the archive header did not contain decimal characters.
rdar://22983603
llvm-svn: 250117