Implement 'retn' simply by aliasing it to the relevant 'ret' instruction
Commit on behalf of coby
Differential Revision: https://reviews.llvm.org/D24346
llvm-svn: 282601
Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.
Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
Additional Minor Changes:
1. Finishes removing unused AliasLoad code
2. Unifies the the chain aggregation in the merged stores across
code paths
3. Re-add the Store node to the worklist after calling
SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
arbitrary, but seemed sufficient to not cause regressions in
tests.
This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.
Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations
Noteworthy tests:
CodeGen/AArch64/argument-blocks.ll -
It's not entirely clear what the test_varargs_stackalign test is
supposed to be asserting, but the new code looks right.
CodeGen/AArch64/arm64-memset-inline.lli -
CodeGen/AArch64/arm64-stur.ll -
CodeGen/ARM/memset-inline.ll -
The backend now generates *worse* code due to store merging
succeeding, as we do do a 16-byte constant-zero store efficiently.
CodeGen/AArch64/merge-store.ll -
Improved, but there still seems to be an extraneous vector insert
from an element to itself?
CodeGen/PowerPC/ppc64-align-long-double.ll -
Worse code emitted in this case, due to the improved store->load
forwarding.
CodeGen/X86/dag-merge-fast-accesses.ll -
CodeGen/X86/MergeConsecutiveStores.ll -
CodeGen/X86/stores-merging.ll -
CodeGen/Mips/load-store-left-right.ll -
Restored correct merging of non-aligned stores
CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
Improved. Correctly merges buffer_store_dword calls
CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
Improved. Sidesteps loading a stored value and merges two stores
CodeGen/X86/pr18023.ll -
This test has been removed, as it was asserting incorrect
behavior. Non-volatile stores *CAN* be moved past volatile loads,
and now are.
CodeGen/X86/vector-idiv.ll -
CodeGen/X86/vector-lzcnt-128.ll -
It's basically impossible to tell what these tests are actually
testing. But, looks like the code got better due to the memory
operations being recognized as non-aliasing.
CodeGen/X86/win32-eh.ll -
Both loads of the securitycookie are now merged.
CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
This test appears to work but no longer exhibits the spill
behavior.
Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight
Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel
Differential Revision: https://reviews.llvm.org/D14834
llvm-svn: 282600
Summary:
This adds the AVRMCTargetDesc file in tree. It allows creation of the
core classes used in the backend.
Reviewers: arsenm, kparzysz
Subscribers: wdng, beanz, mgorny
Differential Revision: https://reviews.llvm.org/D25023
llvm-svn: 282597
The previous data layout caused issues when dealing with atomics.
Foe example, it is illegal to load a 16-bit value with less than 16-bits
of alignment.
This changes the data layout so that all types are aligned by at least
their own width.
Interestingly, this also _slightly_ decreased register pressure in some
cases.
llvm-svn: 282587
The KORTEST was introduced due to a bug where a TEST instruction used a K register.
but, turns out that the opposite case of KORTEST using a GPR is now happening
The change removes the KORTEST flow and adds a COPY instruction from the K reg to a GPR.
Differential Revision: https://reviews.llvm.org/D24953
llvm-svn: 282580
This commit enables more unrolling for SystemZ by implementing the
SystemZTargetTransformInfo::getUnrollingPreferences() method.
It has been found that it is better to only unroll moderately, so the
DefaultUnrollRuntimeCount has been moved into UnrollingPreferences in order
to set this to a lower value for SystemZ (4).
Reviewers: Evgeny Stupachenko, Ulrich Weigand.
https://reviews.llvm.org/D24451
llvm-svn: 282570
This check currently doesn't seem to do anything useful on any in-tree target:
On non-x86, it always evaluates to false, so we never hit the code path that
creates the shuffle with zero.
On x86, it just forwards to isShuffleMaskLegal(), which is a reasonable thing to
query in general, but doesn't make sense if only restricted to zero blends.
Differential Revision: https://reviews.llvm.org/D24625
llvm-svn: 282567
Ever since LAA was split out into an analysis on its own, this function
stopped emitting the report directly. Instead it stores it to be
retrieved by the client which can then emit it as its own report
(e.g. -Rpass-analysis=loop-vectorize).
llvm-svn: 282561
other load commands that use the MachO::dylinker_command type
but not used in llvm libObject code but used in llvm tool code.
This includes LC_ID_DYLINKER, LC_LOAD_DYLINKER
and LC_DYLD_ENVIRONMENT load commands.
llvm-svn: 282553
Another step toward TableGen'ed like structure for the RegisterBankInfo
of AArch64. By doing this, we also save a bit of compile time for the
exact same output.
llvm-svn: 282550
The 'or' case shows up in copysign. The copysign code also had
redundant checking for a scalar zero operand with 'and', so I
removed that.
I'm not sure how to test vector 'and', 'andn', and 'xor' yet,
but it seems better to just include all of the logic ops since
we're fixing 'or' anyway.
llvm-svn: 282546
Summary:
The current implementation of isConstantPhysReg() checks for defs of
physical registers to determine if they are constant. Some
architectures (e.g. AArch64 XZR/WZR) have registers that are constant
and may be used as destinations to indicate the generated value is
discarded, preventing isConstantPhysReg() from returning true. This
change adds a TargetRegisterInfo hook that overrides the no defs check
for cases such as this.
Reviewers: MatzeB, qcolombet, t.p.northover, jmolloy
Subscribers: junbuml, aemerson, mcrosier, rengolin
Differential Revision: https://reviews.llvm.org/D24570
llvm-svn: 282543
There is really no reason for these to be separate.
The vectorizer started this pretty bad tradition that the text of the
missed remarks is pretty meaningless, i.e. vectorization failed. There,
you have to query analysis to get the full picture.
I think we should just explain the reason for missing the optimization
in the missed remark when possible. Analysis remarks should provide
information that the pass gathers regardless whether the optimization is
passing or not.
llvm-svn: 282542
(Re-committed after moving the template specialization under the yaml
namespace. GCC was complaining about this.)
This allows various presentation of this data using an external tool.
This was first recommended here[1].
As an example, consider this module:
1 int foo();
2 int bar();
3
4 int baz() {
5 return foo() + bar();
6 }
The inliner generates these missed-optimization remarks today (the
hotness information is pulled from PGO):
remark: /tmp/s.c:5:10: foo will not be inlined into baz (hotness: 30)
remark: /tmp/s.c:5:18: bar will not be inlined into baz (hotness: 30)
Now with -pass-remarks-output=<yaml-file>, we generate this YAML file:
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 10 }
Function: baz
Hotness: 30
Args:
- Callee: foo
- String: will not be inlined into
- Caller: baz
...
--- !Missed
Pass: inline
Name: NotInlined
DebugLoc: { File: /tmp/s.c, Line: 5, Column: 18 }
Function: baz
Hotness: 30
Args:
- Callee: bar
- String: will not be inlined into
- Caller: baz
...
This is a summary of the high-level decisions:
* There is a new streaming interface to emit optimization remarks.
E.g. for the inliner remark above:
ORE.emit(DiagnosticInfoOptimizationRemarkMissed(
DEBUG_TYPE, "NotInlined", &I)
<< NV("Callee", Callee) << " will not be inlined into "
<< NV("Caller", CS.getCaller()) << setIsVerbose());
NV stands for named value and allows the YAML client to process a remark
using its name (NotInlined) and the named arguments (Callee and Caller)
without parsing the text of the message.
Subsequent patches will update ORE users to use the new streaming API.
* I am using YAML I/O for writing the YAML file. YAML I/O requires you
to specify reading and writing at once but reading is highly non-trivial
for some of the more complex LLVM types. Since it's not clear that we
(ever) want to use LLVM to parse this YAML file, the code supports and
asserts that we're writing only.
On the other hand, I did experiment that the class hierarchy starting at
DiagnosticInfoOptimizationBase can be mapped back from YAML generated
here (see D24479).
* The YAML stream is stored in the LLVM context.
* In the example, we can probably further specify the IR value used,
i.e. print "Function" rather than "Value".
* As before hotness is computed in the analysis pass instead of
DiganosticInfo. This avoids the layering problem since BFI is in
Analysis while DiagnosticInfo is in IR.
[1] https://reviews.llvm.org/D19678#419445
Differential Revision: https://reviews.llvm.org/D24587
llvm-svn: 282539
Support overriding the Doxygen & OCamldoc install directories,
and provide a more FHS-compliant defaults for both of them. This extends
r282240 that added this override for Sphinx-built documentation.
LLVM_INSTALL_DOXYGEN_HTML_DIR and LLVM_INSTALL_OCAMLDOC_HTML_DIR are
added, to control the location where Doxygen-generated and
OCamldoc-generated HTML docs are installed appropriately. They both
specify CMake-style install paths, and therefore can either by relative
to the install prefix or absolute.
The new defaults are subdirectories of share/doc/llvm, and replace
the previous directories of 'docs/html' and 'docs/ocaml/html' that
resulted in creating invalid '/usr/docs' that furthermore lacked proper
namespacing for the LLVM package. The new defaults are consistent with
the ones used for Sphinx HTML documentation, differing only in the last
component. Since the 'html' subdirectory is already used for Sphinx
docs, the 'doxygen-html' and 'ocaml-html' directories are used instead.
Differential Revision: https://reviews.llvm.org/D24935
llvm-svn: 282536
Turns out several external projects relied on llvm printing statistics
on exit. Let's go back to this behaviour by default and have an optional
parameter to disable it.
llvm-svn: 282532
LLVM developers might be surprised to learn that there are blocks
without valid insertion points (catchswitch), so it seems worth calling
that out explicitly. Also add a FIXME about what we should really be
doing if we ever need to make optimized Windows EH code debuggable.
While I'm here, make auto usage more consistent with LLVM standards and
avoid an unecessary call to insertBefore.
llvm-svn: 282521
A landing pad can have live-in registers that are defined by the runtime,
not the program (exception pointer register and exception selector
register). Make sure to recognize that case and not link these registers
with any defs in the program.
Each landing pad will have phi nodes added at the beginning to provide
definitions of these registers, but the uses of those phi nodes will not
have any reaching defs.
llvm-svn: 282519
Summary:
The previous output was confusing as it would output "Taget triple:
x86_64-unknown-linux-gnu" even when LLVM_HOST_TRIPLE or
LLVM_DEFAULT_TARGET_TRIPLE were set on the CMake command line
Patch by: Alex Richardson!
Reviewers: beanz
Subscribers: Eugene.Zelenko
Differential Revision: https://reviews.llvm.org/D17067
llvm-svn: 282516