39 Commits

Author SHA1 Message Date
David Green d50e188a07 Revert "[ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS"
This reverts commit e34801c8e6 and the followup due to multiple
problems.

I've tried to keep the tests and RDA parts where possible, as those
still seem useful.
2020-02-02 13:24:05 +00:00
Fangrui Song 9a24488cb6 [CodeGen] Move fentry-insert, xray-instrumentation and patchable-function before addPreEmitPass()
The intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).

This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.

Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
2020-01-19 00:09:46 -08:00
Anna Welker 346f6b54bd [ARM][MVE] Enable masked gathers from vector of pointers
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.

Differential Revision: https://reviews.llvm.org/D71743
2020-01-08 13:43:12 +00:00
Sjoerd Meijer e34801c8e6 [ARM][MVE] VPT Blocks: findVCMPToFoldIntoVPS
This is a recommit of D71330, but with a few things fixed and changed:

1) ReachingDefAnalysis: this was not running with optnone as it was checking
skipFunction(), which other analysis passes don't do. I guess this is a
copy-paste from a codegen pass.
2) VPTBlockPass: here I've added skipFunction(), because like most/all
optimisations, we don't want to run this with optnone.

This fixes the issues with the initial/previous commit: the VPTBlockPass was
running with optnone, but ReachingDefAnalysis wasn't, and so VPTBlockPass was
crashing querying ReachingDefAnalysis.

I've added test case mve-vpt-block-optnone.mir to check that we don't run
VPTBlock with optnone.

Differential Revision: https://reviews.llvm.org/D71470
2020-01-07 13:54:47 +00:00
Sjoerd Meijer e91420e17d Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334b.

There's a test that doesn't like this change. The RDA analysis
gets invalidated by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Sjoerd Meijer 9468e3334b [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after D71072 was reverted.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Hiroshi Yamauchi 9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Sam Parker 26bf2a510f Fix for buildbots
Change pass name in pipeline test.
2019-12-03 11:30:38 +00:00
Sam Parker bc76dadb3c [CodeGen] Move ARMCodegenPrepare to TypePromotion
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
  types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
  promoted and what they should be promoted to.
- Move all codegen tests into Transforms folder and testing using opt
  and not llc, which is how they should have been written in the
  first place...

The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.

Differential Revision: https://reviews.llvm.org/D69556
2019-12-03 11:12:52 +00:00
Sam Parker cced971fd3 [ARM][ReachingDefs] RDA in LoLoops
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
  MachineInstr that produces the def.
- getInstFromId, return the MachineInstr to which the given integer
  corresponds.
- hasSameReachingDef, return whether two MachineInstr use the same
  def of a register.
- isRegUsedAfter, return whether a register is used after a given
  MachineInstr.

These methods have been used in ARMLowOverhead to replace searching
for uses/defs.
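
To make the four queries above concrete, here is a minimal sketch in Python of a reaching-definition lookup restricted to a single basic block. It is a toy model and not LLVM's ReachingDefAnalysis: the `Inst` and `BlockReachingDefs` names, the use of list indices as instruction ids, and the rule that a redefinition ends the search in `is_reg_used_after` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    """Hypothetical stand-in for a MachineInstr: a name plus defs and uses."""
    name: str
    defs: tuple = ()
    uses: tuple = ()


class BlockReachingDefs:
    """Toy single-block analysis; instruction ids are just list indices."""

    def __init__(self, insts):
        self.insts = list(insts)

    def get_reaching_def(self, index, reg):
        # Integer form: id of the closest earlier def of reg, or -1 if none.
        for i in range(index - 1, -1, -1):
            if reg in self.insts[i].defs:
                return i
        return -1

    def get_inst_from_id(self, inst_id):
        # Map an integer id back to the instruction it names.
        if 0 <= inst_id < len(self.insts):
            return self.insts[inst_id]
        return None

    def get_reaching_mi_def(self, index, reg):
        # Same query as get_reaching_def, but returns the instruction itself.
        return self.get_inst_from_id(self.get_reaching_def(index, reg))

    def has_same_reaching_def(self, a, b, reg):
        # True when both instructions see the same def of reg.
        return self.get_reaching_def(a, reg) == self.get_reaching_def(b, reg)

    def is_reg_used_after(self, index, reg):
        # Assumption: a later redefinition hides the old value, so stop there.
        for inst in self.insts[index + 1:]:
            if reg in inst.uses:
                return True
            if reg in inst.defs:
                return False
        return False
```

A client pass can then ask for the defining instruction directly instead of scanning the block by hand, which is the kind of search the message says was replaced.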

Differential Revision: https://reviews.llvm.org/D70009
2019-11-26 10:13:46 +00:00
David Green 7d9af03ff7 [Scheduling][ARM] Consistently enable PostRA Machine scheduling
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itineraries and the list scheduler seems to produce better code
(and does not crash running out of registers on v6m code). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.

This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine scheduler is decided as the
pass manager is set up, in ARM's case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.

To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add an enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.
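
The run-time choice described above can be sketched as a small Python model. This is an illustration only and not the ARM backend's code: the `Subtarget` class, the pass-name strings, and the rule that the list scheduler runs only when the machine scheduler does not are assumptions. The one part taken from this message is the default of enable_post_ra_machine_scheduler().

```python
class Subtarget:
    """Toy stand-in for a subtarget; not LLVM's TargetSubtargetInfo."""

    def __init__(self, machine_scheduler, post_ra_scheduler):
        self._machine_scheduler = machine_scheduler
        self._post_ra_scheduler = post_ra_scheduler

    def enable_machine_scheduler(self):
        return self._machine_scheduler

    def enable_post_ra_scheduler(self):
        return self._post_ra_scheduler

    def enable_post_ra_machine_scheduler(self):
        # Default from the message: both conditions must hold.
        # A target may override this to enable just one of the schedulers.
        return self.enable_machine_scheduler() and self.enable_post_ra_scheduler()


def post_ra_schedulers_to_run(subtarget):
    """Both passes sit in the pipeline; each checks the subtarget when it runs."""
    ran = []
    if subtarget.enable_post_ra_machine_scheduler():
        ran.append("post-ra-machine-scheduler")
    elif subtarget.enable_post_ra_scheduler():
        # Assumption: the list scheduler is skipped when the machine one ran.
        ran.append("post-ra-list-scheduler")
    return ran
```

Because the decision is taken per function from the real subtarget and not from the one built while the pass manager is set up, a missing CPU at pipeline-construction time (the LTO case) no longer picks the wrong scheduler.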

Thanks to David Penry for identifying this problem.

Differential Revision: https://reviews.llvm.org/D69775
2019-11-05 10:44:55 +00:00
Sjoerd Meijer 92164cf25d Recommit "[HardwareLoops] Optimisation remarks"
With a few things fixed:
- initialisation of the optimisation remark pass (this was causing the buildbot
  failures on PPC),
- a test case.

Differential Revision: https://reviews.llvm.org/D69660
2019-11-05 09:06:22 +00:00
Sjoerd Meijer 5a13188966 Revert "[HardwareLoops] Optimisation remarks"
while I investigate the PPC build bot failures.

This reverts commit ad76375156.

llvm-svn: 374992
2019-10-16 10:55:06 +00:00
Sjoerd Meijer ad76375156 [HardwareLoops] Optimisation remarks
This adds the initial plumbing to support optimisation remarks in
the IR hardware-loop pass.

I have left a todo in a comment where we can improve the reporting,
and will iterate on that now that we have this initial support in.

Differential Revision: https://reviews.llvm.org/D68579

llvm-svn: 374980
2019-10-16 09:09:55 +00:00
Joerg Sonnenberger 9681ea9560 Reapply r374743 with a fix for the ocaml binding
Add a pass to lower is.constant and objectsize intrinsics

This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374784
2019-10-14 16:15:14 +00:00
Dmitri Gribenko 1a21f98ac3 Revert "Add a pass to lower is.constant and objectsize intrinsics"
This reverts commit r374743. It broke the build with Ocaml enabled:
http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218

llvm-svn: 374768
2019-10-14 12:22:48 +00:00
Joerg Sonnenberger e4300c392d Add a pass to lower is.constant and objectsize intrinsics
This pass lowers is.constant and objectsize intrinsics not simplified by
earlier constant folding, i.e. if the object given is not constant or if
not using the optimized pass chain. The result is recursively simplified
and constant conditionals are pruned, so that dead blocks are removed
even for -O0. This allows inline asm blocks with operand constraints to
work all the time.

The new pass replaces the existing lowering in the codegen-prepare pass
and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert
on the intrinsics.

Differential Revision: https://reviews.llvm.org/D65280

llvm-svn: 374743
2019-10-13 23:00:15 +00:00
Jakub Kuderski 7ed4fb389b Add a missing pass in ARM O3 pipeline
llvm-svn: 373382
2019-10-01 18:53:54 +00:00
Jakub Kuderski 856c1cd852 [Dominators][CodeGen] Don't mark MachineDominatorTree as preserved in MachineLICM
llvm-svn: 373378
2019-10-01 18:27:44 +00:00
David Green c42ca16cfa [ARM] Fixup pipeline test. NFC
llvm-svn: 372133
2019-09-17 15:25:24 +00:00
Sam Parker 95b28a4c72 [ARM] LE support in ConstantIslands
The low-overhead branch extension provides a loop-end 'LE' instruction
that performs no decrement nor compare, it just jumps backwards. This
patch modifies the constant islands pass to try to insert LE
instructions in place of a Thumb2 conditional branch, instead of
shrinking it. This only happens if a cmp can be converted to a cbn/z
and used to exit the loop.

Differential Revision: https://reviews.llvm.org/D67404

llvm-svn: 372085
2019-09-17 09:08:05 +00:00
Dmitri Gribenko 2bf8d77453 Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.""
This reverts commit r371502, it broke tests
(clang/test/CodeGenCXX/auto-var-init.cpp).

llvm-svn: 371507
2019-09-10 10:39:09 +00:00
Clement Courbet 612c260ec3 Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
With a fix for sanitizer breakage (see explanation in D60318).

llvm-svn: 371502
2019-09-10 09:18:00 +00:00
Sam Parker 29bf68fcfa [ARM] Fix for buildbot
llvm-svn: 371187
2019-09-06 09:36:23 +00:00
Sam Parker a761ba0f2d [ARM][ParallelDSP] Change search for muls
rL369567 reverted a couple of recent changes made to ARMParallelDSP
because of a miscompilation error: PR43073.

The issue stemmed from an underlying bug that was caused by adding
muls into a reduction before it was proved that they could be executed
in parallel with another mul.

Most of the changes here are from the previously reverted commits.
The additional changes that have been made are:
1) The Search function now doesn't insert any muls into the Reduction
   object. That now happens once the search has successfully finished.
2) For any muls added into the reduction but that weren't paired, we
   accumulate their values as an input into the smlad.

Differential Revision: https://reviews.llvm.org/D66660

llvm-svn: 370171
2019-08-28 08:51:13 +00:00
Nico Weber ed18e70c86 Revert r367389 (and follow-up r368404); it caused PR43073.
llvm-svn: 369567
2019-08-21 19:53:42 +00:00
Sam Parker 2200a9bdf3 [ARM][ParallelDSP] Convert to function pass
Run across a whole function, visiting each basic block one at a time.

Differential Revision: https://reviews.llvm.org/D65324

llvm-svn: 367389
2019-07-31 07:32:03 +00:00
Kai Luo dec624682e [MachineCSE][MachinePRE] Avoid hoisting code from cold regions into hot BBs.
Summary:
Current PRE hoists common computations into
CMBB = DT->findNearestCommonDominator(MBB, MBB1).
However, if CMBB is in a hot loop body, we might get performance
degradation.

Differential Revision: https://reviews.llvm.org/D64394

llvm-svn: 366570
2019-07-19 12:58:16 +00:00
Clement Courbet 2851248fa1 Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
Breaks sanitizers:
    libFuzzer :: cxxstring.test
    libFuzzer :: memcmp.test
    libFuzzer :: recommended-dictionary.test
    libFuzzer :: strcmp.test
    libFuzzer :: value-profile-mem.test
    libFuzzer :: value-profile-strcmp.test

llvm-svn: 364416
2019-06-26 12:13:13 +00:00
Clement Courbet 7b3a5f0e6d [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
This allows later passes (in particular InstCombine) to optimize more
cases.

One that's important to us is `memcmp(p, q, constant) < 0` and `memcmp(p, q, constant) > 0`.

llvm-svn: 364412
2019-06-26 11:50:18 +00:00
Sam Parker a6fd919cb3 [ARM] DLS/LE low-overhead loop code generation
Introduce three pseudo instructions to be used during DAG ISel to
represent v8.1-m low-overhead loops. One maps to set_loop_iterations
while loop_decrement_reg is lowered to two, so that we can separate
the decrement and branching operations. The pseudo instructions are
expanded pre-emission, where we can still decide whether we actually
want to generate a low-overhead loop, in a new pass:
ARMLowOverheadLoops. The pass currently bails, reverting to a sub,
icmp and br, in the cases where a call or stack spill/restore happens
between the decrement and branching instructions, or if the loop is
too large.

Differential Revision: https://reviews.llvm.org/D63476

llvm-svn: 364288
2019-06-25 10:45:51 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, now it should be certain there are calls or
stack objects in the frame or not, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Sjoerd Meijer 3058a62b90 [ARM] MVE VPT Block Pass
Initial commit of a new pass to create vector predication blocks, called VPT
blocks, that are supported by the Armv8.1-M MVE architecture.

This is a first naive implementation. I.e., for 2 consecutive predicated
instructions I1 and I2, for example, it will generate 2 VPT blocks:

VPST
I1
VPST
I2

A more optimal implementation would obviously put instructions in the same VPT
block when they are predicated on the same condition and when it is allowed to
do this:

VPTT
I1
I2
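
The difference between the naive and the grouped form can be sketched in Python as follows. This is a toy model and not the pass itself: instructions are hypothetical `(name, predicate)` pairs, a predicate of `None` marks an unpredicated instruction, and the default limit of four instructions per block is an assumption of this sketch.

```python
def naive_vpt_blocks(insts):
    """One block per predicated instruction, as in the first implementation."""
    return [[name] for name, pred in insts if pred is not None]


def grouped_vpt_blocks(insts, max_len=4):
    """Merge consecutive instructions predicated on the same condition."""
    blocks = []
    current = []
    current_pred = None
    for name, pred in insts:
        if pred is None:
            # An unpredicated instruction ends any open block.
            if current:
                blocks.append(current)
                current = []
            current_pred = None
            continue
        if current and pred == current_pred and len(current) < max_len:
            current.append(name)
        else:
            # New condition, or the open block is full: start a fresh block.
            if current:
                blocks.append(current)
            current = [name]
            current_pred = pred
    if current:
        blocks.append(current)
    return blocks
```

For the two-instruction example, `naive_vpt_blocks` yields two single-instruction blocks while `grouped_vpt_blocks` yields one block holding both I1 and I2.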

We will address this optimisation with follow up patches when the groundwork is
in. Creating VPT Blocks is very similar to IT Blocks, which is the reason I
added this to Thumb2ITBlocks.cpp. This allows reuse of the def use analysis
that we need for the more optimal implementation.

VPT blocks cannot be nested in IT blocks, and vice versa, and so these 2 passes
cannot interact with each other. Instructions allowed in VPT blocks must
be MVE instructions that are marked as VPT compatible.

Differential Revision: https://reviews.llvm.org/D63247

llvm-svn: 363370
2019-06-14 11:46:05 +00:00
Sjoerd Meijer c0f43bee37 Follow up of r361810: test case fix attempt for Windows builder
llvm-svn: 361817
2019-05-28 13:04:47 +00:00
Sjoerd Meijer 4df2baadd2 [ARM] Use CHECK-NEXT in CodeGen/ARM/O3-pipeline.ll. NFC.
Use CHECK-NEXT, like in other pipeline tests, so that we actually
notice when the pipeline is changed.

llvm-svn: 361810
2019-05-28 12:06:26 +00:00
Sam Parker f82d4ed771 [ARM] Remove EarlyCSE from backend
There is an issue with early CSE hitting an assert, so temporarily
remove the pass from the Arm backend.
    
Bug: https://bugs.llvm.org/show_bug.cgi?id=41081

Differential Revision: https://reviews.llvm.org/D59410

llvm-svn: 356259
2019-03-15 13:36:37 +00:00
Sam Parker 3b2ba20afd [ARM] Run ARMParallelDSP in the IRPasses phase
Run EarlyCSE before ParallelDSP and do this in the backend IR opt
phase.

Differential Revision: https://reviews.llvm.org/D59257

llvm-svn: 356130
2019-03-14 10:57:40 +00:00