It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
addr-label.ll crashes on ARM due to this change. This is because a
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case all referencing blockaddresses are RAUW a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen while a
ARMConstantPoolConstant has been created.
Reviewed By: rnk, MaskRay, compnerd
Differential Revision: https://reviews.llvm.org/D99707
MVE does not have a single sext/zext or trunc instruction that takes the
bottom half of a vector and extends to a full width, like NEON has with
MOVL. Instead it is expected that this happens through top/bottom
instructions. So the MVE equivalent VMOVLT/B instructions take either
the even or odd elements of the input and extend them to the larger
type, producing a vector with half the number of elements each of double
the bitwidth. As there is no simple instruction for a normal extend, we
often have to expand sext/zext/trunc into a series of lane moves (or
stack loads/stores, which we do not do yet).
This pass takes vector code that starts at truncs, looks for
interconnected blobs of operations that end with sext/zext and
transforms them by adding shuffles so that the lanes are interleaved and
the MVE VMOVL/VMOVN instructions can be used. This is done pre-ISel so
that it can work across basic blocks.
This initial version of the pass just handles a limited set of
instructions, not handling constants or splats or FP, which can all come
as extensions to this base.
Differential Revision: https://reviews.llvm.org/D95804
Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken.
Differential Revision: https://reviews.llvm.org/D92385
Some processors may speculatively execute the instructions immediately
following indirect control flow, such as returns, indirect jumps and
indirect function calls.
To avoid a potential miss-speculatively executed gadget after these
instructions leaking secrets through side channels, this pass places a
speculation barrier immediately after every indirect control flow where
control flow doesn't return to the next instruction, such as returns and
indirect jumps, but not indirect function calls.
Hardening of indirect function calls will be done in a later,
independent patch.
This patch is implementing the same functionality as the AArch64 counter
part implemented in https://reviews.llvm.org/D81400.
For AArch64, returns and indirect jumps only occur on RET and BR
instructions and hence the function attribute to control the hardening
is called "harden-sls-retbr" there. On AArch32, there is a much wider
variety of instructions that can trigger an indirect unconditional
control flow change. I've decided to stick with the name
"harden-sls-retbr" as introduced for the corresponding AArch64
mitigation.
This patch implements this for ARM mode. A future patch will extend this
to also support Thumb mode.
The inserted barriers are never on the correct, architectural execution
path, and therefore performance overhead of this is expected to be low.
To ensure these barriers are never on an architecturally executed path,
when the harden-sls-retbr function attribute is present, indirect
control flow is never conditionalized/predicated.
On targets that implement that Armv8.0-SB Speculation Barrier extension,
a single SB instruction is emitted that acts as a speculation barrier.
On other targets, a DSB SYS followed by a ISB is emitted to act as a
speculation barrier.
These speculation barriers are implemented as pseudo instructions to
avoid later passes to analyze them and potentially remove them.
The mitigation is off by default and can be enabled by the
harden-sls-retbr subtarget feature.
Differential Revision: https://reviews.llvm.org/D92395
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848
Adds a pass to the ARM backend that takes a v4i32
gather and transforms it into a call to MVE's
masked gather intrinsics.
Differential Revision: https://reviews.llvm.org/D71743
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
promoted and what they should be promoted to.
- Move all codegen tests into Transforms folder and testing using opt
and not llc, which is how they should have been written in the
first place...
The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.
Differential Revision: https://reviews.llvm.org/D69556
The MVE and LOB extensions of Armv8.1m can be combined to enable
'tail predication' which removes the need for a scalar remainder
loop after vectorization. Lane predication is performed implicitly
via a system register. The effects of predication is described in
Section B5.6.3 of the Armv8.1-m Arch Reference Manual, the key points
being:
- For vector operations that perform reduction across the vector and
produce a scalar result, whether the value is accumulated or not.
- For non-load instructions, the predicate flags determine if the
destination register byte is updated with the new value or if the
previous value is preserved.
- For vector store instructions, whether the store occurs or not.
- For vector load instructions, whether the value that is loaded or
whether zeros are written to that element of the destination
register.
This patch implements a pass that takes a hardware loop, containing
masked vector instructions, and converts it something that resembles
an MVE tail predicated loop. Currently, if we had code generation,
we'd generate a loop in which the VCTP would generate the predicate
and VPST would then setup the value of VPR.PO. The loads and stores
would be placed in VPT blocks so this is not tail predication, but
normal VPT predication with the predicate based upon a element
counting induction variable. Further work needs to be done to finally
produce a true tail predicated loop.
Because only the loads and stores are predicated, in both the LLVM IR
and MIR level, we will restrict support to only lane-wise operations
(no horizontal reductions). We will perform a final check on MIR
during loop finalisation too.
Another restriction, specific to MVE, is that all the vector
instructions need operate on the same number of elements. This is
because predication is performed at the byte level and this is set
on entry to the loop, or by the VCTP instead.
Differential Revision: https://reviews.llvm.org/D65884
llvm-svn: 371179
This just pulls the MVEVPTBlockPass into a separate file, as opposed to being
wrapped up in Thumb2ITBlockPass.
Differential revision: https://reviews.llvm.org/D66579
llvm-svn: 370187
Introduce three pseudo instructions to be used during DAG ISel to
represent v8.1-m low-overhead loops. One maps to set_loop_iterations
while loop_decrement_reg is lowered to two, so that we can separate
the decrement and branching operations. The pseudo instructions are
expanded pre-emission, where we can still decide whether we actually
want to generate a low-overhead loop, in a new pass:
ARMLowOverheadLoops. The pass currently bails, reverting to an sub,
icmp and br, in the cases where a call or stack spill/restore happens
between the decrement and branching instructions, or if the loop is
too large.
Differential Revision: https://reviews.llvm.org/D63476
llvm-svn: 364288
Create the ARMBasicBlockUtils class for tracking and querying basic
blocks sizes so we can use them when generating low-overhead loops.
Differential Revision: https://reviews.llvm.org/D63265
llvm-svn: 363530
For some targets, there is a circular dependency between InstPrinter and
MCTargetDesc. Merging them together will fix this. For the other targets,
the merging is to maintain consistency so all targets will have the same
structure.
llvm-svn: 360490
Arm specific codegen prepare is implemented to perform type promotion
on icmp operands, which can enable the removal of uxtb and uxth
(unsigned extend) instructions. This is possible because performing
type promotion before ISel alleviates this duty from the DAG builder
which has to perform legalisation, but has a limited view on data
ranges.
The pass visits any instruction operand of an icmp and creates a
worklist to traverse the use-def tree to determine whether the values
can simply be promoted. Our concern is values in the registers
overflowing the narrow (i8, i16) data range, so instructions marked
with nuw can be promoted easily. For add and sub instructions, we are
able to use the parallel dsp instructions to operate on scalar data
types and avoid overflowing bits. Underflowing adds and subs are also
permitted when the result is only used by an unsigned icmp.
Differential Revision: https://reviews.llvm.org/D48832
llvm-svn: 337687
Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of
this pass is to do some straightforward IR pattern matching to create ACLE DSP
intrinsics, which map on these 32-bit SIMD operations.
Currently, only the SMLAD instruction gets recognised. This instruction
performs two multiplications with 16-bit operands, and stores the result in an
accumulator. We will follow this up with patches to recognise SMLAD in more
cases, and also to generate other DSP instructions (like e.g. SADD16).
Patch by: Sam Parker and Sjoerd Meijer
Differential Revision: https://reviews.llvm.org/D48128
llvm-svn: 335850
Makes it easier to see mistakes such as the one fixed in r329178 and makes
the different target CMakeLists more consistent.
Also remove some stale-looking comments from the Nios2 target cmakefile.
No intended behavior change.
llvm-svn: 329181
With this change, the GlobalISel library gets always built. In
particular, this is not possible to opt GlobalISel out of the build
using the LLVM_BUILD_GLOBAL_ISEL variable any more.
llvm-svn: 309990
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers e.g. BASEPRI_MAX_NS which do not exist were being allowed!
Ref: ARMv6/7/8M architecture reference manual.
Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209
llvm-svn: 308456
Summary:
This patch adds a macro fusion using CodeGen/MacroFusion.cpp to pair AES
instructions back to back and adds FeatureFuseAES to enable the feature.
Reviewers: evandro, javed.absar, rengolin, t.p.northover
Reviewed By: javed.absar
Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34142
llvm-svn: 305988
Emit and use the TableGen instruction selector for ARM. At the moment,
this allows us to remove the hand-written code for selecting G_SDIV and
G_UDIV.
Future commits will focus on increasing the code coverage for it and
removing more dead code from the current instruction selector.
llvm-svn: 301905
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.
Depends on D27338
Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar
Reviewed By: rovka
Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D28567
llvm-svn: 294124
functions so that the size computation is available not only in ConstantIslands
but in other passes as well.
Differential Revision: https://reviews.llvm.org/D22640
llvm-svn: 276399
be deleted. This will be reapplied as soon as possible and before
the 3.6 branch date at any rate.
Approved by Jim Grosbach, Lang Hames, Rafael Espindola.
This reverts commits r215111, 215115, 215116, 215117, 215136.
llvm-svn: 215154
I am sure we will be finding bits and pieces of dead code for years to
come, but this is a good start.
Thanks to Lang Hames for making MCJIT a good replacement!
llvm-svn: 215111
Still only 32-bit ARM using it at this stage, but the promotion allows
direct testing via opt and is a reasonably self-contained patch on the
way to switching ARM64.
At this point, other targets should be able to make use of it without
too much difficulty if they want. (See ARM64 commit coming soon for an
example).
llvm-svn: 206485
The previous situation where ATOMIC_LOAD_WHATEVER nodes were expanded
at MachineInstr emission time had grown to be extremely large and
involved, to account for the subtly different code needed for the
various flavours (8/16/32/64 bit, cmpxchg/add/minmax).
Moving this transformation into the IR clears up the code
substantially, and makes future optimisations much easier:
1. an atomicrmw followed by using the *new* value can be more
efficient. As an IR pass, simple CSE could handle this
efficiently.
2. Making use of cmpxchg success/failure orderings only has to be done
in one (simpler) place.
3. The common "cmpxchg; did we store?" idiom can be exposed to
optimisation.
I intend to gradually improve this situation within the ARM backend
and make sure there are no hidden issues before moving the code out
into CodeGen to be shared with (at least ARM64/AArch64, though I think
PPC & Mips could benefit too).
llvm-svn: 205525
ARM specific optimiztion, finding places in ARM machine code where 2 dmbs
follow one another, and eliminating one of them.
Patch by Reinoud Elhorst.
llvm-svn: 205409
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTaleGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
llvm-svn: 195929
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
llvm-svn: 195927
Without explicit dependencies, both per-file action and in-CommonTableGen action could run in parallel.
It races to emit *.inc files simultaneously.
llvm-svn: 187780
a TargetMachine to construct (and thus isn't always available), to an
analysis group that supports layered implementations much like
AliasAnalysis does. This is a pretty massive change, with a few parts
that I was unable to easily separate (sorry), so I'll walk through it.
The first step of this conversion was to make TargetTransformInfo an
analysis group, and to sink the nonce implementations in
ScalarTargetTransformInfo and VectorTargetTranformInfo into
a NoTargetTransformInfo pass. This allows other passes to add a hard
requirement on TTI, and assume they will always get at least on
implementation.
The TargetTransformInfo analysis group leverages the delegation chaining
trick that AliasAnalysis uses, where the base class for the analysis
group delegates to the previous analysis *pass*, allowing all but tho
NoFoo analysis passes to only implement the parts of the interfaces they
support. It also introduces a new trick where each pass in the group
retains a pointer to the top-most pass that has been initialized. This
allows passes to implement one API in terms of another API and benefit
when some other pass above them in the stack has more precise results
for the second API.
The second step of this conversion is to create a pass that implements
the TargetTransformInfo analysis using the target-independent
abstractions in the code generator. This replaces the
ScalarTargetTransformImpl and VectorTargetTransformImpl classes in
lib/Target with a single pass in lib/CodeGen called
BasicTargetTransformInfo. This class actually provides most of the TTI
functionality, basing it upon the TargetLowering abstraction and other
information in the target independent code generator.
The third step of the conversion adds support to all TargetMachines to
register custom analysis passes. This allows building those passes with
access to TargetLowering or other target-specific classes, and it also
allows each target to customize the set of analysis passes desired in
the pass manager. The baseline LLVMTargetMachine implements this
interface to add the BasicTTI pass to the pass manager, and all of the
tools that want to support target-aware TTI passes call this routine on
whatever target machine they end up with to add the appropriate passes.
The fourth step of the conversion created target-specific TTI analysis
passes for the X86 and ARM backends. These passes contain the custom
logic that was previously in their extensions of the
ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces.
I separated them into their own file, as now all of the interface bits
are private and they just expose a function to create the pass itself.
Then I extended these target machines to set up a custom set of analysis
passes, first adding BasicTTI as a fallback, and then adding their
customized TTI implementations.
The fourth step required logic that was shared between the target
independent layer and the specific targets to move to a different
interface, as they no longer derive from each other. As a consequence,
a helper functions were added to TargetLowering representing the common
logic needed both in the target implementation and the codegen
implementation of the TTI pass. While technically this is the only
change that could have been committed separately, it would have been
a nightmare to extract.
The final step of the conversion was just to delete all the old
boilerplate. This got rid of the ScalarTargetTransformInfo and
VectorTargetTransformInfo classes, all of the support in all of the
targets for producing instances of them, and all of the support in the
tools for manually constructing a pass based around them.
Now that TTI is a relatively normal analysis group, two things become
straightforward. First, we can sink it into lib/Analysis which is a more
natural layer for it to live. Second, clients of this interface can
depend on it *always* being available which will simplify their code and
behavior. These (and other) simplifications will follow in subsequent
commits, this one is clearly big enough.
Finally, I'm very aware that much of the comments and documentation
needs to be updated. As soon as I had this working, and plausibly well
commented, I wanted to get it committed and in front of the build bots.
I'll be doing a few passes over documentation later if it sticks.
Commits to update DragonEgg and Clang will be made presently.
llvm-svn: 171681