A register in CodeGen can be marked as reserved: In that case we
consider the register always live and do not use (or rather ignore)
kill/dead/undef operand flags.
LiveIntervalAnalysis however tracks liveness per register unit (not per
register). We already needed adjustments for this in r292871 to deal
with super/sub registers. However I did not look at aliased register
there. Looking at ARM:
FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV
(regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit
(FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers.
This shared register unit was previously considered non-reserved,
however given that we uses of the reserved FPSCR potentially violate
some rules (like uses without defs) we should make FPSCR~FPSCR_NZCV
reserved too and stop tracking liveness for it.
This patch:
- Defines a register unit as reserved when: At least for one root
register, the root register and all its super registers are reserved.
- Adjust LiveIntervals::computeRegUnitRange() for new reserved
definition.
- Add MachineRegisterInfo::isReservedRegUnit() to have a canonical way
of testing.
- Stop computing LiveRanges for reserved register units in HMEditor even
with UpdateFlags enabled.
- Skip verification of uses of reserved reg units in the machine
verifier (this usually didn't happen because there would be no cached
liverange but there is no guarantee for that and I would run into this
case before the HMEditor tweak, so may as well fix the verifier too).
Note that this should only affect ARMs FPSCR/FPSCR_NZCV registers today;
aliased registers are rarely used, the only other cases are hexagons
P0-P3/P3_0 and C8/USR pairs which are not mixing reserved/non-reserved
registers in an alias.
Differential Revision: https://reviews.llvm.org/D37356
llvm-svn: 312348
Summary:
This fixes a bug that was exposed on gfx9 in various
GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests,
e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D36193
llvm-svn: 312337
Summary:
LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.
Reviewers: mkuper, Ayal, dlj, rengolin, srhines
Reviewed By: rengolin
Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D35498
llvm-svn: 312331
Issues addressed since original review:
- Moved removal of dead instructions found by
LiveIntervals::shrinkToUses() outside of loop iterating over
instructions to avoid instructions being deleted while pointed to by
iterator.
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 312328
In the ROPI relocation model, read-only variables are accessed relative
to the PC. We use the (MOV|LDRLIT)_ga_pcrel pseudoinstructions for this.
llvm-svn: 312323
Test constants as well in the PIC tests. These are also represented as
G_GLOBAL_VALUE, and although they are treated just like other globals
for PIC, they won't be for ROPI, so it's good to have this coverage.
llvm-svn: 312319
Not all of these will be able to be used by atomics because tablegen, but it
still seems like a good change by itself.
Differential Revision: https://reviews.llvm.org/D37345
llvm-svn: 312287
Summary:
Hopefully this also clarifies exactly when and why we're rewriting
certiain S_LOCALs using reference types: We're using the reference type
to stand in for a zero-offset load.
Reviewers: inglorion
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37309
llvm-svn: 312247
Summary:
This patch enables the following:
1) Regex based Instruction itineraries for integer instructions.
2) The instructions are grouped as per the nature of the instructions
(move, arithmetic, logic, Misc, Control Transfer).
3) FP instructions and their itineraries are added which includes values
for SSE4A, BMI, BMI2 and SHA instructions.
Patch by Ganesh Gopalasubramanian
Reviewers: RKSimon, craig.topper
Subscribers: vprasad, shivaram, ddibyend, andreadb, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D36617
llvm-svn: 312237
This is similar to what was done for ARM in SVN r269574; the code
and the test are straight copypaste to the corresponding AArch64
code and test directory.
Differential revision: https://reviews.llvm.org/D37204
llvm-svn: 312223
This adds missed optimization remarks which report viable candidates that
were not outlined because they would increase code size.
Other remarks will come in separate commits.
This will help to diagnose code size regressions and changes in outliner
behaviour in projects using the outliner.
https://reviews.llvm.org/D37085
llvm-svn: 312194
It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!")
> Issues identified by buildbots addressed since original review:
> - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
> - The pass no longer forwards COPYs to physical register uses, since
> doing so can break code that implicitly relies on the physical
> register number of the use.
> - The pass no longer forwards COPYs to undef uses, since doing so
> can break the machine verifier by creating LiveRanges that don't
> end on a use (since the undef operand is not considered a use).
>
> [MachineCopyPropagation] Extend pass to do COPY source forwarding
>
> This change extends MachineCopyPropagation to do COPY source forwarding.
>
> This change also extends the MachineCopyPropagation pass to be able to
> be run during register allocation, after physical registers have been
> assigned, but before the virtual registers have been re-written, which
> allows it to remove virtual register COPY LiveIntervals that become dead
> through the forwarding of all of their uses.
llvm-svn: 312178
Summary:
Remove a check for `ARMSubtarget::isTargetDarwin` when determining
whether to use Swift error registers, so that Swift errors work
properly on non-Darwin ARM32 targets (specifically Android).
Before this patch, generated code would save and restores ARM register r8 at
the entry and returns of a function that throws. As r8 is used as a virtual
return value for the object being thrown, this gets overwritten by the restore,
and calling code is unable to catch the error. In turn this caused Swift code
that used `do`/`try`/`catch` to work improperly on Android ARM32 targets.
Addresses Swift bug report https://bugs.swift.org/browse/SR-5438.
Patch by John Holdsworth.
Reviewers: manmanren, rjmccall, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: srhines, aschwaighofer, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35835
llvm-svn: 312164
Added a combiner which can clean up truncs/extends that are created in
order to make the types work during legalization.
Also moved the combineMerges to the LegalizeCombiner.
https://reviews.llvm.org/D36880
llvm-svn: 312158
Issues identified by buildbots addressed since original review:
- Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
- The pass no longer forwards COPYs to physical register uses, since
doing so can break code that implicitly relies on the physical
register number of the use.
- The pass no longer forwards COPYs to undef uses, since doing so
can break the machine verifier by creating LiveRanges that don't
end on a use (since the undef operand is not considered a use).
[MachineCopyPropagation] Extend pass to do COPY source forwarding
This change extends MachineCopyPropagation to do COPY source forwarding.
This change also extends the MachineCopyPropagation pass to be able to
be run during register allocation, after physical registers have been
assigned, but before the virtual registers have been re-written, which
allows it to remove virtual register COPY LiveIntervals that become dead
through the forwarding of all of their uses.
llvm-svn: 312154
This change simplifies code that has to deal with
DIGlobalVariableExpression and mirrors how we treat DIExpressions in
debug info intrinsics. Before this change there were two ways of
representing empty expressions on globals, a nullptr and an empty
!DIExpression().
If someone needs to upgrade out-of-tree testcases:
perl -pi -e 's/(!DIGlobalVariableExpression\(var: ![0-9]*)\)/\1, expr: !DIExpression())/g' <MYTEST.ll>
will catch 95%.
llvm-svn: 312144
Selecting 32-bit element logical ops without a select or broadcast requires matching a bitconvert on the inputs to the and. But that's a weird thing to rely on. It's entirely possible that one of the inputs doesn't have a bitcast and one does.
Since there's no functional difference, just remove the extra patterns and save some isel table size.
Differential Revision: https://reviews.llvm.org/D36854
llvm-svn: 312138
Summary:
Reverts r311008 to reinstate r310825 with a fix.
Refine alias checking for pseudo vs value to be conservative.
This fixes the original failure in builtbot unittest SingleSource/UnitTests/2003-07-09-SignedArgs.
Reviewers: hfinkel, nemanjai, efriedma
Reviewed By: efriedma
Subscribers: bjope, mcrosier, nhaehnle, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D36900
llvm-svn: 312126
This patch supports one more pattern for bbit0 and bbit1
instructions, CBranchBitNum class is expanded so it can
take 32 bit immidate.
Differential Revision: https://reviews.llvm.org/D36222
llvm-svn: 312111
Support for scalars was committed in r311154, this adds support for allowing
v4f16 vector types (thus avoiding conversions from/to single precision for
these types).
Differential Revision: https://reviews.llvm.org/D37145
llvm-svn: 312104
NFC.
Replaced duplicated HASWELL prefixes in run commands in the X86 Code Gen regression tests by the SKYLAKE prefix when the -mcpu is set to skylake.
The fix is needed in preparation of an upcoming patch containing the Skylake scheduling info.
Reviewers: zvi, RKSimon, aymanmus, igorb
Differential Revision: https://reviews.llvm.org/D37258
llvm-svn: 312103
Summary:
This patch adjusts the patterns to make the result type of the broadcast node vXf64/vXi64. Then adds a bitcast to vXi32 after that. Intrinsic lowering was also adjusted to generate this new pattern.
Fixes PR34357
We should probably just drop the intrinsic entirely and use native IR, but I'll leave that for a future patch.
Any idea what instruction we should be lowering the floating point 128-bit result version of this pattern to? There's a 128-bit v2i32 integer broadcast but not an fp one.
Reviewers: aymanmus, zvi, igorb
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37286
llvm-svn: 312101
This enables the use of a smaller encoding by using a VEX instruction when possible.
Differential Revision: https://reviews.llvm.org/D37092
llvm-svn: 312100
Currently we start applying this on Haswell and newer. I don't believe anything changed in the Haswell architecture to make this the right cutoff point. The partial flag handling around this has been roughly the same since Sandybridge.
Differential Revision: https://reviews.llvm.org/D37250
llvm-svn: 312099
Summary:
Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge".
This is really strange as now AMD supports AVX. It also means if user explicitly disables AVX we disable macro fusion.
This patch adds an explicit macro fusion feature. I've also enabled for the generic 64-bit CPU (which doesn't have AVX)
This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature.
Reviewers: spatel, chandlerc, RKSimon, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37280
llvm-svn: 312097
If denorms are not flushed we can use max instead of multiplication
by 1. For double that is simply faster, while for float and half
it is shorter, because mul uses constant bus and VOP3.
Differential Revision: https://reviews.llvm.org/D36856
llvm-svn: 312095
Summary:
Based on Fred's patch here: https://reviews.llvm.org/D6771
I can't seem to commandeer the old review, so I'm creating a new one.
With that change the locations exrpessions are pretty printed inline in the
DIE tree. The output looks like this for debug_loc entries:
DW_AT_location [DW_FORM_data4] (0x00000000
0x0000000000000001 - 0x000000000000000b: DW_OP_consts +3
0x000000000000000b - 0x0000000000000012: DW_OP_consts +7
0x0000000000000012 - 0x000000000000001b: DW_OP_reg0 RAX, DW_OP_piece 0x4
0x000000000000001b - 0x0000000000000024: DW_OP_breg5 RDI+0)
And like this for debug_loc.dwo entries:
DW_AT_location [DW_FORM_sec_offset] (0x00000000
Addr idx 2 (w/ length 190): DW_OP_consts +0, DW_OP_stack_value
Addr idx 3 (w/ length 23): DW_OP_reg0 RAX, DW_OP_piece 0x4)
Simple locations without ranges are printed inline:
DW_AT_location [DW_FORM_block1] (DW_OP_reg4 RSI, DW_OP_piece 0x4, DW_OP_bit_piece 0x20 0x0)
The debug_loc(.dwo) dumping in changed accordingly to factor the code.
Reviewers: dblaikie, aprantl, friss
Subscribers: mgorny, javed.absar, hiraditya, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D37123
llvm-svn: 312042
Summary:
Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.
This fixes PR34261.
The reland adds two extra checks to the original: It checks if the
DbgVariableLocation is valid before checking any of its fields, and
it only emits ranges with nonzero registers.
Reviewers: aprantl, rnk, zturner
Reviewed By: rnk
Subscribers: mgorny, llvm-commits, aprantl, hiraditya
Differential Revision: https://reviews.llvm.org/D36907
llvm-svn: 312034
Support the selection of G_GLOBAL_VALUE in the PIC relocation model. For
simplicity we use the same pseudoinstructions for both Darwin and ELF:
(MOV|LDRLIT)_ga_pcrel(_ldr).
This is new for ELF, so it requires a small update to the ARM pseudo
expansion pass to make sure it adds the correct constant pool modifier
and add-current-address in the case of ELF.
Differential Revision: https://reviews.llvm.org/D36507
llvm-svn: 311992
Summary: Knights Landing, because it is Atom derived, has slow two memory operand instructions. Mark the Knights Landing CPU model accordingly.
Patch by David Zarzycki.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37224
llvm-svn: 311979
Summary:
Some variables show up in Visual Studio as "optimized out" even in -O0
-Od builds. This change fixes two issues that would cause this to
happen. The first issue is that not all DIExpressions we generate were
recognized by the CodeView writer. This has been addressed by adding
support for DW_OP_constu, DW_OP_minus, and DW_OP_plus. The second
issue is that we had no way to encode DW_OP_deref in CodeView. We get
around that by changinge the type we encode in the debug info to be
a reference to the type in the source code.
This fixes PR34261.
Reviewers: aprantl, rnk, zturner
Reviewed By: rnk
Subscribers: mgorny, llvm-commits, aprantl, hiraditya
Differential Revision: https://reviews.llvm.org/D36907
llvm-svn: 311957
Summary:
STRQro* instructions are slower than the alternative ADD/STRQui expanded
instructions on Falkor, so avoid generating them unless we're optimizing
for code size.
Reviewers: t.p.northover, mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D37020
llvm-svn: 311931
ARMv4 doesn't support the "BX" instruction, which has been introduced
with ARMv4t. Adjust the call lowering and tail call implementation
accordingly.
Further changes are necessary to ensure that presence of the v4t feature
is correctly set. Most importantly, the "generic" CPU for thumb-*
triples should include ARMv4t, since thumb mode without thumb support
would naturally be pointless.
Add a couple of asserts to ensure thumb instructions are not emitted
without CPU support.
Differential Revision: https://reviews.llvm.org/D37030
llvm-svn: 311921
Summary:
ARMLoadStoreOpt::FixInvalidRegPairOp() was only checking if one of the
load destination registers to be split overlapped with the base register
if the base register was marked as killed. Since kill flags may not
always be present, this can lead to incorrect code.
This bug was exposed by my MachineCopyPropagation change D30751 breaking
the sanitizer-x86_64-linux-android buildbot.
Also clean up some dead code and add an assert that a register offset is
never encountered by this code, since it does not handle them correctly.
Reviewers: MatzeB, qcolombet, t.p.northover
Subscribers: aemerson, javed.absar, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D37164
llvm-svn: 311907