A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.
llvm-svn: 298439
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.
This fixes PR23277.
Differential Revision: https://reviews.llvm.org/D28494
llvm-svn: 298430
including the amended (no UB anymore) fix for adding/subtracting -2147483648.
This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""
llvm-svn: 298417
Summary: ModuleSummary should use the standard interface of ProfileSummary::getProfileCount.
Reviewers: eraman, tejohnson
Reviewed By: tejohnson
Subscribers: tejohnson, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31154
llvm-svn: 298404
[Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.
llvm-svn: 298400
If a register location can only be described by a complex expression
(i.e., multiple subregisters) it doesn't safely compose with another
complex expression. For example, it is not possible to apply a
DW_OP_deref operation to multiple DW_OP_pieces.
llvm-svn: 298389
Summary:
To support negative immediates for certain arithmetic instructions, the
instruction is converted to the inverse instruction with a negated (or inverted)
immediate. For example, "ADD r0, r1, #FFFFFFFF" cannot be encoded as an ADD
instruction. However, "SUB r0, r1, #1" is equivalent.
These conversions are different from instruction aliases. An alias maps
several assembler instructions onto one encoding. A conversion, however, maps
an *invalid* instruction--e.g. with an immediate that cannot be represented in
the encoding--to a different (but equivalent) instruction.
Several instructions with negative immediates were being converted already, but
this was not systematically tested, nor did it cover all instructions.
This patch implements all possible substitutions for ARM, Thumb1 and
Thumb2 assembler and adds tests. It also adds a feature flag
(-mattr=+no-neg-immediates) to turn these substitutions off. This is
helpful for users who want their code to assemble to exactly what they
wrote.
Reviewers: t.p.northover, rovka, samparker, javed.absar, peter.smith, rengolin
Reviewed By: javed.absar
Subscribers: aadg, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D30571
llvm-svn: 298380
We could do better by splitting any oversized type into whatever vector size the target supports,
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.
Differential Revision: https://reviews.llvm.org/D31156
llvm-svn: 298376
Summary:
First iteration of SDWA peephole.
This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''
Pass structure:
1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
3. Iterate over all potential instructions and check if they can be converted into SDWA.
4. Convert instructions to SDWA.
This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
2. Introduce more SDWA patterns
3. Introduce mnemonics to limit when SDWA patterns should apply
Reviewers: vpykhtin, alex-t, arsenm, rampitec
Subscribers: wdng, nhaehnle, mgorny
Differential Revision: https://reviews.llvm.org/D30038
llvm-svn: 298365
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.
For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.
Patch by Andrew Ng.
Differential Revision: https://reviews.llvm.org/D30835
llvm-svn: 298360
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().
This patch adds the needed handling, along with a test case.
Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077
llvm-svn: 298357
After the loop unroll threshold was increased in r295538, very
large constant expressions can be created. This prevents them
from having to be recursively scanned, leading to a compile
time blow-up.
Differential Revision: https://reviews.llvm.org/D30689
llvm-svn: 298356
The def operand of the new LG/LD should have the old def operands
flags and subreg index.
New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll
Review: Ulrich Weigand
llvm-svn: 298341
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs. It's not
clear when they will be fixed, so let's just take them out
of the tree for now.
(I resolved a small conflict with r297453.)
llvm-svn: 298328
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.
Patch by Tim Neumann.
Differential Revision: https://reviews.llvm.org/D31116
llvm-svn: 298320
If loop bound containing calculations like min(a,b), the Scalar
Evolution API getSmallConstantTripMultiple returns 4294967295 "-1"
as the trip multiple. The problem is that, SCEV use -1 * umax to
represent umin. The multiple constant -1 was returned, and the logic
of guarding against huge trip counts was skipped. Because -1 has 32
active bits.
The fix attempt to factor more general cases. First try to get the
greatest power of two divisor of trip count expression. In case
overflow happens, the trip count expression is still divisible by the
greatest power of two divisor returned. Returns 1 if not divisible by 2.
Patch by Huihui Zhang <huihuiz@codeaurora.org>
Differential Revision: https://reviews.llvm.org/D30840
llvm-svn: 298301
and test cases for each of the error checks.
To do this more plumbing was needed so that the segment indexes and
segment offsets can be checked. Basically what was done was the SegInfo
from llvm-objdump’s MachODump.cpp was moved into libObject for Mach-O
objects as BindRebaseSegInfo and it is only created when an iterator for
bind or rebase entries are created.
This commit really only adds the error checking and test cases for the
bind table entires and the checking for the lazy bind and weak bind entries
are still to be fully done as well as the rebase entires. Though some of
the plumbing for those are added with this commit. Those other error
checks and test cases will be added in follow on commits.
Note, the two llvm_unreachable() calls should now actually be unreachable
with the error checks in place and would take a logic bug in the error
checking code to be reached if the segment indexes and segment
offsets are used from a checked bind entry. Comments have been added
to the methods that require the arguments to have been checked
prior to calling.
llvm-svn: 298292
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.
llvm-svn: 298282
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.
llvm-svn: 298276
This is a safeguard against data loss if the user specifies a directory
that is not a cache directory. Teach the existing cache pruning clients
to create files with appropriate names.
Differential Revision: https://reviews.llvm.org/D31109
llvm-svn: 298271
Summary: Inliner should update the branch_weights annotation to scale it to proper value.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D30767
llvm-svn: 298270
Summary:
Prepare the way for nested instruction matching support by having actions
like CopyRenderer look up operands in the RuleMatcher rather than a
specific InstructionMatcher. This allows actions to reference any operand
from any matched instruction.
It works by checking the 'shape' of the match and capturing
each matched instruction to a local variable. If the shape is wrong
(not enough operands, leaf nodes where non-leafs are expected, etc.), then
the rule exits early without checking the predicates. Once we've captured
the instructions, we then test the predicates as before (except using the
local variables). If the match is successful, then we render the new
instruction as before using the local variables.
It's not noticable in this patch but by the time we support multiple
instruction matching, this patch will also cause a significant improvement
to readability of the emitted code since
MRI.getVRegDef(I->getOperand(0).getReg()) will simply be MI1 after
emitCxxCaptureStmts().
This isn't quite NFC because I've also fixed a bug that I'm surprised we
haven't encountered yet. It now checks there are at least the expected
number of operands before accessing them with getOperand().
Depends on D30531
Reviewers: t.p.northover, qcolombet, aditya_nandakumar, ab, rovka
Reviewed By: rovka
Subscribers: dberris, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D30535
llvm-svn: 298257
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.
Differential Revision: https://reviews.llvm.org/D31039
llvm-svn: 298254