Otherwise there's some mismatch, and we'll either form an illegal type or an
illegal node.
Thanks to Eli Friedman for pointing out the problem with my original solution.
llvm-svn: 301036
SI_MASKED_UNREACHABLE does not have machine instruction encoding.
It needs special handling in AMDGPUAsmPrinter::EmitInstruction like some
other pseudo instructions.
This patch fixes compilation failure of RadeonRays.
Differential Revision: https://reviews.llvm.org/D32364
llvm-svn: 301025
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.
This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 301019
Doing these transformations check that the result of integer addition is representable in the FP type.
(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))
This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036
Reviewed By: andrew.w.kaylor, scanon, spatel
Differential Revision: https://reviews.llvm.org/D31182
llvm-svn: 301018
This marks the beginning of an effort to port remaining
MSVC toolchain miscellaneous utilities to all platforms.
Currently clang-cl shells out to certain additional tools
such as the IDL compiler, resource compiler, and a few
other tools, but as these tools are Windows-only it
limits the ability of clang to target Windows on other
platforms. having a full suite of these tools directly
in LLVM should eliminate this constraint.
The current implementation provides no actual functionality,
it is just an empty skeleton executable for the purposes
of making incremental changes.
Differential Revision: https://reviews.llvm.org/D32095
Patch by Eric Beckmann (ecbeckmann@google.com)
llvm-svn: 301004
DAG combine was mistakenly assuming that the step-up it was looking at was
always a doubling, but it can sometimes be a larger extension in which case
we'd crash.
llvm-svn: 301002
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
llvm-svn: 300993
Currently we choose PostBB as the single successor of QFB, but its possible that QTB's single successor is QFB which would make QFB the correct choice.
Differential Revision: https://reviews.llvm.org/D32323
llvm-svn: 300992
places based on it.
Existing constant hoisting pass will merge a group of contants in a small range
and hoist the const materialization code to the common dominator of their uses.
However, if the uses are all in cold pathes, existing implementation may hoist
the materialization code from cold pathes to a hot place. This may hurt performance.
The patch introduces BFI to the pass and selects the best insertion places based
on it.
The change is controlled by an option consthoist-with-block-frequency which is
off by default for now.
Differential Revision: https://reviews.llvm.org/D28962
llvm-svn: 300989
Phi nodes in non-header blocks are converted to select instructions after
if-conversion. This patch updates the cost model to account for the selects.
Differential Revision: https://reviews.llvm.org/D31906
llvm-svn: 300980
It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I
haven't worked out why. Reverting to make it green while I figure it out.
llvm-svn: 300978
Select them as copies. We only select if both the source and the
destination are on the same register bank, so this shouldn't cause any
trouble.
llvm-svn: 300971
The condition in isSupportedType didn't handle struct/array arguments
properly. Fix the check and add a test to make sure we use the fallback
path in this kind of situation. The test deals with some common cases
where the call lowering should error out. There are still some issues
here that need to be addressed (tail calls come to mind), but they can
be addressed in other patches.
llvm-svn: 300967
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
llvm-svn: 300964
when the subtarget has fast strings.
This has two advantages:
- Speed is improved. For example, on Haswell thoughput improvements increase
linearly with size from 256 to 512 bytes, after which they plateau:
(e.g. 1% for 260 bytes, 25% for 400 bytes, 40% for 508 bytes).
- Code is much smaller (no need to handle boundaries).
llvm-svn: 300957
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.
Differential Revision: https://reviews.llvm.org/D32335
llvm-svn: 300946
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:
MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
llvm-svn: 300940
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300930
There have been multiple reports of this causing problems: a
compile-time explosion on the LLVM testsuite, and a stack
overflow for an opencl kernel.
llvm-svn: 300928
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300913
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
llvm-svn: 300904
Before, we assumed that any ConstantInt offset was precisely the access width,
so we could use the "[rN]!" form. ISelLowering only ever created that kind, but
further simplification during combining could lead to unexpected constants and
incorrect codegen.
Should fix PR32658.
llvm-svn: 300878
Associate the version-when-defined with definitions of standard DWARF
constants. Identify the "vendor" for DWARF extensions.
Use this information to verify FORMs in .debug_abbrev are defined as
of the DWARF version specified in the associated unit.
Removed two tests that had specified DWARF v1 (which essentially does
not exist).
Differential Revision: http://reviews.llvm.org/D30785
llvm-svn: 300875
Recently alloca address space has been added to data layout. Due to this
change, pointer returned by alloca may have different size as pointer in
address space 0.
However, currently the value type of frame index is assumed to be of the
same size as pointer in address space 0.
This patch fixes that.
Most targets assume alloca returning pointer in address space 0, which
is the default alloca address space. Therefore it is NFC for them.
AMDGCN target with amdgiz environment requires this change since it
assumes alloca returning pointer to addr space 5 and its size is 32,
which is different from the size of pointer in addr space 0 which is 64.
Differential Revision: https://reviews.llvm.org/D32021
llvm-svn: 300864
Adds scalable vector machine value types, and updates
the switch statements required for tablegen.
Patch by Graham Hunter.
Differential Revision: https://reviews.llvm.org/D32018
llvm-svn: 300840
Masked vectors which hold shift amounts when creating the following nodes:
ISD::SHL, ISD::SRL or ISD::SRA.
Instructions that use said nodes, which have had their arguments altered are
sll, srl, sra, bneg, bclr and bset.
For said instructions, the shift amount or the bit position that is
specified in the corresponding vector elements will be interpreted as the
shift amount/bit position modulo the size of the element in bits.
The problem lies in compiling with -O2 enabled, where the instructions for
formats .w and .d are not generated, but are instead optimized away.
In this case, having shift amounts that are either negative or greater than
the element bit size results in generation of incorrect results when
constant folding.
We remedy this by masking the operands for the nodes mentioned above before
actually creating them, so that the final result is correct before placed
into the constant pool.
Patch by Stefan Maksimovic.
Differential Revision: https://reviews.llvm.org/D31331
llvm-svn: 300839
ChangeSection incorrectly registers LastEMSInfo as belonging to the previous
section, not the current section. This happens to work when changing sections
using .section, as the previous section is set to the current section before
the call to ChangeSection, but not when using .popsection.
Differential Revision: https://reviews.llvm.org/D32225
llvm-svn: 300831
Currently fmov #0 with a vector destination is handle incorrectly and results in
fmov #-1.9375 being emitted but should instead give an error. This is due to the
way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so
fix this by actually doing it through an alias.
Differential Revision: https://reviews.llvm.org/D31949
llvm-svn: 300830
When an integer is used as an fp immediate we're failing to check the return
value of getFP64Imm, so invalid values are silently permitted. Fix this by
merging together the integer and real handling.
llvm-svn: 300828
- introduced in r300522 and found via the Swift LLDB testsuite.
The fix is to set the location kind to memory whenever an FrameIndex
location is emitted.
rdar://problem/31707602
llvm-svn: 300793
This change is correct because the verifier requires that at most one
argument be marked 'sret'.
NFC, removes a use of AttributeList slot APIs.
llvm-svn: 300784
Debug information is calculated with getFrameIndexReference() which was
missing some logic for the fixed object cases (= parameters on the stack).
rdar://24557797
Differential Revision: https://reviews.llvm.org/D32204
llvm-svn: 300781
I've changed one of the tests to not fold away, but we didn't and still don't do the transform
that the comment claims we do (and I don't know why we'd want to do that).
Follow-up to:
https://reviews.llvm.org/rL300725https://reviews.llvm.org/rL300763
llvm-svn: 300772
This allows forming more 'not' ops, so we get improvements for ISAs that have and-not.
Follow-up to:
https://reviews.llvm.org/rL300725
llvm-svn: 300763
Re-commit after revert in r300668. Changed getMaxFPOffset() to a
more conservative heuristic instead of trying to be clever and missing
for some exotic calling conventions.
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.
rdar://31317893
Differential Revision: https://reviews.llvm.org/D31643
llvm-svn: 300761
Use haveNoCommonBitsSet to figure out whether an "or" instruction
is equivalent to addition. This handles more cases than just
checking for a constant on the RHS.
Differential Revision: https://reviews.llvm.org/D32239
llvm-svn: 300746
Summary:
In the current implementation, to find inline stack for an address incurs expensive linear search in 2 places:
* linear search for the top-level DIE
* recursive linear traverse the DIE tree to find the path to the leaf DIE
In this patch, a map is built from address to its corresponding leaf DIE. The inline stack is built by traversing from the leaf DIE up to the root DIE. This speeds up batch symbolization by ~10X without noticible memory overhead.
Reviewers: dblaikie
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32177
llvm-svn: 300742
This is inserted directly in the text section. The relocation
for the function ends up resolving to the beginning of the
amd_kernel_code_t header rather than the actual function
entry point.
Also skip some of the comments for initialization
that only makes sense for kernels.
llvm-svn: 300736
The most common case for a branch condition is
a single use compare. Directly invert the branch
predicate rather than adding a lot of xor i1 true
which the DAG will have to fold later.
This produces nicer to read structurizer output.
This produces some random changes in codegen
due to the DAG swapping branch conditions itself,
and then does a poor job of dealing with those
inverts.
llvm-svn: 300732
Summary:
See http://llvm.org/docs/LangRef.html#non-integral-pointer-type
The NewGVN test does not fail without these changes (perhaps it does
try to coerce pointers <-> integers to begin with?), but I added the
test case anyway.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D32208
llvm-svn: 300730
The patch itself is simple: stop discriminating against vectors in visitAnd() and again in
SimplifyDemandedBits().
Some notes for reference:
1. We're not consistent about calls to SimplifyDemandedBits in the various visitXXX functions.
Sometimes, we check if the RHS is a constant first. Other times (like here), we just dive in.
2. I'd like to break the vector shackles in steps for the sake of risk minimization, but we could
make similar simultaneous changes in other places if we think that would be better.
3. I don't know what the intent of the changed tests in this patch was supposed to be, but since
they wiggled in a positive way, I'm just going with that. :)
4. In the rotate tests, note that we can see through non-splat constants. This is a result of D24253.
5. My motivation for being here now is to make D31944 look better, so this is step 1 of N towards
improving the vector codegen in that patch without writing any actual new code.
Differential Revision: https://reviews.llvm.org/D32230
llvm-svn: 300725
This patch simplifies the examples from D31509 and D31927 (PR30630) and catches
the basic identity shuffle tests that Zvi recently added.
I'm not sure if we have something like this in DAGCombiner, but we should?
It's worth noting that "MaxRecurse / RecursionLimit" is only 3 on entry at the moment.
We might want to bump that up if there are longer shuffle chains like this in the wild.
For now, we're ignoring shuffles that have undef mask elements because it's not
clear how those should be handled.
Differential Revision: https://reviews.llvm.org/D31960
llvm-svn: 300714
Also, make a few changes to allow using the pass in .mir testcases.
Among other things, change the abbreviation from opt-amode to amode-opt,
because otherwise lit would expand the "opt" part to the full path to
the opt binary.
llvm-svn: 300707
InstSimplify returned the wrong type when simplifying a vector GEP
and we ended up crashing when trying to replace all uses with the
new value. Fixes PR32697.
Differential Revision: https://reviews.llvm.org/D32180
llvm-svn: 300693
A bunch of tests failed because memory operations have been reordered.
I am unsure which commit changed this behaviour as the AVR build was
failing at that point with an unrelated error.
This commit just reoders some of the CHECK lines in some tests to suit
current llc output.
llvm-svn: 300682
Support G_MUL, very similar to G_ADD and G_SUB. The only difference is
in the instruction selector, where we have to select either MUL or MULv5
depending on the target.
llvm-svn: 300665
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300664
Summary: In case all predecessor go to a single successor of current BB. We want to fold (not thread).
Reviewers: efriedma, sanjoy
Reviewed By: sanjoy
Subscribers: dberlin, majnemer, llvm-commits
Differential Revision: https://reviews.llvm.org/D30869
llvm-svn: 300657
We need to reserve an emergency spill slot in cases with large argument
types that could overflow immediate offsets for FP relative address
calculations.
rdar://31317893
Differential Revision: https://reviews.llvm.org/D31643
llvm-svn: 300639
Summary:
This allows us to, if the symbol names are available in the binary, be
able to provide the function name in the YAML output.
Reviewers: dblaikie, pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32153
llvm-svn: 300624
Android x86_64 target uses f128 type and stores f128 values in %xmm* registers.
SoftenFloatRes_EXTRACT_VECTOR_ELT should not convert result value
from f128 to i128.
Differential Revision: http://reviews.llvm.org/D32102
llvm-svn: 300583
In tryToVectorizeList, under a very limited circumstance (when entered
from tryToVectorizePair), the values may be reordered (swapped) and the
SLP tree is built with the new order. This extends that to the case when
starting from phis in vectorizeChainsInBlock when there are exactly two
phis. The textual order of phi nodes shouldn't really matter. Without
this change, the loop body in the accompnaying test case is fully vectorized
when we swap the orde of the phis but not with this order. While this
doesn't solve the phi-ordering problem in a general way (for more than 2
phis), this is simple fix that piggybacks on an existing mechanism and
is useful in cases like multiplying two complex numbers.
Differential revision: https://reviews.llvm.org/D32065
llvm-svn: 300574
Remove non-consecutive stores from store merge candidate search as
they cannot be merged and will prevent us from finding subsequent
mergeable store cases.
Reviewers: jyknight, bogner, javed.absar, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32086
llvm-svn: 300561
Summary:
* Add checks for store. That is needed because GlobalsAA is called
twice in the current pipeline with different sets of Function passes
following it. However, the loads are eliminated using instcombine
which happens everywhere. On the other hand, DeadStoreElimination is
performed only once so by checking for store we'll be able to catch
more cases when GlobalsAA is invalidated unintentionally.
* Add empty function above/below the test so that we don't depend on
the relative order of instcombine/dead-store-elimination and the
pass that invalidates the analysis (inside the same
FunctionPassManager).
Reviewers: kristof.beyls
Reviewed By: kristof.beyls
Subscribers: llvm-commits, n.bozhenov
Differential Revision: https://reviews.llvm.org/D32015
Patch by Andrei Elovikov <andrei.elovikov@intel.com>
llvm-svn: 300553
In the assembler, we should emit build attributes based on the target
selected with command-line options. This matches the GNU assembler's
behaviour. We only do this for build attributes which describe the
hardware that is expected to be available, not the ones that describe
ABI compatibility.
This is done by moving some of the attribute emission code to
ARMTargetStreamer, so that it can be shared between the assembly and
code-generation code paths. Since the assembler only creates a
MCSubtargetInfo, not an ARMSubtarget, the code had to be changed to
check raw features, and not use the convenience functions in
ARMSubtarget.
If different attributes are later specified using the .eabi_attribute
directive, then they will take precedence, as happens when the same
.eabi_attribute is specified twice.
This must be enabled by an option, because we don't want to do this when
parsing inline assembly. The attributes would match the ones emitted at
the start of the file, so wouldn't actually change the emitted object
file, but the extra directives would be added to every inline assembly
block when emitting assembly, which we'd like to avoid.
The majority of the changes in the build-attributes.ll test are just
re-ordering the directives, because the hardware attributes are now
emitted before the ABI ones. However, I did fix one bug which I spotted:
Tag_CPU_arch_profile was not being emitted for v6M.
Differential revision: https://reviews.llvm.org/D31812
llvm-svn: 300547
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:
remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)
Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.
llvm-svn: 300538
For subtargets that use the custom lowering for divmod, e.g. gnueabi,
we used to check if the subtarget has hardware divide and then lower to
a div-mul-sub sequence if true, or to a libcall if false.
However, judging by the usage of hasDivide vs hasDivideInARMMode, it
seems that hasDivide only refers to Thumb. For instance, in the
ARMTargetLowering constructor, the code that specifies whether to use
libcalls for (S|U)DIV looks like this:
bool hasDivide = Subtarget->isThumb() ? Subtarget->hasDivide()
: Subtarget->hasDivideInARMMode();
In the case of divmod for arm-gnueabi, using only hasDivide() to
determine what to do means that instead of lowering to __aeabi_idivmod
to get the remainder, we lower to div-mul-sub and then further lower the
div to __aeabi_idiv. Even worse, if we have hardware divide in ARM but
not in Thumb, we generate a libcall instead of using it (this is not an
issue in practice since AFAICT none of the cores that we support have
hardware divide in ARM but not Thumb).
This patch fixes the code dealing with custom lowering to take into
account the mode (Thumb or ARM) when deciding whether or not hardware
division is available.
Differential Revision: https://reviews.llvm.org/D32005
llvm-svn: 300536
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300535
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
- describe a variable in a register
- consist of only a DW_OP_reg
2. Memory location descriptions
- describe the address of a variable
3. Implicit location descriptions
- describe the value of a variable
- end with DW_OP_stack_value & friends
The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident. This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.
This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.
There are two major changes in this patch:
I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.
When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
DBG_VALUE, [RBP-8] --> DW_OP_fbreg -8
DBG_VALUE, RAX --> DW_OP_reg0 +0
DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0
All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.
https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>
Differential Revision: https://reviews.llvm.org/D31439
llvm-svn: 300522
Summary: If there is suffix added in the function name (e.g. module hash added by thinLTO), we will not be able to find a match in profile as the suffix does not exist in profile. This patch build a map from function name to Function *. The map includes the entry for the stripped function name so that inlineHotFunctions can find the corresponding function to promote/inline.
Reviewers: davidxl, dnovillo, tejohnson
Reviewed By: davidxl
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D31952
llvm-svn: 300507
Summary:
Refactoring changed paramHasAttr(1 + i) to paramHasAttr(0), fix that to
paramHasAttr(i).
Add more tests to WebAssemblyOptimizeReturned that catch that
regression.
Reviewers: dschuff
Subscribers: jfb, sbc100, llvm-commits
Differential Revision: https://reviews.llvm.org/D32136
llvm-svn: 300502
So, `cast<Instruction>` is not guaranteed to succeed. Change the
code so that we create a new constant and use it in the newly
created instruction, as it's done in other places in InstCombine.
OK'ed by Sanjay/Craig. Fixes PR32686.
llvm-svn: 300495
Avoid looping through program to determine register counts.
This avoids needing to look at regmask operands.
Also fixes some counting errors with flat_scr when there
are no stack objects.
llvm-svn: 300482
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.
This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.
Differential Revision: https://reviews.llvm.org/D32126
llvm-svn: 300480
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.
llvm-svn: 300474
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.
This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.
On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html
Differential Revision: https://reviews.llvm.org/D31838
llvm-svn: 300464
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.
llvm-svn: 300462
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.
llvm-svn: 300453
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:
Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].
If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.
Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31613
llvm-svn: 300446
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.
Reviewers: sanjoy, anna, reames, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31753
llvm-svn: 300441
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.
An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.
This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214
Reviewers: sanjoy, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits, reames, davidxl
Differential Revision: https://reviews.llvm.org/D30631
llvm-svn: 300440
Our 16 bit support is assembler-only + the terrible hack that is
.code16gcc. Simply using 32 bit registers does the right thing for the
latter.
Fixes PR32681.
llvm-svn: 300429
This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.
Differential Revision: https://reviews.llvm.org/D31398
llvm-svn: 300422
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace. This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.
However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.
Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets. Also add a test case, derived from PR32594.
Reviewers: craig.topper, qcolombet, RKSimon, ab
Reviewed By: ab
Subscribers: ab, emaste, royger, llvm-commits
Differential Revision: https://reviews.llvm.org/D31902
llvm-svn: 300404
Now that the libObect support for wasm is better we can
have readobj and nm produce more useful output too.
Differential Revision: https://reviews.llvm.org/D31514
llvm-svn: 300365
...when C1 differs from C2 by one bit and C1 <u C2:
http://rise4fun.com/Alive/Vuo
And move related folds to a helper function. This reduces code duplication and
will make it easier to remove the scalar-only restriction as a follow-up step.
llvm-svn: 300364
We currently only support folding a subtract into a select but not a PHI. This fixes that.
I had to fix an assumption in FoldOpIntoPhi that assumed the PHI node was always in operand 0. Now we pass it in like we do for FoldOpIntoSelect. But we still require some dancing to find the Constant when we create the BinOp or ConstantExpr. This is based code is similar to what we do for selects.
Since I touched all call sites, this also renames FoldOpIntoPhi to foldOpIntoPhi to match coding standards.
Differential Revision: https://reviews.llvm.org/D31686
llvm-svn: 300363
If a kernel's pointer argument is known to be readonly
set access qualifier accordingly. This allows RT not to
flush caches before dispatches.
Differential Revision: https://reviews.llvm.org/D32091
llvm-svn: 300362
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.
Clang companion patch: D31766.
Differential Revision: https://reviews.llvm.org/D31767
llvm-svn: 300325
Start using it in LLD to avoid needing to read bitcode again just to get the
target triple, and in llvm-lto2 to avoid printing symbol table information
that is inappropriate for the target.
Differential Revision: https://reviews.llvm.org/D32038
llvm-svn: 300300
This further improves Ahmed's change in rL299482. See the new comment for the
rationale.
The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.
Differential Revision: https://reviews.llvm.org/D32028
llvm-svn: 300276
If the offset cannot fit into the instruction, an addition to the
pointer is emitted before the actual access. However, BPF offsets are
16-bit but LLVM considers them to be, for the matter of this check,
to be 32-bit long.
This causes the following program:
int bpf_prog1(void *ign)
{
volatile unsigned long t = 0x8983984739ull;
return *(unsigned long *)((0xffffffff8fff0002ull) + t);
}
To generate the following (wrong) code:
0: 18 01 00 00 39 47 98 83 00 00 00 00 89 00 00 00
r1 = 590618314553ll
2: 7b 1a f8 ff 00 00 00 00 *(u64 *)(r10 - 8) = r1
3: 79 a1 f8 ff 00 00 00 00 r1 = *(u64 *)(r10 - 8)
4: 79 10 02 00 00 00 00 00 r0 = *(u64 *)(r1 + 2)
5: 95 00 00 00 00 00 00 00 exit
Fix it by changing the offset check to 16-bit.
Patch by Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Differential Revision: https://reviews.llvm.org/D32055
llvm-svn: 300269
Switch from Euclid's algorithm to Stein's algorithm for computing GCD. This
avoids the (expensive) APInt division operation in favour of bit operations.
Remove all memory allocation from within the GCD loop by tweaking our `lshr`
implementation so it can operate in-place.
Differential Revision: https://reviews.llvm.org/D31968
llvm-svn: 300252
Summary:
Bug noticed by inspection.
Extend the test to handle invokes as well as calls, and rewrite it to
not depend on the inliner and other passes.
Also simplify the call site replacement code with CallSite, similar to
what I did to dead arg elimination and arg promotion (rL300235 and
rL300229).
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32041
llvm-svn: 300251
Summary: For iterative SamplePGO, an indirect call can be speculatively promoted to multiple direct calls and get inlined. All these promoted direct calls will share the same callsite location (offset+discriminator). With the current implementation, we cannot distinguish between different promotion candidates and its inlined instance. This patch adds callee_name to the key of the callsite sample map. And added helper functions to get all inlined callee samples for a given callsite location. This helps the profile annotator promote correct targets and inline it before annotation, and ensures all indirect call targets to be annotated correctly.
Reviewers: davidxl, dnovillo
Reviewed By: davidxl
Subscribers: andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31950
llvm-svn: 300240
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
This is effectively a retry of:
https://reviews.llvm.org/rL299851
but now we have tests and an assert to make sure the bug
that was exposed with that attempt will not happen again.
I'll fix the code duplication and missing sibling fold next,
but I want to make this change as small as possible to reduce
risk since I messed it up last time.
This should fix:
https://bugs.llvm.org/show_bug.cgi?id=32524
llvm-svn: 300236
Noticed by inspection while doing attribute work. DAE, InstCombineCalls,
and ArgPromotion have a fair amount of duplicated code for hacking on
call sites, and you can find bugs by comparing them.
Add a test case for this.
llvm-svn: 300229
In many cases ds operations can be combined even if offsets do not
fit into 8 bit encoding. What it takes is to adjust base address.
Differential Revision: https://reviews.llvm.org/D31993
llvm-svn: 300227
Summary:
* Add a bitreverse case in the demanded bits analysis pass.
* Add tests for the bitreverse (and bswap) intrinsic in the
demanded bits pass.
* Add a test case to the BDCE tests: that manipulations to
high-order bits are eliminated once the bits are reversed
and then right-shifted.
Reviewers: mkuper, jmolloy, hfinkel, trentxintong
Reviewed By: jmolloy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31857
llvm-svn: 300215
Summary:
The linker needs to be able to determine whether a symbol is text or data to
handle the case of a common being overridden by a strong definition in an
archive. If the archive contains a text member of the same name as the common,
that function is discarded. However, if the archive contains a data member of
the same name, that strong definition overrides the common. This is a behavior
of ld.bfd, which the Qualcomm linker also supports in LTO.
Here's a test case to illustrate:
####
cat > 1.c << \!
int blah;
!
cat > 2.c << \!
int blah() {
return 0;
}
!
cat > 3.c << \!
int blah = 20;
!
clang -c 1.c
clang -c 2.c
clang -c 3.c
ar cr lib.a 2.o 3.o
ld 1.o lib.a -t
####
The correct output is:
1.o
(lib.a)3.o
Thanks to Shankar Easwaran and Hemant Kulkarni for the test case!
Reviewers: mehdi_amini, rafael, pcc, davide
Reviewed By: pcc
Subscribers: davide, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D31901
llvm-svn: 300205
If we had these tests, the bug caused by https://reviews.llvm.org/rL299851 would have been caught sooner.
There's also an assert in the code that should have caught that bug, but the assert line itself has a bug.
llvm-svn: 300201
In a followup patch I intend to introduce an additional dumping
mode which dumps a graphical representation of a class's layout.
In preparation for this, the text-based layout printer needs to
be split out from the graphical layout printer, and both need
to be able to use the same code for printing the intro and outro
of a class's definition (e.g. base class list, etc).
This patch does so, and in the process introduces a skeleton
definition for the graphical printer, while currently making
the graphical printer just print nothing.
NFC
llvm-svn: 300134
Previously the dumping of class definitions was very primitive,
and it made it hard to do more than the most trivial of output
formats when dumping. As such, we would only dump one line for
each field, and then dump non-layout items like nested types
and enums.
With this patch, we do a complete analysis of the object
hierarchy including aggregate types, bases, virtual bases,
vftable analysis, etc. The only immediately visible effects
of this are that a) we can now dump a line for the vfptr where
before we would treat that as padding, and b) we now don't
treat virtual bases that come at the end of a class as padding
since we have a more detailed analysis of the class's storage
usage.
In subsequent patches, we should be able to use this analysis
to display a complete graphical view of a class's layout including
recursing arbitrarily deep into an object's base class / aggregate
member hierarchy.
llvm-svn: 300133
If workgroup size is known inform llvm about range returned by local
id and local size queries.
Differential Revision: https://reviews.llvm.org/D31804
llvm-svn: 300102
Summary:
Readnone attribute would cause CSE of two barriers with
the same argument, which is invalid by example:
struct Base {
virtual int foo() { return 42; }
};
struct Derived1 : Base {
int foo() override { return 50; }
};
struct Derived2 : Base {
int foo() override { return 100; }
};
void foo() {
Base *x = new Base{};
new (x) Derived1{};
int a = std::launder(x)->foo();
new (x) Derived2{};
int b = std::launder(x)->foo();
}
Here 2 calls of std::launder will produce @llvm.invariant.group.barrier,
which would be merged into one call, causing devirtualization
to devirtualize second call into Derived1::foo() instead of
Derived2::foo()
Reviewers: chandlerc, dberlin, hfinkel
Subscribers: llvm-commits, rsmith, amharc
Differential Revision: https://reviews.llvm.org/D31531
llvm-svn: 300101
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32486
...the canonicalization of vector select to shufflevector does not hold up
when undef elements are present in the condition vector.
Try to make the undef handling clear in the code and the LangRef.
Differential Revision: https://reviews.llvm.org/D31980
llvm-svn: 300092
Currently if we reach an instruction with multiples uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user.
This might then reduce the instruction to having a single use and allow additional optimizations on the other path.
This picks up an additional case that r300075 didn't catch.
Differential Revision: https://reviews.llvm.org/D31552
llvm-svn: 300084
If we are adding/subtractings 0s below the highest demanded bit we can just use the other operand and remove the operation.
My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be ran.
With this change we get bypass the add in the first iteration and prevent the second iteration from changing anything.
Differential Revision: https://reviews.llvm.org/D31120
llvm-svn: 300075
One potential way to make InstCombine (very slightly?) faster is to recycle instructions
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms
like this if that makes InstCombine more manageable (just throwing out an idea; not sure
how much opportunity is actually here).
Differential Revision: https://reviews.llvm.org/D31863
llvm-svn: 300067
Use '2>&1 |' and not '|&' to pipe debug output to FileCheck
Hopefully handles a "shell parser error" on
llvm-clang-x86_64-expensive-checks-win
test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
llvm-svn: 300064
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.
New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll
Review: Matthew Simpson
https://reviews.llvm.org/D31601
llvm-svn: 300061
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.
This patch handles these cases differently with TTI based cost estimates.
Review: Matthew Simpson
https://reviews.llvm.org/D31175
llvm-svn: 300058
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.
This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.
The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.
New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll
Review: Adam Nemet
https://reviews.llvm.org/D30680
llvm-svn: 300056
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
Summary:
As far as instruction selection is concerned, all three appear to be same thing.
Support for these operands is experimental since AArch64 doesn't make use
of them and the in-tree targets that do use them (AMDGPU for
OperandWithDefaultOps, AMDGPU/ARM/Hexagon/Lanai for PredicateOperand, and ARM
for OperandWithDefaultOps) are not using tablegen-erated GlobalISel yet.
Reviewers: rovka, aditya_nandakumar, t.p.northover, qcolombet, ab
Reviewed By: rovka
Subscribers: inglorion, aemerson, rengolin, mehdi_amini, dberris, kristof.beyls, igorb, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D31135
llvm-svn: 300037
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=27065
Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide
Reviewed By: davide
Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31032
llvm-svn: 300034