Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314666
This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI.
Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction.
For example, we allow these nodes
t9: i32 = add t7, Constant:i32<1>
t11: i32 = and t9, Constant:i32<255>
t12: i64 = zero_extend t11
t14: i64 = shl t12, Constant:i64<2>
to be folded into a rotate-and-mask instruction.
Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF];
Differential Revision: https://reviews.llvm.org/D37514
llvm-svn: 314655
I continue to support different VF interleaved and in this pass for this patch,
I added the vf64 stride3 support for both load and store.
I also added support fot the stride4 store.
Reviewers:
1. zvi
2. dorit
3. igorb
4. guyblank
Differential Revision: https://reviews.llvm.org/D37687
Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b
llvm-svn: 314651
This unifies the patterns between both modes. This should be effectively NFC since all the available registers in 32-bit mode statisfy this constraint.
llvm-svn: 314643
If the two instructions being compared for equivalence have corresponding operands
that are integer constants, then check their values to determine equivalence.
Patch by Suyog Sarda!
llvm-svn: 314642
Summary: This currently uses ConstantExpr to do its math, but as noted in a TODO it can all be done directly on APInt.
Reviewers: spatel, majnemer
Reviewed By: majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38440
llvm-svn: 314640
Summary:
Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable.
For the register®ister form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register®ister form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem.
I believe this supercedes D38025 which was trying to switch the register®ister form back to pre-PR22995.
Reviewers: aymanmus, RKSimon, zvi
Reviewed By: aymanmus
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38120
llvm-svn: 314639
And follow-up r314585.
Leads to segfaults. I'll forward reproduction instructions to the patch
author.
Also, for a recommit, still add the original patch description.
Otherwise, it becomes really tedious to find out what a patch actually
does. The fact that it is a recommit with a fix is somewhat secondary.
llvm-svn: 314622
Fix llvm_tools_dir attribute access not to fail when the variable is not
present. This directory is not really necessary to run lit tests,
and the code already accounts for it being None.
The reference was added in r313407, and it breaks the stand-alone lit
package in Gentoo.
Differential Revision: https://reviews.llvm.org/D38442
llvm-svn: 314620
Summary: In SamplePGO ThinLTO compile phase, we will not invoke ICP as it may introduce confusion to the 2nd annotation. This patch extracted that logic and makes it clearer before profile annotation. In the mean time, we need to make function importing process both inlined callsites as well as not promoted indirect callsites.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38094
llvm-svn: 314619
NFC.
Added code gen regression tests for avx512 instructions scheduling called avx512-schedule.ll and
avx512-shuffle-schedule.ll.
This patch is in preparation of a larger patch of adding all SKX instruction scheduling and therefore
the scheduling for the avx512 instructions are still missing.
Reviewers: zvi, delena, RKSimon, igorb
Differential Revision: https://reviews.llvm.org/D38035
Change-Id: I792762763127a921b9e13684b58af03646536533
llvm-svn: 314594
Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results.
Differential Revision: https://reviews.llvm.org/D38307
llvm-svn: 314584
This is now able to serialize DIALOG and DIALOGEX resources to .res
files. It still can't parse dialog-specific CAPTION, FONT, and STYLE
optional statement - these will be added in the following patch.
A limited set of controls is included. However, more can be easily added
by extending SupportedCtls map defined in ResourceScriptStmt.cpp.
Differential Revision: https://reviews.llvm.org/D37862
llvm-svn: 314578
We have a single library build without relaxation options.
When inlined library functions remove fast math attributes
from the functions they are integrated into.
This patch sets relaxation attributes on the functions after
linking provided corresponding relaxation options are given.
Math instructions inside the inlined functions remain to have
no fast flags, but inlining does not prevent fast math
transformations of a surrounding caller code anymore.
Differential Revision: https://reviews.llvm.org/D38325
llvm-svn: 314568
Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand
in stack, which does not have correct address space. This causes unaligned private double16 load/store to be
lowered to flat_load instead of buffer_load for amdgcn target.
This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl.
Differential Revision: https://reviews.llvm.org/D35361
llvm-svn: 314566
This allows MENU resources to be serialized.
MENU resource statement doc:
msdn.microsoft.com/en-us/library/windows/desktop/aa381025.aspx
POPUP sub-statement doc:
msdn.microsoft.com/en-us/library/windows/desktop/aa381030.aspx
MENUITEM sub-statement doc:
msdn.microsoft.com/en-us/library/windows/desktop/aa381024.aspx
MENUHEADER structure:
msdn.microsoft.com/en-us/library/windows/desktop/ms648018.aspx (and
NORMALMENUITEM, POPUPMENUITEM structs).
Thanks for Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37828
llvm-svn: 314562
This patch will eliminate redundant intptr/ptrtoint that pessimizes
analyses such as SCEV, AA and will make optimization passes such
as auto-vectorization more powerful.
Differential revision: http://reviews.llvm.org/D37832
llvm-svn: 314561
Summary:
It appears polly makes use of the `CMAKE_RUNTIME_OUTPUT_DIRECTORY` variable
when configuring its lit test suite. Reverting this for now.
llvm-svn: 314551
Summary:
Three `CMAKE_.*_OUTPUT_DIRECTORY` variables used to be set in CMake and
referenced in various other parts of the project. However, in r198205
chapuni added a note to "don't set them anymore", and any remaining
references to them were subsequently removed in r198316 and r199592.
Now that the variables are no longer used anywhere, remove them, along
with the comments advising against using them any longer.
Test Plan:
I ran `check-all` and confirmed the tests built and passed.
Reviewers: beanz, chapuni
Reviewed By: beanz
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D38389
llvm-svn: 314550
This allows llvm-rc to serialize ACCELERATORS resources.
Additionally, as this is the first type of resource to support basic
optional resource statements (LANGUAGE, CHARACTERISTICS, VERSION),
ACCELERATORS statement documentation:
msdn.microsoft.com/en-us/library/windows/desktop/aa380610.aspx
Accelerator table structure documentation:
msdn.microsoft.com/en-us/library/windows/desktop/ms648010.aspx
Optional resource statement fields are described in:
msdn.microsoft.com/en-us/library/windows/desktop/ms648027.aspx
Thanks for Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37824
llvm-svn: 314549
When type shrinking reductions, we should insert the truncations and extends at
the end of the loop latch block. Previously, these instructions were inserted
at the end of the loop header block. The difference is only a problem for loops
with predicated instructions (e.g., conditional stores and instructions that
may divide by zero). For these instructions, we create new basic blocks inside
the vectorized loop, which cause the loop header and latch to no longer be the
same block. This should fix PR34687.
Reference: https://bugs.llvm.org/show_bug.cgi?id=34687
llvm-svn: 314542
This is a part of llvm-rc serialization patch set (serialization, pt 1.5).
This:
* Unifies the internal representation of flags in ACCELERATORS and MENU
with the corresponding representation in .res files (noticed in
https://reviews.llvm.org/D37828#inline-329828).
* Creates an RCResource subclass, OptStatementsRCResource, describing
resource statements that can declare resource-local optional statements
(proposed in https://reviews.llvm.org/D37824#inline-329775).
These modifications don't fit to any of the current patches, so I'm
submitting them as a separate patch.
Differential Revision: https://reviews.llvm.org/D37841
llvm-svn: 314541
This allows to process HTML resources defined in .rc scripts and output
them to resulting .res files. Additionally, some infrastructure allowing
to output these files is created.
This is the first resource type we can operate on.
Thanks to Nico Weber for his original work in this area.
Differential Revision: reviews.llvm.org/D37283
llvm-svn: 314538
I've seen cases where tiny inlined functions have such a high execution count
that most everything would show up with a relative of hotness of 0%. Since
the inlined functions effectively disappear you need to tune in the lower
range, thus we need more precision.
llvm-svn: 314537
Summary:
Also disables leak checking on lto tests, due to many leaks reported
in the system's ld64.
Reviewers: kcc, pcc, bogner, kubamracek
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D37781
llvm-svn: 314535
The type of a SCEVConstant may not match the corresponding LLVM Value.
In this case, we skip the constant folding for now.
TODO: Replace ConstantInt Zero by ConstantPointerNull
llvm-svn: 314531
The test attempts to use -1 as carry-in for v_addc_*.
Before writing r314522, I did actually test this on real hardware,
and found that it doesn't work. So r314522 is correct in restricting
the carry-in operand: just remove those tests to make things pass
again.
llvm-svn: 314530
Summary:
Demanglers such as libiberty know how to strip suffixes of the form
\.[a-zA-Z]+\.\d+, but our current promoted value suffixes are
.llvm.${modulehash}, where the module hash is in hex. Change the
module hash to decimal to allow demanglers to handle this.
Reviewers: danielcdh
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D38405
llvm-svn: 314527
This implement the insertion operator for DWARF address ranges so they
are consistently printed as [LowPC, HighPC).
While a dump method might have felt more consistent, it is used
exclusively for printing error messages in the verifier and never used
for actual dumping. Hence this approach is more intuitive and creates
less clutter at the call sites.
Differential revision: https://reviews.llvm.org/D38395
llvm-svn: 314523
The hardware will only forward EXEC_LO; the high 32 bits will be zero.
Additionally, inline constants do not work. At least,
v_addc_u32_e64 v0, vcc, v0, v1, -1
which could conceivably be used to combine (v0 + v1 + 1) into a single
instruction, acts as if all carry-in bits are zero.
The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine
s_mov_b64 s[0:1], exec
v_cndmask_b32_e64 v0, v1, v2, s[0:1]
into
v_mov_b32 v0, v3
but it's not particularly high priority.
Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.*
llvm-svn: 314522
Summary:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing mode in the target.
However, since it doesn't check its actual users, it will return FREE even in cases
where the GEP cannot be folded away as a part of actual addressing mode.
For example, if an user of the GEP is a call instruction taking the GEP as a parameter,
then the GEP may not be folded in isel.
Reviewers: hfinkel, efriedma, mcrosier, jingyue, haicheng
Reviewed By: hfinkel
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38085
llvm-svn: 314517
Implement shouldCoalesce() to help regalloc avoid running out of GR128
registers.
If a COPY involving a subreg of a GR128 is coalesced, the live range of the
GR128 virtual register will be extended. If this happens where there are
enough phys-reg clobbers present, regalloc will run out of registers (if
there is not a single GR128 allocatable register available).
This patch tries to allow coalescing only when it can prove that this will be
safe by checking the (local) interval in question.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D37899https://bugs.llvm.org/show_bug.cgi?id=34610
llvm-svn: 314516
New instructions are added to AArch32 and AArch64 to aid
floating-point multiplication and addition of complex numbers, where
the complex numbers are packed in a vector register as a pair of
elements. The Imaginary part of the number is placed in the more
significant element, and the Real part of the number is placed in the
less significant element.
This patch adds assembler for the ARM target.
Differential Revision: https://reviews.llvm.org/D36789
llvm-svn: 314511
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Recommitting r314497. This version does not contain test which fails
when compiler is not build in debug mode.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314507
Add missing license information to MicroMipsInstrFPU.td and
fix most of the formatting errors present. Others will be
addressed in a follow up commits.
llvm-svn: 314505
Added additional tests for vector multiplications with multipliers that are:
* powers of 2 displaced by 1,
* product of a power of 2 displaced by one with another power of 2.
Patch by @pacxx (Michael Haidl)
Differential Revision: https://reviews.llvm.org/D38350
llvm-svn: 314504
Summary:
This commit adds comments on how the AMDPAL OS type overloads the
existing AMDGPU_ calling conventions used by Mesa, and adds a couple of
new ones.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37752
llvm-svn: 314502
Summary:
Added support for scratch (including spilling) for OS type amdpal:
generates code to set up the scratch descriptor if it is needed.
With amdpal, the scratch resource descriptor is loaded from offset 0 of
the global information table. The low 32 bits of the address of the
global information table is passed in s0.
Added amdgpu-git-ptr-high function attribute to hard-wire the high 32
bits of the address of the global information table. If the function
attribute is not specified, or is 0xffffffff, then the backend generates
code to use the high 32 bits of pc.
The documentation for the AMDPAL ABI will be added in a later commit.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye
Differential Revision: https://reviews.llvm.org/D37483
llvm-svn: 314501
Summary:
This operating system type represents the AMDGPU PAL runtime, and will
be required by the AMDGPU backend in order to generate correct code for
this runtime.
Currently it generates the same code as not specifying an OS at all.
That will change in future commits.
Patch from Tim Corringham.
Subscribers: arsenm, nhaehnle
Differential Revision: https://reviews.llvm.org/D37380
llvm-svn: 314500
This patch introduces 3 helper functions: error(), warn() and note() to
make printing during verification more consistent. When supported, the
respective prefixes are printed in color using the same color scheme as
clang.
Differential revision: https://reviews.llvm.org/D38368
llvm-svn: 314498
Fix nested callseq* nodes by moving callseq_start after the
arguments calculation to temporary registers, so that callseq* nodes
in resulting DAG are linear.
Differential Revision: https://reviews.llvm.org/D37328
llvm-svn: 314497
Allow the proper recognition of Enum values and global variables inside ms inline-asm memory / immediate expressions, as they require some additional overhead and treated incorrect if doesn't early recognized.
supersedes D33278, D35774
Differential Revision: https://reviews.llvm.org/D37412
llvm-svn: 314493
This is slightly less verbose for the common case of a single build directory
and more intuitive when using this API directly from the interpreter.
llvm-svn: 314491
Previous patch fixed one of LLVM buildbots (lld-x86_64-win7).
However, some others have already been failing because of make_unique
compilation error (llvm-clang-x86_64-expensive-checks-win).
llvm-svn: 314480
This allows llvm-rc to parse user-defined resources (ref:
msdn.microsoft.com/en-us/library/windows/desktop/aa381054.aspx).
These statements either import files, or put the specified raw data in
the resulting resource file.
Thanks to Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37033
llvm-svn: 314478
This allows the ints to be written as integer expressions evaluating to
unsigned 16-bit/32-bit integers.
All the expressions may use the following operators: + - & | ~, and
parentheses. Minus token - can be also unary. There is no precedence of
the operators other than the unary operators binding stronger than their
binary counterparts.
Differential Revision: https://reviews.llvm.org/D37022
llvm-svn: 314477
This commit yanks out the repeated sections of code in pruneCandidates into
two lambdas: ShouldSkipCandidate and Prune. This simplifies the logic in
pruneCandidates significantly, and reduces the chance of introducing bugs by
folding all of the shared logic into one place.
llvm-svn: 314475
Summary:
X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters.
I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs.
Reviewers: RKSimon, zvi, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38273
llvm-svn: 314474
LR is an untypical callee saved register in that it is restored into a
different register (PC) and thus does not live-out of the return block.
This case requires the `Restored` flag in CalleeSavedInfo to be cleared.
This fixes a number of cases where this wasn't handled correctly yet.
llvm-svn: 314471
This extends the set of llvm-rc parser's available resources by
another one, VERSIONINFO.
Ref: msdn.microsoft.com/en-us/library/windows/desktop/aa381058.aspx
Thanks to Nico Weber for his original work in this area.
Differential Revision: https://reviews.llvm.org/D37021
llvm-svn: 314468
The expensive-checks build bot found a problem with the r314428 commit:
if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be
marked as live-in to the block after the loop the pseudo gets expanded
to. This actually fixes a code-gen bug as well, since if the CC isn't
live, the CR and JLH are merged to a CRJLH which doesn't actually set
the condition code any more.
llvm-svn: 314465
If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract.
Differential Revision: https://reviews.llvm.org/D38375
llvm-svn: 314457
/code/llvm-project/llvm/unittests/ExecutionEngine/Orc/RTDyldObjectLinkingLayerTest.cpp:260:38: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
[this](decltype(ObjLayer)::ObjHandleT,
llvm-svn: 314454
In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp,
we fetch and store the actual frame pointer, but on return via
the longjmp intrinsic, it always was restored into the r7 variable.
On windows, the frame pointer should be restored into r11 instead of r7.
On Darwin (where sjlj exception handling is used by default), the frame
pointer is always r7, both in arm and thumb mode, and likewise, on
windows, the frame pointer always is r11.
On linux however, if sjlj exception handling is enabled (which it isn't
by default), libcxxabi and the user code can be built in differing modes
using different registers as frame pointer. Therefore, when restoring
registers on a platform where we don't always use the same register
depending on code mode, restore both r7 and r11.
Differential Revision: https://reviews.llvm.org/D38253
llvm-svn: 314451
We aren't do any in register extends here so we should be able to just the target independent nodes directly and allow them to be lowered as necessary.
llvm-svn: 314447
This patch implements the dwarfdump option --find=<name>. This option
looks for a DIE in the accelerator tables and dumps it if found. This
initial patch only adds support for .apple_names to keep the review
small, adding the other sections and pubnames support should be
trivial though.
Differential Revision: https://reviews.llvm.org/D38282
llvm-svn: 314439
JumpThreading now preserves dominance and lazy value information across the
entire pass. The pass manager is also informed of this preservation with
the goal of DT and LVI being recalculated fewer times overall during
compilation.
This change prepares JumpThreading for enhanced opportunities; particularly
those across loop boundaries.
Patch by: Brian Rzycki <b.rzycki@samsung.com>,
Sebastian Pop <s.pop@samsung.com>
Differential revision: https://reviews.llvm.org/D37528
llvm-svn: 314435
Summary: If we have BWI instructions we can widen to v32i16 to do the multiply instead of splitting.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38305
llvm-svn: 314432
Summary:
Lowering never creates X86ISD::UMUL for 8-bit types. X86ISD::UMUL8 is used instead. If X86ISD::UMUL 8-bit were ever used it would crash.
DAGCombiner replaces UMUL_LOHI/SMUL_LOHI with a wider MUL and a shift if the type twice as wide is legal. So we should never see i8 UMUL_LOHI/SMUL_LOHI. In fact I think there was a bug in part of the i8 code. Similar is true for i16 though without the bug.
Reviewers: RKSimon, spatel, zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38276
llvm-svn: 314430
Previously we were using one of the subvector indices twice. The included test case causes an assert without this change.
Thanks to Simon Pilgrim for catching this.
llvm-svn: 314429
The SystemZ compare-and-swap instructions already provide the "success"
indication via a condition-code value, so the default expansion of those
operations generates an unnecessary extra comparsion.
llvm-svn: 314428
This patch adds a check to the DWARF verifier to detect CUs without a
unit DIE.
Differential revision: https://reviews.llvm.org/D38363
llvm-svn: 314426
This patch disables codegen support for branch likely instructions to
address a potential bug. These branches were unselectable as
they had the same patterns as the normal branches but came after them
when ISel was concerned.
The branch likely instructions were marked as having no delay
slots when they have annulling delay slots. The delay slot filler
does not currently handle annulling delay slot branches, so this
would lead to wrong codegen if these branches were generated.
Reviewers: atanasyan, nitesh.jain
Differential Revision: https://reviews.llvm.org/D38169
llvm-svn: 314421
Summary:
A DBG_VALUE that is referring to a physical register is
valid up until the next def of the register, or the end
of the basic block that it belongs to.
LiveDebugVariables is computing live intervals (slot index
ranges) for DBG_VALUE instructions, before regalloc, in order
to be able to re-insert DBG_VALUE instructions again after
regalloc. When the DBG_VALUE is mapping a variable to a
physical register we do not need to compute the range. We
should simply re-insert the DBG_VALUE at the start position.
The problem that was found, resulting in this patch, was a
situation when the DBG_VALUE was the last real use of the
physical register. The computeIntervals/extendDef methods
extended the range to cover the whole basic block, even though
the physical register very well could be allocated to some
virtual register inside the basic block. So the extended
range could not be trusted.
This patch is a preparation for https://reviews.llvm.org/D38229,
where the goal is to insert DBG_VALUE after each new definition
of a variable, even if the virtual registers that the variable
was connected to has been coalesced into using the same physical
register (e.g. due to two address instructions). For more info
see https://bugs.llvm.org/show_bug.cgi?id=34545
Reviewers: aprantl, rnk, echristo
Reviewed By: aprantl
Subscribers: Ka-Ka, llvm-commits
Differential Revision: https://reviews.llvm.org/D38140
llvm-svn: 314414
The second argument for Allocator::Deallocate is the number of elements,
not the size of a single element. In asan mode specifying a large number
of elements poisoned random memory regions, leading to crashes
everywhere.
llvm-svn: 314413
Summary:
This allows sharing the lattice value code between LVI and SCCP (D36656).
It also adds a `satisfiesPredicate` function, used by D36656.
Reviewers: davide, sanjoy, efriedma
Reviewed By: sanjoy
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D37591
llvm-svn: 314411
MS allows the following size directives: float/double and long as synonymous to dword/qword and dword, respectively.
Differential Revision: https://reviews.llvm.org/D37190
llvm-svn: 314410
Before this change using any of the -name*= command line options with an output
directory would result in a single file (functions.txt/functions.html)
containing the coverage for those specific functions. Now you get the same
directory structure as when not using any -name*= options.
Differential Revision: https://reviews.llvm.org/D38280
llvm-svn: 314396
It's currently quite difficult to test passes like branch relaxation, which
requires branches with large displacement to be generated. The .space assembler
directive makes it easy to create arbitrarily large basic blocks, but
getInlineAsmLength is not able to parse it and so the size of the block is not
correctly estimated. Other backends (AArch64, AMDGPU) introduce options just
for testing that artificially restrict the ranges of branch instructions (e.g.
aarch64-tbz-offset-bits). Although parsing a single form of the .space
directive feels inelegant, it does allow a more direct testing approach.
This patch adapts the .space parsing code from
Mips16InstrInfo::getInlineAsmLength and removes it now the extra functionality
is provided by the base implementation. I want to move this functionality to
the generic getInlineAsmLength as 1) I need the same for RISC-V, and 2) I feel
other backends will benefit from more direct testing of large branch
displacements.
Differential Revision: https://reviews.llvm.org/D37798
llvm-svn: 314393
This is a follow-on of D37211.
D37211 eliminates a compare instruction if two conditional branches can be made based on the one compare instruction, e.g.
if (a == 0) { ... }
else if (a < 0) { ... }
This patch extends this optimization to support partially redundant cases, which often happen in while loops.
For example, one compare instruction is moved from the loop body into the preheader by this optimization in the following example.
do {
if (a == 0) dummy1();
a = func(a);
} while (a > 0);
Differential Revision: https://reviews.llvm.org/D38236
llvm-svn: 314390
%lo(), %hi(), and %pcrel_hi() are supported and test cases have been added to
ensure the appropriate fixups and relocations are generated. I've added an
instruction format field which is used in RISCVMCCodeEmitter to, for
instance, tell whether it should emit a lo12_i fixup or a lo12_s fixup
(RISC-V has two 12-bit immediate encodings depending on the instruction
type).
Differential Revision: https://reviews.llvm.org/D23568
llvm-svn: 314389
Summary:
If we have a non-allocated register, we allow us to try recoloring of an
already allocated and "Done" register, even if they are of the same
register class, if the non-allocated register has at least one tied def
and the allocated one has none.
It should be easier to recolor the non-tied register than the tied one, so
it might be an improvement even if they use the same regclasses.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, MatzeB
Differential Revision: https://reviews.llvm.org/D38309
llvm-svn: 314388
Without this, we could end up trying to get the Nth (0-indexed) element
from a subvector of size N.
Differential Revision: https://reviews.llvm.org/D37880
llvm-svn: 314380
This patch adds new insn, "reg = be16/be32/be64 reg",
for bswap to little endian for big-endian target (bpfeb).
It also adds new insn for negation "reg = -reg".
Currently, for source code, e.g.,
b = -a
LLVM still prefers to generate:
b = 0 - a
But "reg = -reg" format can be used in assembly code.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 314376
Summary:
And now that we no longer have to explicitly free() the Loop instances, we can
(with more ease) use the destructor of LoopBase to do what LoopBase::clear() was
doing.
Reviewers: chandlerc
Subscribers: mehdi_amini, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D38201
llvm-svn: 314375
concept.
Add a unit-test to make sure we don't backslide, and tweak the MockBaseLayer
utility to make it easier to test this kind of thing in the future.
llvm-svn: 314374
If we do not initialize Prefix here, Prefix.data() returns a nullptr.
Later, it is passed to memcpy. memcpy's behavior is undefined if src (or
dst) is a nullptr even if a given size is 0. That's why this code
triggered UBsan.
llvm-svn: 314368
Summary:
This avoids C++ UB if the GEP is weird and the calculation overflows
int64_t, and it's also observable in the cost model's results.
Such GEPs are almost surely not valid pointers, but LLVM nonetheless
generates them sometimes.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38337
llvm-svn: 314362
This patch produces a crash and hexagon_vector_loop_carried_reuse_constant.ll test fails on Windows (llvm-clang-x86_64-expensive-checks-win build bot).
llvm-svn: 314361
This reverts r314017 and similar code added in later commits. It seems to not work for pointer compares and is causing a bot failure for the last several days.
llvm-svn: 314360
The tar format originally supported up to 99 byte filename. The two
extensions are proposed later: Ustar or PAX.
In the UStar extension, a pathanme is split at a '/' and its "prefix"
and "suffix" are stored in different locations in the tar header. Since
"prefix" can be up to 155 byte, it can represent up to 254 byte
filename (but exact limit depends on the location of '/' character in
a pathname.)
Our TarWriter first attempt to use UStar extension and then fallback to
PAX extension.
But there's a bug in UStar header creation. "Suffix" part must be a NUL-
terminated string, but we didn't handle it correctly. As a result, if
your filename just 100 characters long, the last character was droppped.
This patch fixes the issue.
Differential Revision: https://reviews.llvm.org/D38149
llvm-svn: 314349
Summary:
*In-source builds* of LLVM, in which a user invokes `cmake` from within the
LLVM source directory, or invokes `cmake -B/path/to/source/dir/of/llvm`,
are explicitly checked for and disallowed by LLVM's `CMakeLists.txt`.
*In-tree builds*, on the other hand, refer to when the source directories
of projects such as Clang are nested within the `llvm/tools` source
directory. These are not disallowed, and are in fact a common way of
building LLVM and Clang.
Revise the comment to match the logic underneath it: it checks for an
"in-source build", not an "in-tree build".
Reviewers: beanz
Reviewed By: beanz
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D38317
llvm-svn: 314348
FileOutputBuffer::create() attempts to remove a target file if the file
is a regular one, which results in an unexpected result in a failure
scenario.
If something goes wrong and the user of FileOutputBuffer decides to not
call commit(), it leaves nothing. An existing file is removed, and no
new file is created.
What we should do is to atomically replace an existing file with a new
file using rename(), so that it wouldn't remove an existing file without
creating a new one.
Differential Revision: https://reviews.llvm.org/D38283
llvm-svn: 314345
This commit allows the outliner to avoid saving and restoring the link register
on AArch64 when it is dead within an entire class of candidates.
This introduces changes to the way the outliner interfaces with the target.
For example, the target now interfaces with the outliner using a
MachineOutlinerInfo struct rather than by using getOutliningCallOverhead and
getOutliningFrameOverhead.
This also improves several comments on the outliner's cost model.
https://reviews.llvm.org/D36721
llvm-svn: 314341
Summary:
According to https://gcc.gnu.org/wiki/SplitStacks, the linker expects a zero-sized .note.GNU-split-stack section if split-stack is used (and also .note.GNU-no-split-stack section if it also contains non-split-stack functions), so it can handle the cases where a split-stack function calls non-split-stack function.
This change adds the sections if needed.
Fixes PR #34670.
Reviewers: thanm, rnk, luqmana
Reviewed By: rnk
Subscribers: llvm-commits
Patch by Cherry Zhang <cherryyz@google.com>
Differential Revision: https://reviews.llvm.org/D38051
llvm-svn: 314335
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D37950
llvm-svn: 314332
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.
If we instead combine the psadbw with zeros and use the full size add we can take advantage of implicit zeroing we get if we emit a narrower move before the add.
In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.
Differential Revision: https://reviews.llvm.org/D37453
llvm-svn: 314331
reductions.
If both operands of the newly created SelectInst are Undefs the
resulting operation is also Undef, not SelectInst. It may cause crashes
when trying to propagate IR flags because function expects exactly
SelectInst instruction, nothing else.
llvm-svn: 314323
These changes faciliate positive behavior for arithmetic based select
expressions that match its translation criteria, keeping code size gated to
neutral or improved scenarios.
Patch by Michael Berg <michael_c_berg@apple.com>!
Differential Revision: https://reviews.llvm.org/D38263
llvm-svn: 314320
Summary:
Found when testing stage-2 build with D38101.
```
In file included from /build/llvm/lib/Support/Path.cpp:1045:
/build/llvm/lib/Support/Unix/Path.inc:648:14: error: comparison 'uint64_t' (aka 'unsigned long') > 18446744073709551615 is always false [-Werror,-Wtautological-constant-compare]
if (length > std::numeric_limits<size_t>::max()) {
~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
`size_t` is `uint64_t` here, apparently, thus any `uint64_t` value
always fits into `size_t`.
Initial patch was to use some preprocessor logic to
not check if the size is known to fit at compile time.
But Zachary Turner suggested using this approach.
Reviewers: Bigcheese, rafael, zturner, mehdi_amini
Reviewed by (via email): zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38132
llvm-svn: 314312
Before this change using any of the -name*= command line options with an output
directory would result in a single file (functions.txt/functions.html)
containing the coverage for those specific functions. Now you get the same
directory structure as when not using any -name*= options.
Differential Revision: https://reviews.llvm.org/D38280
llvm-svn: 314310
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
: BonusInstThreshold(UserBonusInstThreshold),
LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
InlineCost can understand Select IR now. This patch finds free Select IRs and
continue the propagation of SimplifiedValues, ConstantOffsetPtrs, and
SROAArgValues.
Differential Revision: https://reviews.llvm.org/D37198
llvm-svn: 314307
NFC.
Updated 8 regression tests to use -mattr instead of -mcpu flag as follows:
-mcpu=knl --> -mattr=+avx512f
-mcpu=skx --> -mattr=+avx512f,+avx512bw,+avx512vl,+avx512dq
The updates are as part of the preparation of a large commit to add all instruction scheduling for the SKX target.
Reviewers: delena, zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38222
Change-Id: I2381c9b5bb75ecacfca017243c22d054f6eddd14
llvm-svn: 314306
This patch makes analyzeBranch eliminate unconditional branch to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instruction.
Differential Revision: https://reviews.llvm.org/D37730
llvm-svn: 314297
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts
llvm-svn: 314293
I implemented isTruncateFree in rL313533, this patch fixes the logic
to match my comment, as the previous logic was too general. Now the
only truncates that are free are i64 -> i32.
Differential Revision: https://reviews.llvm.org/D38234
llvm-svn: 314280
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.
Differential Revision: https://reviews.llvm.org/D38254
llvm-svn: 314277
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offseting. When reading directly
from the struct here, we shouldn't subtract 1.
This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.
This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.
Differential Revision: https://reviews.llvm.org/D38251
llvm-svn: 314276
This matches the types of the struct members defined in
lib/CodeGen/SjLjEHPrepare.cpp, and the definition of this struct in libgcc.
Differential Revision: https://reviews.llvm.org/D38248
llvm-svn: 314275