On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342210
As preparation for LoopInterchange becoming a loop pass, it needs to
preserve ScalarEvolution. Even though interchanging should not change
the trip count of the loop, it modifies loop entry, latch and exit
blocks.
I added -verify-scev to some loop interchange tests, but the verification does
not catch problems caused by missing invalidation of SE in loop interchange, as
the trip counts themselves do not change. So there might be potential to
make the SE verification covering more stuff in the future.
Reviewers: mkazantsev, efriedma, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52026
llvm-svn: 342209
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
When reading directives from a .drectve section, the directives are
tokenized as a normal windows command line. However in these cases,
link.exe allows the directives to be separated by null bytes, not only by
spaces.
A test case for this change will be added in the lld repo.
Differential Revision: https://reviews.llvm.org/D52014
llvm-svn: 342204
Summary:
[VPlan] Implement vector code generation support for simple outer loops.
Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:
- force vector code generation using explicit vectorize_width
- add conservative early returns in cost model and other places for VPlanNativePath
- add code for setting up outer loop inductions
- support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
- support for generating uniform inner branches
We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal
Reviewed By: fhahn, hsaito
Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D50820
llvm-svn: 342197
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52022
Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
llvm-svn: 342189
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 342186
INLINEASM instructions use extra operands to carry flags. If a register operand is removed without removing the flag operand, then the flags will no longer make sense.
This patch fixes this by preventing the removal when a flag operand is present.
The included test case was generated by MS inline assembly. Longer term maybe we should fix the inline assembly parsing to not generate redundant operands.
Differential Revision: https://reviews.llvm.org/D51829
llvm-svn: 342176
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.
Reviewers: eli.friedman, craig.topper
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51502
llvm-svn: 342175
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA
We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.
https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52001
llvm-svn: 342173
This adds DebugCounter support to the PartiallyInlineLibCalls pass,
which should make debugging/automated bisection easier in the future.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50093
llvm-svn: 342172
This allows the xor to be removed completely.
This might help with recomitting r341674, but seems good regardless.
Coincidentally fixes PR38915.
Differential Revision: https://reviews.llvm.org/D51964
llvm-svn: 342163
Summary:
Fixed assertions due to invalid fixup when encoding compressed instructions
(c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers.
This matches GAS behavior as well.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D52005
llvm-svn: 342160
Summary:
The illegal instruction 0x00 0x00 is being wrongly decoded as
c.addi4spn with 0 immediate.
The invalid instruction 0x01 0x61 is being wrongly decoded as
c.addi16sp with 0 immediate.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D51815
llvm-svn: 342159
Also, add a check to ensure that when main has the expected signature
we do not create a wrapper.
Differential Revision: https://reviews.llvm.org/D51562
llvm-svn: 342157
I accidentally committed this diff with rL342147 because
I had applied D51964. We probably do need those checks,
but D51964 has tests and more discussion/motivation,
so they should be re-added with that patch.
llvm-svn: 342149
I don't have a test case for this, but it's motivated by
the discussion in D51964, and I've added TODO comments for
the better fix - move simplifications into instsimplify
because that's more efficient and reduces risk of infinite
loops in instcombine caused by transforms trying to do the
opposite folds.
In this case, we know that the transform that tries to move
'not' through min/max can be fooled by the multiple uses
of a value in another min/max, so try to squash the
foldSPFofSPF() patterns first.
llvm-svn: 342147
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.
Differential Revision: https://reviews.llvm.org/D51978
llvm-svn: 342141
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.
Differential Revision: https://reviews.llvm.org/D52032
llvm-svn: 342140
In r319995, we fixed the line table format to version 2 on Darwin
because dsymutil didn't yet understand the new format which caused test
failures for the LLDB bots. This has been resolved in the meantime so
there's no reason to keep this limitation.
rdar://problem/35968332
llvm-svn: 342136
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
llvm-svn: 342133
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.
This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.
llvm-svn: 342127
Summary:
This change has a number of fixes for FDR mode in compiler-rt along with
changes to the tooling handling the traces in llvm.
In the runtime, we do the following:
- Advance the "last record" pointer appropriately when writing the
custom event data in the log.
- Add XRAY_NEVER_INSTRUMENT in the rewinding routine.
- When collecting the argument of functions appropriately marked, we
should not attempt to rewind them (and reset the counts of functions
that can be re-wound).
In the tooling, we do the following:
- Remove the state logic in BlockIndexer and instead rely on the
presence/absence of records to indicate blocks.
- Move the verifier into a loop associated with each block.
Reviewers: mboerger, eizan
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51965
llvm-svn: 342122
This implements suggesting alternative mnemonics when an invalid one is
specified. For example `addru $9, $6, 17767` leads to the following
error message:
error: unknown instruction, did you mean: add, addiu, addu, maddu?
Differential revision: https://reviews.llvm.org/D40646
llvm-svn: 342119
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.
This patch switches type legalization to do the scalarizing itself using i32.
It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.
Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51325
llvm-svn: 342114
The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A
thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
Correct the relocation that we emit in such a case.
Resolves PR38620! Based on the patch by Jordan Rhee!
llvm-svn: 342109
Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.
llvm-svn: 342091
This is available on most platforms (Linux/Mac/Win/BSD) with no extra syscalls.
On other platforms (e.g. Solaris) we stat() if this information is requested.
This will allow switching clang's VFS to efficiently expose (path, type) when
traversing a directory. Currently it exposes an entire Status, but does so by
calling fs::status() on all platforms.
Almost all callers only need the path, and all callers only need (path, type).
Patch by sammccall (Sam McCall)
Differential Revision: https://reviews.llvm.org/D51918
llvm-svn: 342089
template methods in JITDylib out-of-line.
This also splits JITDylib::define into a pair of template methods, one taking an
lvalue reference and the other an rvalue reference. This simplifies the
templates at the cost of a small amount of code duplication.
llvm-svn: 342087
construction, a new convenience lookup method, and add-to layer methods.
ExecutionSession now creates a special 'main' JITDylib upon construction. All
subsequently created JITDylibs are added to the main JITDylib's search order by
default (controlled by the AddToMainDylibSearchOrder parameter to
ExecutionSession::createDylib). The main JITDylib's search order will be used in
the future to properly handle cross-JITDylib weak symbols, with the first
definition in this search order selected.
This commit also adds a new ExecutionSession::lookup convenience method that
performs a blocking lookup using the main JITDylib's search order, as this will
be a very common operation for clients.
Finally, new convenience overloads of IRLayer and ObjectLayer's add methods are
introduced that add the given program representations to the main dylib, which
is likely to be the common case.
llvm-svn: 342086
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51991
llvm-svn: 342084
r342003 added support for emitting FPO data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the PDB
file. However, that is not the end of the story. FPO can end
up in two different destinations in a PDB, each corresponding to
a different FPO data source.
The case handled by r342003 involves copying data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the
"New FPO" stream in the PDB, which is then referred to by the
DBI stream. The case handled by this patch involves copying
records from the .debug$F section of an object file to the "FPO"
stream (or perhaps more aptly, the "Old FPO" stream) in the PDB
file, which is also referred to by the DBI stream.
The formats are largely similar, and the difference is mostly
only visible in masm generated object files, such as some of the
low-level CRT object files like memcpy. MASM doesn't appear to
support writing the DEBUG_S_FRAMEDATA subsection, and instead
just writes these records to the .debug$F section.
Although clang-cl does not emit a .debug$F section ever, lld still
needs to support it so we have good debugging for CRT functions.
Differential Revision: https://reviews.llvm.org/D51958
llvm-svn: 342080
Scalarization of a shuffle will break up the source vectors into individual
elements, and use them to assemble the resulting vector. An element type of
a legal vector type may not necessarily be a legal scalar type, so make
sure that the extracted values are extended to a legal scalar type.
llvm-svn: 342079
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjYhttps://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51985
llvm-svn: 342067
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51765
llvm-svn: 342064
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.
I don't think gcc will generate this instruction either.
Fixes PR38852.
Differential Revision: https://reviews.llvm.org/D51754
llvm-svn: 342059
Fix for https://bugs.llvm.org/show_bug.cgi?id=38912.
In GVNHoist::computeInsertionPoints() we iterate over the Value
Numbers and calculate the Iterated Dominance Frontiers without
clearing the IDFBlocks vector first. IDFBlocks ends up accumulating
an insane number of basic blocks, which bloats the compilation time
of SemaChecking.cpp with ubsan enabled.
Differential Revision: https://reviews.llvm.org/D51980
llvm-svn: 342055
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51479
llvm-svn: 342049
Eliminating some duplication of rangelist dumping code at the expense of
some version-dependent code in dump and extract routines.
Reviewer: dblaikie, JDevlieghere, vleschuk
Differential revision: https://reviews.llvm.org/D51081
llvm-svn: 342048
The output of splitLargeGEPOffsets does not appear to be deterministic because
of the way that we iterate over a DenseMap. I've changed it to a MapVector for
consistent output.
The test here isn't particularly great, only showing a consmetic difference in
output. The original reproducer is much larger but show a diffierence in
instruction ordering, leading to different codegen.
Differential Revision: https://reviews.llvm.org/D51851
llvm-svn: 342043
Previously the alignment on the newly created switch table data was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it to 16
bytes. This causes unnecessary code bloat.
Differential Revision: https://reviews.llvm.org/D51800
llvm-svn: 342039
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51478
llvm-svn: 342038
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
llvm-svn: 342033
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.
More information describing the vector ABI and procedure call standard
can be found here:
https://developer.arm.com/products/software-development-tools/\
hpc/arm-compiler-for-hpc/vector-function-abi
Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D51477
llvm-svn: 342030
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp
Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00
Reviewed By: rengolin, xbolva00
Differential Revision: https://reviews.llvm.org/D49488
llvm-svn: 342027
This fixes a layering violation:
Analysis/IVDescrtors.cpp can't include Transforms/Utils/BasicBlockUtils.h,
since TransformUtils depends on Analysis.
llvm-svn: 342024
I suspect this became unecessary when the CSE of mgather was fixed in r338080. It may still be possible to hit this if we widen the element type of a gather outside of type legalization and the promote the mask of a separate gather node so they become the same. But that seems pretty unlikely.
llvm-svn: 342022
Summary:
The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory.
The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html).
Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h|cpp to Analysis/IVDescriptors.h|cpp.
Reviewers: dmgreen, llvm-commits, hfinkel
Reviewed By: dmgreen
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D51153
llvm-svn: 342016
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
Since the outliner is a module pass, it doesn't get codegen size remarks like
the other codegen passes do. This adds size remarks *to* the outliner.
This is kind of a workaround, so it's peppered with FIXMEs; size remarks
really ought to not ever be handled by the pass itself. However, since the
outliner is the only "MachineModulePass", this works for now. Since the
entire purpose of the MachineOutliner is to produce code size savings, it
really ought to be included in codgen size remarks.
If we ever go ahead and make a MachineModulePass (say, something similar to
MachineFunctionPass), then all of this ought to be moved there.
llvm-svn: 342009
Name: op_ugt_sum
%a = add i8 %x, %y
%r = icmp ugt i8 %x, %a
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
Name: sum_ult_op
%a = add i8 %x, %y
%r = icmp ult i8 %a, %x
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
https://rise4fun.com/Alive/ZRxI
AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.
This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 342004
Summary:
Previously, check-llvm on my Windows 10 workstation took about 300s to
run, and it would lock up my mouse. Now, it takes just under 60s.
Previously running the tests only used about 15% of the available CPU
time, now it uses up to 60%.
Shell32.dll and ole32.dll have direct dependencies on user32.dll and
gdi32.dll. These DLLs are mostly used to for Windows GUI functionality,
which most LLVM tools don't need. It seems that these DLLs acquire and
release some system resources on startup and exit, and that requires
acquiring the same highly contended lock discussed in this post:
https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/
Most LLVM tools still have a transitive dependency on
SHGetKnownFolderPathW, which is used to look up the user home directory
and local AppData directory. We also use SHFileOperationW to recursively
delete a directory, but only LLDB and clang-doc use this today. At some
point, we might consider removing these last shell32.dll deps, but for
now, I would like to take this free performance.
Many thanks to Bruce Dawson for suggesting this fix. He looked at an ETW
trace of an LLVM test run, noticed these DLLs, and suggested avoiding
them.
Reviewers: zturner, pcc, thakis
Subscribers: mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51952
llvm-svn: 342002
This reverts rL341954.
The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:
[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill
llvm-svn: 342001
Summary:
There are two registers encoded in the S_FRAMEPROC flags: one for locals
and one for parameters. The encoding is described by the
ExpandEncodedBasePointerReg function in cvinfo.h. Two bits are used to
indicate one of four possible values:
0: no register - Used when there are no variables.
1: SP / standard - Variables are stored relative to the standard SP
for the ISA.
2: FP - Variables are addressed relative to the ISA frame
pointer, i.e. EBP on x86. If realignment is required, parameters
use this. If a dynamic alloca is used, locals will be EBP relative.
3: Alternative - Variables are stored relative to some alternative
third callee-saved register. This is required to address highly
aligned locals when there are dynamic stack adjustments. In this
case, both the incoming SP saved in the standard FP and the current
SP are at some dynamic offset from the locals. LLVM uses ESI in
this case, MSVC uses EBX.
Most of the changes in this patch are to pass around the CPU so that we
can decode these into real, named architectural registers.
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D51894
llvm-svn: 341999
In this diff we adjust the code of getSymbols to avoid creating LLVMContext when it's not necessary.
Without this patch when the function getSymbols was called on a MachO object with a __bitcode section
it was parsing the embedded bitcode and then ignoring the result.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D51759
llvm-svn: 341998
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1
Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1
https://rise4fun.com/Alive/Vty
The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.
I'm not sure if these are all of the symmetrical folds, but I adjusted
the existing code for one of the folds to try to show the similarities.
There's no obvious connection, but this is another preliminary step
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 341997
Summary: Initial support for nsw, nuw and exact flags in MI
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nlopes
Differential Revision: https://reviews.llvm.org/D51738
llvm-svn: 341996
Fixes at_file.c test failure caused by r341988. We may want to change
how we treat \n in our tokenizer, but this is probably a good fix
regardless, since we can invoke all kinds of programs with different
interpretations of the command line quoting rules.
llvm-svn: 341992
Summary:
Shell32.dll depends on gdi32.dll and user32.dll, which are mostly DLLs
for Windows GUI functionality. LLVM's utilities don't typically need GUI
functionality, and loading these DLLs seems to be slowing down startup.
Also, we already have an implementation of Windows command line
tokenization in cl::TokenizeWindowsCommandLine, so we can just use it.
The goal is to get the original argv in UTF-8, so that it can pass
through most LLVM string APIs. A Windows process starts life with a
UTF-16 string for its command line, and it can be retreived with
GetCommandLineW from kernel32.dll.
Previously, we would:
1. Get the wide command line
2. Call CommandLineToArgvW to handle quoting rules and separate it into
arguments.
3. For each wide argument, expand wildcards (* and ?) using
FindFirstFileW.
4. Convert each argument to UTF-8
Now we:
1. Get the wide command line, convert the whole thing to UTF-8
2. Tokenize the UTF-8 command line with cl::TokenizeWindowsCommandLine
3. For each argument, expand wildcards if present
- This requires converting back to UTF-16 to call FindFirstFileW
- Results of FindFirstFileW must be converted back to UTF-8
Reviewers: zturner
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51941
llvm-svn: 341988
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341987
Summary:
Update MemorySSA in old LoopUnswitch pass.
Actual dependency and update is disabled by default.
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45301
llvm-svn: 341984
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
I noticed that we were not back-propagating undef lanes to shuffle masks when we have a
shuffle that reduces the vector width. This is part of investigating/solving PR38691:
https://bugs.llvm.org/show_bug.cgi?id=38691
The DAG equivalent was proposed with:
D51696
Differential Revision: https://reviews.llvm.org/D51433
llvm-svn: 341981
Right now, the counters are added in regards of the number of successors
for a given BasicBlock: it's good when we've only 1 or 2 successors (at
least with BranchInstr). But in the case of a switch statement, the
BasicBlock after switch has several predecessors and we need know from
which BB we're coming from.
So the idea is to revert what we're doing: add a PHINode in each block
which will select the counter according to the incoming BB. They're
several pros for doing that:
- we fix the "switch" bug
- we remove the function call to "__llvm_gcov_indirect_counter_increment"
and the lookup table stuff
- we replace by PHINodes, so the optimizer will probably makes a better
job.
Patch by calixte!
Differential Revision: https://reviews.llvm.org/D51619
llvm-svn: 341977
Summary: Add function name when verification fails as an initial breadcrumb for debugging.
Patch by David Callahan.
Reviewers: mehdi_amini, modocache
Reviewed By: modocache
Subscribers: llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D51386
llvm-svn: 341974
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.
This patch removes the hack.
This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.
Differential Revision: https://reviews.llvm.org/D49499
llvm-svn: 341973
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.
Fixes ones of the failures from PR38865.
Differential Revision: https://reviews.llvm.org/D51940
llvm-svn: 341972
For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking.
Differential Revision: https://reviews.llvm.org/D51938
llvm-svn: 341971
There are 2 cases when we create PHI nodes:
* For the result of the call that was duplicated in the split blocks.
Those PHI nodes should have the debug location of the call.
* For values produced before the call. Those instructions need to be
duplicated in the split blocks and the PHI nodes should have the
debug locations of those instructions.
Fixes PR37962.
Reviewers: junbuml, gbedwell, vsk
Reviewed By: junbuml
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D51919
llvm-svn: 341970
An fp_to_sint node would be incorrectly lowered to a TruncIntFP node in
single-float mode. This would trigger an "Unexpected illegal type!"
assert.
Patch by Dan Ravensloft.
Differential revision: https://reviews.llvm.org/D51810
llvm-svn: 341952
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341951
Fix for https://bugs.llvm.org/show_bug.cgi?id=38807, which occurred
while compiling SemaTemplateInstantiate.cpp with clang and GVNHoist
enabled. In the following example:
1=def(entry)
/ \
2=def(1) 4=def(1)
3=def(2) 5=def(4)
When removing the MemoryDef 2=def(1) from its basic block, and just
before adding it to the end of the parent basic block, we first
replace all its uses with the defining memory access:
3=def(2) -> 3=def(1)
Then we call insertDef for adding 2=def(1) to the parent basic block,
where we replace the uses of 1=def(entry) with 2=def(1). Doing so we
create a self reference:
2=def(1) -> 2=def(2) (bad)
3=def(1) -> 3=def(2) (ok)
4=def(1) -> 4=def(2) (ok)
Differential Revision: https://reviews.llvm.org/D51801
llvm-svn: 341947
Makes the produced pdbs more deterministic; before they'd contain 2 arbitary
bytes where this padding was.
Also reorder initialization to match the order of the fields in the struct (nfc)
llvm-svn: 341945
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.
Differential Revision: https://reviews.llvm.org/D51101
llvm-svn: 341941
This patch adds support for ORC JIT for mips/mips64 architecture.
In common code $static is changed to __ORCstatic because on MIPS
architecture "$" is a reserved character.
Patch by Luka Ercegovcevic
Differential Revision: https://reviews.llvm.org/D49665
llvm-svn: 341934
We've had the pass enabled downstream for a couple of weeks and it
seems to be okay, so enable it by default.
Differential Revision: https://reviews.llvm.org/D51920
llvm-svn: 341932
The presence of readnone and an access range attribute (argmemonly,
inaccessiblememonly, inaccessiblemem_or_argmemonly) is considered an
error by the verifier. This seems strict but also not wrong. This
patch makes sure function attribute detection will remove all access
range attributes for readnone functions.
llvm-svn: 341927
The previous implementation traversed all loop blocks and bailed if one
was not a latch block. Since we are only interested in latch blocks, we
should only traverse those.
llvm-svn: 341926
MIPS ISAs start to support third operand for the `rdhwr` instruction
starting from Revision 6. But LLVM generates assembler code with
three-operands version of this instruction on any MIPS64 ISA. The third
operand is always zero, so in case of direct code generation we get
correct code.
This patch fixes the bug by adding an instruction alias. The same alias
already exists for 32-bit ISA.
Ideally, we also need to reject three-operands version of the `rdhwr`
instruction in an assembler code if ISA revision is less than 6. That is
a task for a separate patch.
This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861)
Differential revision: https://reviews.llvm.org/D51773
llvm-svn: 341919
MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just a int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops.
llvm-svn: 341915
Summary:
In this change, we overhaul the implementation for loading
`llvm::xray::Trace` objects from files by using the combination of
specific FDR Record types and visitors breaking up the logic to
reconstitute an execution trace from flight-data recorder mode traces.
This change allows us to handle out-of-temporal order blocks as written
in files, and more consistently recreate an execution trace spanning
multiple blocks and threads. To do this, we use the `WallclockRecord`
associated with each block to maintain temporal order of blocks, before
attempting to recreate an execution trace.
The new addition in this change is the `TraceExpander` type which can be
thought of as a decompression/decoding routine. This allows us to
maintain the state of an execution environment (thread+process) and
create `XRayRecord` instances that fit nicely into the `Trace`
container. We don't have a specific unit test for the TraceExpander
type, since the end-to-end tests for the `llvm-xray convert` tools
already cover precisely this codepath.
This change completes the refactoring started with D50441.
Depends on D51911.
Reviewers: mboerger, eizan
Subscribers: mgorny, hiraditya, mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D51912
llvm-svn: 341906
Summary:
This more correctly reflects the data written by the FDR mode runtime.
This is a continuation of the work in D50441.
Reviewers: mboerger, eizan
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51911
llvm-svn: 341905
The was previously committed as r341749 then reverted as r341750 because
bit_cast needed to do its own thing to check is_trivially_copyable on GCC 4.x.
This is now done and std;:array should now get accepted.
llvm-svn: 341897
Currently we re-use cached info from sub loops or traverse them
to populate AliasSetTracker. But after that we traverse all basic blocks
from the current loop. This is redundant work.
All what we need is traversing the all basic blocks from the loop except
those which are used to get the data from the cache.
This should improve compile time only.
Reviewers: mkazantsev, reames, kariddi, anna
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51715
llvm-svn: 341896
IndVarSimplify's design is somewhat odd in the way how it reports that
some transform has made a change. It has a `Changed` field which can
be set from within any function, which makes it hard to track whether or
not it was set properly after a transform was made. It leads to oversights
in setting this flag where needed, see example in PR38855.
This patch removes the `Changed` field, turns it into a local and unifies
the signatures of all relevant transform functions to return boolean value
which designates whether or not this transform has made a change.
Differential Revision: https://reviews.llvm.org/D51850
Reviewed By: skatkov
llvm-svn: 341893
I'd made exactly this same change before, but it appears to have been accidentally reverted in another change. (I'm assuming accidental since it was without comment or test case, and in an unrelated change.)
llvm-svn: 341892
With the merge of TUs and CUs into a single container, some code that
relied on the CU range having an ordered range of contiguous addresses
(for locating a CU at a given offset) broke. But the units from
debug_info (currently only CUs, but CUs and TUs in DWARFv5) are in a
contiguous sub-range of that container - searching only through that
subrange is still valid & so do that.
llvm-svn: 341889
This patch does the following things:
1. update SymbolicallyEvaluateGEP so that it bails out if it cannot preserve inrange arribute;
2. update llvm/test/Analysis/ConstantFolding/gep.ll to remove UB in it;
3. remove inaccurate comment above ConstantFoldInstOperandsImpl in llvm/lib/Analysis/ConstantFolding.cpp;
4. add a new regression test that makes sure that no optimizations change an inrange GEP in an unexpected way.
Patch by Zhaomo Yang!
Differential Revision: https://reviews.llvm.org/D51698
llvm-svn: 341888
Summary:
In this change, we implement a `BlockPrinter` which orders records in a
Block that's been indexed by the `BlockIndexer`. This is used in the
`llvm-xray fdr-dump` tool which ties together the various types and
utilities we've been working on, to allow for inspection of XRay FDR
mode traces both with and without verification.
This change is the final step of the refactoring of D50441.
Reviewers: mboerger, eizan
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51846
llvm-svn: 341887
Summary:
Revert min/max changes in rL341674 dues to high compile times causing timeouts (PR38897).
Checking in to unblock failing builds. Patch available for post-commit review and re-revert once resolved.
Working on a smaller reproducer for PR38897.
Reviewers: craig.topper, spatel
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D51897
llvm-svn: 341883
This adds per-function size remarks to codegen, similar to what we have in the
IR layer as of r341588. This only impacts MachineFunctionPasses.
This does the same thing, but for `MachineInstr`s instead of just
`Instructions`. After this, when a `MachineFunctionPass` modifies the number of
`MachineInstr`s in the function it ran on, you'll get a remark.
To enable this, use the size-info analysis remark as before.
llvm-svn: 341876
clang-format was getting confused due to the presence of a macro
invocation that was not terminated by a semicolon. Fixed this by
terminating the macro lines with semicolons and re-ran clang-format
on the file.
llvm-svn: 341864
I'm having a hard time finding a test case for this, but we should be consistent here. The fact that we canonicalize all zeros and all ones constants to vXi32 and all other constants to loads makes this hard to hit the easy DAG combine infinite loop we get for some of the other types.
llvm-svn: 341859
Summary:
End goal is to update MemorySSA in all loop passes. LoopUnswitch clones all blocks in a loop. SimpleLoopUnswitch clones some blocks. LoopRotate clones some instructions.
Some of these loop passes also make CFG changes.
This is an API based on what I found needed in LoopUnswitch, SimpleLoopUnswitch, LoopRotate, LoopInstSimplify, LoopSimplifyCFG.
Adding dependent patches using this API for context.
Reviewers: george.burgess.iv, dberlin
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45299
llvm-svn: 341855
The only point to this change is the test diffs. When I remove this code entirely (in favor of the recently added generic handling), I don't want there to be any confusion due to spurious test diffs.
As an aside, the fact out tests are AST construction order dependent is not great. I thought about fixing that, but the reasonable schemes I might want (e.g. sort by name) need the test diffs anyways.
Philip
llvm-svn: 341841
Before tagging a function with coldcc make sure the target supports cold calling
convention. Without this patch HotColdSplitting pass fails on aarch64 with:
fatal error: error in backend: Unsupported calling convention.
llvm-svn: 341838
There were two combines not covered by the check before now, neither of which
actually differed from normal in the benefit analysis.
The most recent seems to be because it was just added at the top of the
function (naturally). The older is from way back in 2008 (r46687) when we just
didn't put those checks in so routinely, and has been diligently maintained
since.
llvm-svn: 341831
- Log the reason for a PDB or precompiled-OBJ load failure
- Properly handle out-of-date PDB or precompiled-OBJ signature by displaying a corresponding error
- Slightly change behavior on PDB failure: any subsequent load attempt from another OBJ would result in the same error message being logged
- Slightly change behavior on PDB failure: retry with filename only if previous error was ENOENT ("no such file or directory")
- Tests: a. for native PDB errors; b. cover all the cases above
Differential Revision: https://reviews.llvm.org/D51559
llvm-svn: 341825
Disassemblers cannot depend on main target headers. The same is true for
MCTargetDesc, but there's a lot more cleanup needed for that.
llvm-svn: 341822
When GVN propagates an equality by replacing one value with another it also
needs to invalidate the cached information for the value being replaced.
Differential Revision: https://reviews.llvm.org/D51218
llvm-svn: 341820
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.
Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.
This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
llvm-svn: 341801
Summary:
This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted.
Reviewers: nhaehnle, arsenm
Reviewed By: arsenm
Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51726
Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b
llvm-svn: 341798
Currently, `rewriteFirstIterationLoopExitValues` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.
Differential Revision: https://reviews.llvm.org/D51779
Reviewed By: skatkov
llvm-svn: 341779
Currently, `sinkUnusedInvariants` does not set Changed flag even if it makes
changes in the IR. There is no clear evidence that it can cause a crash, but it
looks highly suspicious and likely invalid.
Differential Revision: https://reviews.llvm.org/D51777
Reviewed By: skatkov
llvm-svn: 341777
Summary:
Move InductionDescriptor::transform() routine from LoopUtils to its only uses in LoopVectorize.cpp.
Specifically, the function is renamed as InnerLoopVectorizer::emitTransformedIndex().
This is a child to D51153.
Reviewers: dmgreen, llvm-commits
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D51837
llvm-svn: 341776
Summary:
This patch implements a `BlockVerifier` type which enforces the
invariants of the log structure of FDR mode logs on a per-block basis.
This ensures that the data we encounter from an FDR mode log
semantically correct (i.e. that records follow the documented "grammar"
for FDR mode log records).
This is another part of the refactoring of D50441.
This is a slightly modified version of rL341628, avoiding the
`std::tuple<...>` constructor that is not constexpr in C++11.
Reviewers: mboerger, eizan
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51723
llvm-svn: 341769
We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.
If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.
llvm-svn: 341765
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.
The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed
to be running there?
Differential Revision: https://reviews.llvm.org/D51696
llvm-svn: 341762
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.
Another step towards fixing PR38691
Reviewers: RKSimon, spatel
Reviewed By: RKSimon, spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51818
llvm-svn: 341753
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.
This was originally committed as r341728 and reverted in r341730.
Reviewers: javed.absar, steven_wu, srhines
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51693
llvm-svn: 341741
This patch removes addBlockByrefAddress(), it is dead code as far as
clang is concerned: Every byref block capture is emitted with a
complex expression that is equivalent to what this function does.
rdar://problem/31629055
Differential Revision: https://reviews.llvm.org/D51763
llvm-svn: 341737
They were unintentionally calling DIA directly, which requires
Windows. We need to pass the -native flag, and this then required
fixing up one or two tests.
llvm-svn: 341731
In order to start testing this, I've added a new mode to
llvm-pdbutil which is only really useful for writing tests.
It just dumps the value of raw fields in record format.
This isn't really ideal and it won't allow us to test some
important cases, but it's better than nothing for now.
llvm-svn: 341729
Summary: I saw a few places that were punning through a union of FP and integer, and that made me sad. Luckily, C++20 adds bit_cast for exactly that purpose. Implement our own version in ADT (without constexpr, leaving us a bit sad), and use it in the few places my grep-fu found silly union punning.
Reviewers: javed.absar
Subscribers: dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51693
llvm-svn: 341728
Summary:
MSVC and LLD sort sections ASCII-betically, so we need to use section
names that sort between .CRT$XCA (the start) and .CRT$XCU (the default
priority).
In the general case, use .CRT$XCT12345 as the section name, and let the
linker sort the zero-padded digits.
Users with low priorities typically want to initialize as early as
possible, so use .CRT$XCA00199 for prioties less than 200. This number
is arbitrary.
Implements PR38552.
Reviewers: majnemer, mstorsjo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51820
llvm-svn: 341727
Summary:
Since the shuffle mask is not exposed as an operand in the native ISel
DAG, create a new WebAssembly ISD node exposing the mask. The mask is
lowered as sixteen immediate byte indices no matter what type the
original vector shuffle was operating on.
This CL depends on D51656
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51659
llvm-svn: 341718
AliasSetTracker has special case handling for memset, memcpy and memmove which pre-existed argmemonly on functions and readonly and writeonly on arguments. This patch generalizes it using the AA infrastructure to any call correctly annotated.
The motivation here is to cut down on confusion, not performance per se. For most instructions, there is a direct mapping to alias set. However, this is not guaranteed by the interface and was not in fact true for these three intrinsics *and only these three intrinsics*. I kept getting myself confused about this invariant, so I figured it would be good to clearly distinguish between a instructions and alias sets. Calls happened to be an easy target.
The nice side effect is that custom implementations of memset/memcpy/memmove - including wrappers discovered by IPO - can now be optimized the same as builts by LICM.
Note: The actual removal of the memset/memtransfer specific handling will happen in a follow on NFC patch. It was originally part of this one, but separate for ease of review and rebase.
Differential Revision: https://reviews.llvm.org/D50730
llvm-svn: 341713
The main use case for this directive is to allow assembly writers to
write their own FPO data strings without going through the .cv_fpo*
directive family.
I'm experimenting with different RPN programs to fix PR38857, and I
figured I should go ahead and make this directive permanent.
llvm-svn: 341712
Summary:
Block splitting is done with either identical edges being merged, or not.
Only critical edges can be split without merging identical edges based on an option.
Teach the memoryssa updater to take this into account: for the same edge between two blocks only move one entry from the Phi in Old to the new Phi in New.
Reviewers: george.burgess.iv
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51563
llvm-svn: 341709
shuf (sel (shuf NarrowCond, undef, WideMask), X, Y), undef, NarrowMask) -->
sel NarrowCond, (shuf X, undef, NarrowMask), (shuf Y, undef, NarrowMask)
The motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38691
...is the last regression test. In that case, we're just left with the narrow select.
Note that if we do create new shuffles, they use the existing extraction identity mask,
so there's no danger that this transform creates arbitrary shuffles.
Differential Revision: https://reviews.llvm.org/D51496
llvm-svn: 341708
Summary: To explicitly opt out of LEB encoding for these immediates.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51766
llvm-svn: 341707
Summary:
Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7.
Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma
Reviewed By: nickdesaulniers, efriedma
Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D48580
llvm-svn: 341706
The generic type legalizer will scalarize vXi1 instructions getting rid of the vector entirely. Creating wider vector instructions is just going to prevent that.
llvm-svn: 341705
The type legalizer will try to scalarize this and fail.
It looks like there's some other v1iX oddities out there too since we still generated some vector instructions.
llvm-svn: 341704
Similar to what was recently done for addcarry/subborrow and has been done for rdrand/rdseed for a while. It's better to use two results and an explicit store in IR when the store isn't part of the semantics of the instruction. This allows store->load forwarding to happen in the middle end. Or the store to be removed if its never loaded.
Differential Revision: https://reviews.llvm.org/D51803
llvm-svn: 341698