KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.
Differential Revision: https://reviews.llvm.org/D53671
llvm-svn: 345285
Summary:
Currently, Legalizer is trying to lower G_LOAD with a vector type
that has more than two elements due to the incorrect LegalityPredicate.
This patch fixes the issue by removing the multiplication by 8
as `MemDesc.Size` already contains the size in bits.
Reviewers: dsanders, aemerson
Reviewed By: dsanders
Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D53679
llvm-svn: 345282
If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit
vector then in some cases we can eliminate an extract_subvector by converting
to a 128-bit EXT of the 128-bit vector.
Differential Revision: https://reviews.llvm.org/D53582
llvm-svn: 345275
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.
Differential Revision: https://reviews.llvm.org/D53562
llvm-svn: 345272
Currently a vector move of 0 or -1 will use different instructions depending on
the size of the vector. Using a single instruction (the 128-bit one) for both
gives more opportunity for Machine CSE to eliminate instructions.
Differential Revision: https://reviews.llvm.org/D53579
llvm-svn: 345270
Summary:
If the instruction in the eliminateFrameIndex function is a DBG_VALUE
instruction, it requires special processing. The frame register is set
to VRFrame and the offset is based on the object offset.
The code is similar to the code used in
lib/CodeGen/PrologEpilogInserter.cpp.
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D53657
llvm-svn: 345269
The current state of the llc invocation is:
* Running all the passes from dwarfehprepare to stack coloring
(included)
* It runs it from the LLVM IR included in the file
* It *ADDS* the generated MI from ISel to the MI in the MIR file
* The machine verifier doesn't like it.
Differential Revision: https://reviews.llvm.org/D53698
llvm-svn: 345266
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.
Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.
The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.
Differential Revision: https://reviews.llvm.org/D53649
llvm-svn: 345256
As was already mentioned in comments for D53364, DWARF 5
spec says about DW_LLE_startx_length:
"This is a form of bounded location description that has two unsigned ULEB operands.
The first value is an address index (into the .debug_addr section) that indicates the beginning of the address range
over which the location is valid. The second value is the length of the range. ")
Currently, the length is always parsed as U32.
Patch change the behavior to parse DW_LLE_startx_length as ULEB128 for DWARF 5
and keeps it as U32 for DWARF4+(pre-DWARF5) for compatibility.
Differential revision: https://reviews.llvm.org/D53564
llvm-svn: 345254
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53287
llvm-svn: 345250
GNU readelf tool prints hex value of the ELF header flags field and the
flags names. This change adds the same functionality to llvm-readobj.
Now llvm-readobj can print MIPS and RISCV flags.
New GNUStyle::printFlags() method is a copy of ScopedPrinter::printFlags()
routine. Probably we can escape code duplication and / or simplify the
printFlags() method. But it's a task for separate commit.
Differential revision: https://reviews.llvm.org/D52027
llvm-svn: 345238
Summary: Fixes part of the problem reported in bug 39275.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D53542
llvm-svn: 345230
This makes the offsets larger (since they are further from the base
address) but those are in the .dwo - and allows removing addresses and
relocations from the .o file.
This could be built into the AddressPool more fundamentally, perhaps -
when you ask for an AddressPool entry you could say "or give me some
other entry and an offset I need to use" - though what to do about
situations where the first use of an address in a section is not the
earliest address in that section... is tricky.
At least with range addresses we can be fairly sure we've seen the
earliest address first because we see the start address for the
function.
llvm-svn: 345224
Summary:
Currently when assigning depths 'rethrow' does not take the whole
control flow stack into accounts but only considers EH pad stacks. When
assigning depth immmediates to rethrows, in normal cases it is done
correctly but when a rethrow instruction throws up to a caller, i.e., we
convert a pseudo RETHROW_TO_CALLER instruction to a rethrow, it
mistakenly compute the whole stack depth.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D53619
llvm-svn: 345223
Summary:
Changing the node type in lowering was violating assumptions made in
the DAG combiner, so don't change the node type any more. This fixes
one of the issues reported in bug 39275.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits, alexcrichton
Differential Revision: https://reviews.llvm.org/D53537
llvm-svn: 345221
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.
Fixes some X86 verifier errors tracked in PR27481.
llvm-svn: 345219
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
(Relands r344930, reverted in r344935, and now hopefully fixed for
Windows.)
While this change specifically targets FileCheck, it affects any tool
using the same SourceMgr facilities.
Previously, -color was documented in FileCheck's -help output, but
-color had no effect. Now, -color obeys its documentation: it forces
colors to be used in FileCheck diagnostics even when stderr is not a
terminal.
-color is especially helpful when combined with FileCheck's -v, which
can produce a long series of diagnostics that you might wish to pipe
to a pager, such as less -R. The WithColor extensions here will also
help to clean up color usage in FileCheck's annotated dump of input,
which is proposed in D52999.
Reviewed By: JDevlieghere, zturner
Differential Revision: https://reviews.llvm.org/D53419
llvm-svn: 345202
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.
Fixes the sibcall* and musttail* verifier failures tracked at PR27481.
llvm-svn: 345197
Summary:
This renames the IsParsingMSInlineAsm member variable of AsmLexer to
LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior
controlled by that variable. I added a public setter, so that it can be
set from outside or from the llvm-mc command line. We may need to
arrange things so that users can get this behavior from clang, but
that's future work.
I also put additional hex literal lexing functionality under this flag
to fix PR32973. It appears that this hex literal parsing wasn't intended
to be enabled in non-masm-style blocks.
Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.
However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.
Essentially, we only lex masm literals for inline asm blobs that use
intel syntax. If the .intel_syntax directive is used inside a gnu-style
inline asm statement, masm literals will not be lexed, which is
compatible with gas and llvm-mc standalone .s assembly.
This fixes PR36144 and PR32973.
Reviewers: Gerolf, avt77
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53535
llvm-svn: 345189
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.
This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.
Differential Revision: https://reviews.llvm.org/D53643
llvm-svn: 345182
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.
Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits, erik.pilkington
Differential Revision: https://reviews.llvm.org/D53534
llvm-svn: 345178
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.
Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.
Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.
Differential Revision: https://reviews.llvm.org/D53614
llvm-svn: 345171
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.
My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.
There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.
Differential Revision: https://reviews.llvm.org/D52757
llvm-svn: 345165
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.
llvm-svn: 345164
A lifetime end intrinsic between a tail call and the return should not
prevent the call from being tail call optimized.
Differential Revision: https://reviews.llvm.org/D53519
llvm-svn: 345163
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits().
The tests that caused bot failures for the previous commit are
over-reaching front-end tests that run the entire -O optimizer
pipeline:
Clang :: CodeGen/builtins-systemz-zvector.c
Clang :: CodeGen/builtins-systemz-zvector2.c
I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.
Original commit message:
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).
The backend has hooks/logic to convert back to logic ops
if that's better for the target.
llvm-svn: 345149
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction.
Differential Revision: https://reviews.llvm.org/D53205
llvm-svn: 345146
Summary:
If the target does not support `.asciz` and `.ascii` directives, the
strings are represented as bytes and each byte is placed on the new line
as a separate byte directive `.b8 <data>`. NVPTX target allows to
represent the vector of the data of the same type as a vector, where
values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This
allows to reduce the size of the final PTX file. Ptxas tool includes ptx
files into the resulting binary object, so reducing the size of the PTX
file is important.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45822
llvm-svn: 345142
On Windows at least, llvm-strings was crashing if it encountered bytes
that mapped to negative chars, as it was passing these into
std::isgraph and std::isblank functions, resulting in undefined
behaviour. On debug builds using MSVC, these functions verfiy that the
value passed in is representable as an unsigned char. Since the char is
promoted to an int, a value greater than 127 would turn into a negative
integer value, and fail the check. Using the llvm::isPrint function is
sufficient to solve the issue.
Reviewed by: ruiu, mstorsjo
Differential Revision: https://reviews.llvm.org/D53509
llvm-svn: 345137
A new class named InstructionError has been added to Support.h in order to
improve the error reporting from class InstrBuilder.
The llvm-mca driver is responsible for handling InstructionError objects, and
printing them out to stderr.
The goal of this patch is to remove all the remaining error handling logic from
the library code.
In particular, this allows us to:
- Simplify the logic in InstrBuilder by removing a needless dependency from
MCInstrPrinter.
- Centralize all the error halding logic in a new function named 'runPipeline'
(see llvm-mca.cpp).
This is also a first step towards generalizing class InstrBuilder, so that in
future, we will be able to reuse its logic to also "lower" MachineInstr to
mca::Instruction objects.
Differential Revision: https://reviews.llvm.org/D53585
llvm-svn: 345129
masked-interleaving is enabled
Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D53559
llvm-svn: 345115
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.
Differential Revision: https://reviews.llvm.org/D51861
llvm-svn: 345114
This patch adds support for dumping the unwind info from ARM64 COFF object
files.
Differential Revision: https://reviews.llvm.org/D53264
llvm-svn: 345108
A global alias may use indices which are not considered in bounds. In
such a case, accessing the base object will fail as it only peers
through inbounds accesses. This pattern is used by the swift compiler
to create references to preceeding members in the type metadata. This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format. Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.
llvm-svn: 345107
in the same round of SCC update.
In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.
We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.
What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.
Differential Revision: https://reviews.llvm.org/D52915
llvm-svn: 345103
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...
We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.
- Ideally this would be handled by the ConstantHoist pass but it runs
too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
however SelectionDAG constantfolding would constantfold the
"trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
stop DAGCombiner from constant folding in this situation. (Similar to
how ConstantHoisting marks things as Opaque to avoid folding
ADD/SUB/etc.)
Differential Revision: https://reviews.llvm.org/D53181
llvm-svn: 345102
Summary:
Fix the new PM to only perform hot cold splitting once during ThinLTO,
by skipping it in the pre-link phase.
This was already fixed in the old PM by the move of the hot cold split
pass later (after the early return when PrepareForThinLTO) by r344869.
Reviewers: vsk, sebpop, hiraditya
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D53611
llvm-svn: 345096
Summary:
This is a revised version of D41474.
When the debug location is parsed in BitcodeReader::parseFunction, the
scope and inlinedAt MDNodes are obtained via MDLoader->getMDNodeFwdRefOrNull(),
which will create a forward ref if they were not yet loaded.
Specifically, if one of these MDNodes is in the module level metadata
block, and this is during ThinLTO importing, that metadata block is
lazily loaded.
Most places in that invoke getMDNodeFwdRefOrNull have a corresponding call
to resolveForwardRefsAndPlaceholders which will take care of resolving them.
E.g. places that call getMetadataFwdRefOrLoad, or at the end of parsing a
function-level metadata block, or at the end of the initial lazy load of
module level metadata in order to handle invocations of getMDNodeFwdRefOrNull
for named metadata and global object attachments. However, the calls for
the scope/inlinedAt of debug locations are not backed by any such call to
resolveForwardRefsAndPlaceholders.
To fix this, change the scope and inlinedAt parsing to instead use
getMetadataFwdRefOrLoad, which will ensure the forward refs to lazily
loaded metadata are resolved.
Fixes PR35472.
Reviewers: dexonsmith, Sunil_Srivastava, vsk
Subscribers: inglorion, eraman, steven_wu, sebpop, mehdi_amini, dmikulin, vsk, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53596
llvm-svn: 345095
Summary:
This patch will print out {Counter, Skip, StopAfter} info of all passes which have DebugCounter set at destruction.
It can be used to monitor how many times does certain transformation happen in a pass, and also help check if -debug-counter option is set correctly.
Please refer to this [[ http://lists.llvm.org/pipermail/llvm-dev/2018-July/124722.html | thread ]] for motivation.
Reviewers: george.burgess.iv, davide, greened
Reviewed By: greened
Subscribers: kristina, llozano, mgorny, llvm-commits, mgrang
Differential Revision: https://reviews.llvm.org/D50031
llvm-svn: 345085
Using -diff and -verbose together doesn't work today. We should audit
where these two options interact and fix them. In the meantime we error
out when the user try to specify both.
llvm-svn: 345084
Clearing LargeOffsetGEPMap at the end fixes a bug where if a large
offset GEP is in a dead basic block, we fail an assertion when trying
to delete the block due to the asserting VH in LargeOffsetGEPMap.
Differential Revision: https://reviews.llvm.org/D53464
llvm-svn: 345082
Doesn't build on Windows. The call to 'lookup' is ambiguous. Clang and
MSVC agree, anyway.
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/787
C:\b\slave\clang-x64-windows-msvc\build\llvm.src\unittests\ExecutionEngine\Orc\CoreAPIsTest.cpp(315): error C2668: 'llvm::orc::ExecutionSession::lookup': ambiguous call to overloaded function
C:\b\slave\clang-x64-windows-msvc\build\llvm.src\include\llvm/ExecutionEngine/Orc/Core.h(823): note: could be 'llvm::Expected<llvm::JITEvaluatedSymbol> llvm::orc::ExecutionSession::lookup(llvm::ArrayRef<llvm::orc::JITDylib *>,llvm::orc::SymbolStringPtr)'
C:\b\slave\clang-x64-windows-msvc\build\llvm.src\include\llvm/ExecutionEngine/Orc/Core.h(817): note: or 'llvm::Expected<llvm::JITEvaluatedSymbol> llvm::orc::ExecutionSession::lookup(const llvm::orc::JITDylibSearchList &,llvm::orc::SymbolStringPtr)'
C:\b\slave\clang-x64-windows-msvc\build\llvm.src\unittests\ExecutionEngine\Orc\CoreAPIsTest.cpp(315): note: while trying to match the argument list '(initializer list, llvm::orc::SymbolStringPtr)'
llvm-svn: 345078
In the new scheme the client passes a list of (JITDylib&, bool) pairs, rather
than a list of JITDylibs. For each JITDylib the boolean indicates whether or not
to match against non-exported symbols (true means that they should be found,
false means that they should not). The MatchNonExportedInJD and MatchNonExported
parameters on lookup are removed.
The new scheme is more flexible, and easier to understand.
This patch also updates JITDylib search orders to be lists of (JITDylib&, bool)
pairs to match the new lookup scheme. Error handling is also plumbed through
the LLJIT class to allow regression tests to fail predictably when a lookup from
a lazy call-through fails.
llvm-svn: 345077
Outlined code is cold by assumption, so it makes sense to optimize it
for minimal code size rather than performance.
After r344869 moved the splitting pass to the end of the IR pipeline,
this does not result in much of a code size reduction. This is probably
because a comparatively small number backend transforms make use of the
MinSize hint.
Running LNT on x86_64, I see that 33/1020 binaries shrink for a total of
919 bytes of TEXT reduction. I didn't measure a significant performance
impact.
Differential Revision: https://reviews.llvm.org/D53518
llvm-svn: 345072
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.
llvm-svn: 345070
Summary:
GNU strip supports both `-s` and `-S` as aliases for `--strip-all` and `--strip-debug`, respectfully.
As part of this, it turns out that strip/objcopy were accepting case insensitive command line args. I'm not sure if there was an explicit reason for this. The only others uses of this are llvm-cvtres/llvm-mt/llvm-lib, which are all tools specific for windows support. Forcing case sensitivity allows both aliases to exist, but seems like a good idea anyway.
And as a surprise test case adjustment, the llvm-strip unit test was running with `-keep=unavailable_symbol`, despite `keep` not be a valid flag for strip. This is because there is a flag `-K` which, when case insensitivity is permitted, allows it to be interpreted as `-K` = `eep=unavailable_symbol` (e.g. to allow `-Kfoo` == `--keep-symbol=foo`).
Reviewers: jakehehrlich, jhenderson, alexshap
Reviewed By: jakehehrlich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53163
llvm-svn: 345068
As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.
Extension to D53474
llvm-svn: 345060
Add support to allow bit-casting from f128 to i128 and then
extracting 64 bits from the result.
Differential Revision: https://reviews.llvm.org/D49507
llvm-svn: 345053
Summary:
TryToShrinkGlobalToBoolean, when possible, will split store <value> + load <value> into store <bool> + select <bool ? value : 0>. This preserves DebugLoc during that pass.
Fixes PR37959. The test case here is the simplified .ll for:
```
static int foo;
int bar() {
foo = 5;
return foo;
}
```
Reviewers: dblaikie, gbedwell, aprantl
Reviewed By: dblaikie
Subscribers: mehdi_amini, JDevlieghere, dexonsmith, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D53531
llvm-svn: 345046
We need to update this code before introducing an 'fneg' instruction in IR,
so we might as well kill off the integer neg/not queries too.
This is no-functional-change-intended for scalar code and most vector code.
For vectors, we can see that the 'match' API allows for undef elements in
constants, so we optimize those cases better.
Ideally, there would be a test for each code diff, but I don't see evidence
of that for the existing code, so I didn't try very hard to come up with new
vector tests for each code change.
Differential Revision: https://reviews.llvm.org/D53533
llvm-svn: 345042
Expand arithmetic reduction to include mul/and/or/xor instructions.
This patch just fixes the SLPVectorizer - the effective reduction costs for AVX1+ are still poor (see rL344846) and will need to be improved before SLP sees this as a valid transform - but we can already see the effect on SSE2 tests.
This partially helps PR37731, but doesn't fix it all as it still falls over on the extraction/reduction order for some reason.
Differential Revision: https://reviews.llvm.org/D53473
llvm-svn: 345037
The transform is broken in 2 ways - it doesn't correct metadata (or even drop it),
and it doesn't work with vectors with undef elements.
llvm-svn: 345033
This pass could probably be modified slightly to allow
vector splat transforms for practically no cost, but
it only works on scalars for now. So the use of the
newer 'match' API should make no functional difference.
llvm-svn: 345030
This initially landed in rL345014, but was reverted in rL345017
due to sanitizer-x86_64-linux-fast buildbot failure in
check-lld (ELF/relocatable-versioned.s) test.
While i'm not yet quite sure what is the problem, one obvious
thing here is that extra truncation roundtrip.
Maybe that's it? If not, will re-revert.
Differential Revision: https://reviews.llvm.org/D53521
llvm-svn: 345027
Summary:
Continuation of D52348.
We also get the `c) x & (-1 >> (32 - y))` pattern here, because of the D48768.
I will add extra-uses into those tests and follow-up with a patch to handle those patterns too.
Reviewers: RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53521
llvm-svn: 345014
Non-loaded sections (whose unused load-address defaults to zero) should not
be taken into account when calculating ImageBase, or ImageBase will be
incorrectly set to 0.
Patch by Andrew Scheidecker. Thanks Andrew!
https://reviews.llvm.org/D51343
+ // The Sections list may contain sections that weren't loaded for
+ // whatever reason: they may be debug sections, and ProcessAllSections
+ // is false, or they may be sections that contain 0 bytes. If the
+ // section isn't loaded, the load address will be 0, and it should not
+ // be included in the ImageBase calculation.
llvm-svn: 344995
Summary:
At compile-time, create an array of {PC,HumanReadableStackFrameDescription}
for every function that has an instrumented frame, and pass this array
to the run-time at the module-init time.
Similar to how we handle pc-table in SanitizerCoverage.
The run-time is dummy, will add the actual logic in later commits.
Reviewers: morehouse, eugenis
Reviewed By: eugenis
Subscribers: srhines, llvm-commits, kubamracek
Differential Revision: https://reviews.llvm.org/D53227
llvm-svn: 344985
Summary:
Due to previous work to make WebAssembly MC by default stack-only
inline assembly now "just works" (previously it didn't since it had
no way to know types of registers), so no further work required.
So far we only have tests (in inline-asm.ll) which test with
non-existing instructions, so this adds a test that roundtrips
both the inline assembly and its surrounding code thru the assembler.
Reviewers: dschuff, sunfish
Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D52914
llvm-svn: 344977
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53340
llvm-svn: 344971
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR. I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.
rdar://42122367
llvm-svn: 344970
I was trying to provide test coverage for D53533
with rL344964, but these don't do it...and I don't
think they add any value, so deleting.
llvm-svn: 344969
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.
Original commit message:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to rem
I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.
I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the lo
I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits, RKSimon
Differential Revision: https://reviews.llvm.org/D53306
llvm-svn: 344965