I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain.
llvm-svn: 345183
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.
This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.
Differential Revision: https://reviews.llvm.org/D53643
llvm-svn: 345182
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.
Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits, erik.pilkington
Differential Revision: https://reviews.llvm.org/D53534
llvm-svn: 345178
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.
Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.
Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.
Differential Revision: https://reviews.llvm.org/D53614
llvm-svn: 345171
This will allow other generators of LLVM IR to use the auto-vectorizer
without having to change that flag.
Note: on its own, this patch will enable auto-vectorization on Hexagon
in all cases, regardless of the -fvectorize flag. There is a companion
clang patch that together with this one forms an NFC for clang users.
llvm-svn: 345169
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.
My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.
There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.
Differential Revision: https://reviews.llvm.org/D52757
llvm-svn: 345165
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.
llvm-svn: 345164
A lifetime end intrinsic between a tail call and the return should not
prevent the call from being tail call optimized.
Differential Revision: https://reviews.llvm.org/D53519
llvm-svn: 345163
Also, removed the initialization of vectors used for processor resource masks.
Support function 'computeProcResourceMasks()' already calls method resize on
those vectors.
No functional change intended.
llvm-svn: 345161
Summary: This function was performing two hash lookups when a new function type was requested: first checking if it exists and second to insert it. This patch updates the function to perform a single hash lookup in this case by updating the value in the hash table in-place in case the function type was not there before.
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53471
llvm-svn: 345151
The original patch was committed here:
rL344609
...and reverted:
rL344612
...because it did not properly check/test data types before calling
ComputeNumSignBits().
The tests that caused bot failures for the previous commit are
over-reaching front-end tests that run the entire -O optimizer
pipeline:
Clang :: CodeGen/builtins-systemz-zvector.c
Clang :: CodeGen/builtins-systemz-zvector2.c
I've added a negative test here to ensure coverage for that case.
The new early exit check also tests the type of the 'B' parameter,
so we don't waste time on matching if either value is unsuitable.
Original commit message:
This is part of solving PR37549:
https://bugs.llvm.org/show_bug.cgi?id=37549
The patterns shown here are a special case of something
that we already convert to select. Using ComputeNumSignBits()
catches that case (but not the more complicated motivating
patterns yet).
The backend has hooks/logic to convert back to logic ops
if that's better for the target.
llvm-svn: 345149
Added begin()/end() methods to allow the usage of SourceMgr in foreach loops.
With this change, method getMCInstFromIndex() (as well as a couple of other
methods) are now redundant, and can be removed from the public interface.
llvm-svn: 345147
This work is to avoid regressions when we seperate FNeg from the FSub IR instruction.
Differential Revision: https://reviews.llvm.org/D53205
llvm-svn: 345146
Summary:
If the target does not support `.asciz` and `.ascii` directives, the
strings are represented as bytes and each byte is placed on the new line
as a separate byte directive `.b8 <data>`. NVPTX target allows to
represent the vector of the data of the same type as a vector, where
values are separated using `,` symbol: `.b8 <data1>,<data2>,...`. This
allows to reduce the size of the final PTX file. Ptxas tool includes ptx
files into the resulting binary object, so reducing the size of the PTX
file is important.
Reviewers: tra, jlebar, echristo
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D45822
llvm-svn: 345142
On Windows at least, llvm-strings was crashing if it encountered bytes
that mapped to negative chars, as it was passing these into
std::isgraph and std::isblank functions, resulting in undefined
behaviour. On debug builds using MSVC, these functions verfiy that the
value passed in is representable as an unsigned char. Since the char is
promoted to an int, a value greater than 127 would turn into a negative
integer value, and fail the check. Using the llvm::isPrint function is
sufficient to solve the issue.
Reviewed by: ruiu, mstorsjo
Differential Revision: https://reviews.llvm.org/D53509
llvm-svn: 345137
A new class named InstructionError has been added to Support.h in order to
improve the error reporting from class InstrBuilder.
The llvm-mca driver is responsible for handling InstructionError objects, and
printing them out to stderr.
The goal of this patch is to remove all the remaining error handling logic from
the library code.
In particular, this allows us to:
- Simplify the logic in InstrBuilder by removing a needless dependency from
MCInstrPrinter.
- Centralize all the error halding logic in a new function named 'runPipeline'
(see llvm-mca.cpp).
This is also a first step towards generalizing class InstrBuilder, so that in
future, we will be able to reuse its logic to also "lower" MachineInstr to
mca::Instruction objects.
Differential Revision: https://reviews.llvm.org/D53585
llvm-svn: 345129
masked-interleaving is enabled
Enable interleave-groups under fold-tail scenario for Opt for size compilation;
D50480 added support for vectorizing loops of arbitrary trip-count without a
remiander, which in turn makes everything in the loop conditional, including
interleave-groups if any. It therefore invalidated all interleave-groups
because we didn't have support for vectorizing predicated interleaved-groups
at the time. In the meantime, D53011 introduced this support, so we don't
have to invalidate interleave-groups when masked-interleaved support is enabled.
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: hsaito
Differential Revision: https://reviews.llvm.org/D53559
llvm-svn: 345115
LSR reassociates constants as unfolded offsets when the constants fit as
immediate add operands, which currently prevents such constants from being
combined later with loop invariant registers.
This patch modifies GenerateCombinations() to generate a second formula which
includes the unfolded offset in the combined loop-invariant register.
Differential Revision: https://reviews.llvm.org/D51861
llvm-svn: 345114
This B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed.
llvm-svn: 345112
This patch adds support for dumping the unwind info from ARM64 COFF object
files.
Differential Revision: https://reviews.llvm.org/D53264
llvm-svn: 345108
A global alias may use indices which are not considered in bounds. In
such a case, accessing the base object will fail as it only peers
through inbounds accesses. This pattern is used by the swift compiler
to create references to preceeding members in the type metadata. This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format. Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.
llvm-svn: 345107
On GNU/Hurd, llvm-config is returning bogus value, such as:
$ llvm-config-6.0 --includedir
/usr/include
while it should be:
$ llvm-config-6.0 --includedir
/usr/lib/llvm-6.0/include
This is because getMainExecutable does not get the actual installation
path. On GNU/Hurd, /proc/self/exe is indeed a symlink to the path that
was used to start the program, and not the eventual binary file. Llvm's
getMainExecutable thus needs to run realpath over it to get the actual
place where llvm was installed (/usr/lib/llvm-6.0/bin/llvm-config), and
not /usr/bin/llvm-config-6.0. This will not change the result on Linux,
where /proc/self/exe already points to the eventual file.
Patch by Samuel Thibault!
While making changes here, I reformatted this block a bit to reduce
indentation and match 2 space indent style.
Differential Revision: https://reviews.llvm.org/D53557
llvm-svn: 345104
in the same round of SCC update.
In https://reviews.llvm.org/rL309784, inline history is added to prevent
infinite inlining across multiple run of inliner and SCC update, but the
history will only be kept when new SCC is actually generated during SCC update.
We found a case that SCC can be split and then merge into itself in the same
round of SCC update, so the same SCC will be pop out from UR.CWorklist and
then added back immediately, without any new SCC generated, that is why the
existing patch cannot catch the infinite inline case.
What the patch does is even if no new SCC is generated, if only the current
SCC appears in UR.CWorklist again, then keep the inline history.
Differential Revision: https://reviews.llvm.org/D52915
llvm-svn: 345103
When implementing memset's today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...
We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.
- Ideally this would be handled by the ConstantHoist pass but it runs
too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
however SelectionDAG constantfolding would constantfold the
"trunc(bigconstnat)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
stop DAGCombiner from constant folding in this situation. (Similar to
how ConstantHoisting marks things as Opaque to avoid folding
ADD/SUB/etc.)
Differential Revision: https://reviews.llvm.org/D53181
llvm-svn: 345102
Summary:
Fix the new PM to only perform hot cold splitting once during ThinLTO,
by skipping it in the pre-link phase.
This was already fixed in the old PM by the move of the hot cold split
pass later (after the early return when PrepareForThinLTO) by r344869.
Reviewers: vsk, sebpop, hiraditya
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D53611
llvm-svn: 345096
Summary:
This is a revised version of D41474.
When the debug location is parsed in BitcodeReader::parseFunction, the
scope and inlinedAt MDNodes are obtained via MDLoader->getMDNodeFwdRefOrNull(),
which will create a forward ref if they were not yet loaded.
Specifically, if one of these MDNodes is in the module level metadata
block, and this is during ThinLTO importing, that metadata block is
lazily loaded.
Most places in that invoke getMDNodeFwdRefOrNull have a corresponding call
to resolveForwardRefsAndPlaceholders which will take care of resolving them.
E.g. places that call getMetadataFwdRefOrLoad, or at the end of parsing a
function-level metadata block, or at the end of the initial lazy load of
module level metadata in order to handle invocations of getMDNodeFwdRefOrNull
for named metadata and global object attachments. However, the calls for
the scope/inlinedAt of debug locations are not backed by any such call to
resolveForwardRefsAndPlaceholders.
To fix this, change the scope and inlinedAt parsing to instead use
getMetadataFwdRefOrLoad, which will ensure the forward refs to lazily
loaded metadata are resolved.
Fixes PR35472.
Reviewers: dexonsmith, Sunil_Srivastava, vsk
Subscribers: inglorion, eraman, steven_wu, sebpop, mehdi_amini, dmikulin, vsk, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53596
llvm-svn: 345095
Summary:
This patch will print out {Counter, Skip, StopAfter} info of all passes which have DebugCounter set at destruction.
It can be used to monitor how many times does certain transformation happen in a pass, and also help check if -debug-counter option is set correctly.
Please refer to this [[ http://lists.llvm.org/pipermail/llvm-dev/2018-July/124722.html | thread ]] for motivation.
Reviewers: george.burgess.iv, davide, greened
Reviewed By: greened
Subscribers: kristina, llozano, mgorny, llvm-commits, mgrang
Differential Revision: https://reviews.llvm.org/D50031
llvm-svn: 345085
Using -diff and -verbose together doesn't work today. We should audit
where these two options interact and fix them. In the meantime we error
out when the user try to specify both.
llvm-svn: 345084