Summary:
Extend image intrinsics to support data types of V1F32 and V2F32.
TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
even though such case should be very rare.
Reviewers:
tstellarAMD
Differential Revision:
http://reviews.llvm.org/D26472
llvm-svn: 286860
Darwin's backtrace() function does not work with sigaltstack (which was
enabled when available with r270395) — it does a sanity check to make
sure that the current frame pointer is within the expected stack area
(which it is not when using an alternate stack) and gives up otherwise.
The alternative of _Unwind_Backtrace seems to work fine on macOS, so use
that when backtrace() fails. Note that we then use backtrace_symbols_fd()
with the addresses from _Unwind_Backtrace, but I’ve tested that and it
also seems to work fine. rdar://problem/28646552
llvm-svn: 286851
This restores the rest of r286297 (part was restored in r286475).
Specifically, it restores the part requiring adding a dependency from
the Analysis to Object library (downstream use changed to correctly
model split BitReader vs BitWriter libraries).
Original description of this part of patch follows:
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
llvm-svn: 286844
The Stack slot coloring pass removes a store that is followed by a load
that deal with the same stack slot. The function isLoadFromStackSlot
is supposed to consider the loads that have no side-effects. This
patch fixed the issue by removing the unsafe loads from this function
Eg:
%vreg0<def> = L2_loadruh_io <fi#15>, 0
S2_storeri_io <fi#15>, 0, %vreg0
In this case, we load an unsigned extended half word and store this in to
the same stack slot. The Stack slot coloring pass considers safe to remove
the store. This patch marked all the non-vector byte and half word loads as
unsafe.
llvm-svn: 286843
The existing logic was to discard any symbols representing function template
instantiations, as the definitions were assumed to be inline. But there are
three explicit specializations of clang::Type::getAs that are only defined in
Clang's lib/AST/Type.cpp, and at least the plugin used by the LibreOffice build
(https://wiki.documentfoundation.org/Development/Clang_plugins) uses those
functions.
Differential Revision: https://reviews.llvm.org/D26455
llvm-svn: 286841
Summary:
The change in r285513 to prevent exporting of locals used in
inline asm added all locals in the llvm.used set to the reference
set of functions containing inline asm. Since these locals were marked
NoRename, this automatically prevented importing of the function.
Unfortunately, this caused an explosion in the summary reference lists
in some cases. In my particular example, it happened for a large protocol
buffer generated C++ file, where many of the generated functions
contained an inline asm call. It was exacerbated when doing a ThinLTO
PGO instrumentation build, where the PGO instrumentation included
thousands of private __profd_* values that were added to llvm.used.
We really only need to include a single llvm.used local (NoRename) value
in the reference list of a function containing inline asm to block it
being imported. However, it seems cleaner to add a flag to the summary
that explicitly describes this situation, which is what this patch does.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26402
llvm-svn: 286840
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.
This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)
llvm-svn: 286832
Also,
Revert "test: remove the archive before modifying it"
Revert "test: explicitly use gnu format"
This reverts commits r286778, r286729 and r286767, as they are randomly failing
on many bots (AArch64, x86_64).
llvm-svn: 286820
When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.
However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:
int g() {
h();
}
int f() {
g();
}
llvm-svn: 286814
Summary:
Unfolding selects was previously done with the help of a vector
of pointers that was then sorted to be able to remove duplicates.
As this sorting depends on the memory addresses, it was
non-deterministic. A SetVector is used now so that duplicates are
removed without the need of sorting first.
Reviewers: mgrang, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26450
llvm-svn: 286807
Summary:
This patch adds explicit `(void)` casts to discarded `release()` calls to suppress -Wunused-result.
This patch fixes *all* warnings are generated as a result of [applying `[[nodiscard]]` within libc++](https://reviews.llvm.org/D26596).
Similar fixes were applied to Clang in r286796.
Reviewers: chandlerc, dberris
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26598
llvm-svn: 286797
Only attempt to demangle symbols which have the itanium C++ prefix of `_Z`.
This ensures that we do not treat any symbol name as a managled named. We would
previously treat a C function `f` as a mangled name and decode that to `float`
incorrectly.
While it is easy to add tests for this, Mehdi recommended against introducing
tests for the demangler as libc++abi should cover the testing.
llvm-svn: 286795
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
This should fix at least some of PR28850.
llvm-svn: 286787
`shl nsw i8 1, i8 8` is poison, but `mul i8 1, i8 128` is not.
This was discussed previously here:
http://lists.llvm.org/pipermail/llvm-dev/2015-April/084195.html. From
the discussion, it was not clear which semantics we want for `shl`, but
for now at least make the language reference more accurate.
llvm-svn: 286785
`c++filt` when given no arguments runs as a REPL, decoding each line as a
decorated name. Unify the test structure to be more uniform, with the tests for
llvm-cxxfilt living under test/tools/llvm-cxxfilt.
llvm-svn: 286777
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.
This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.
The cache also needs to be flushed before wave termination
if a scalar store is used.
llvm-svn: 286766
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
llvm-svn: 286754
All existing callers were manually extracting information out of an existing
GEP instruction and passing it to getGEPExpr(). Simplify the interface by
changing it to take a GEPOperator instead.
llvm-svn: 286751
Until we have handling for ignoring unloaded sections, simplify the logic to
the point of triviality. This fixes the scanning of archives, particularly when
embedded in archives.
llvm-svn: 286727
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
llvm-svn: 286725
Clear cross-target test dependencies when using LLVM_OCAML_OUT_OF_TREE,
in order to make it possible to run check-llvm-bindings-ocaml without
rebuilding the whole LLVM.
Differential Revision: https://reviews.llvm.org/D26580
llvm-svn: 286720
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.
This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
Reviewers: RKSimon, zvi, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26333
llvm-svn: 286711
Summary: VALIGND and VALIGNQ are similar to PALIGNR but instead of working on a 128-bit lane they work on the entire vector register. This change leverages the shuffle rotate detection code used for PALIGNR to detect these cases.
Reviewers: delena, RKSimon
Subscribers: Farhana, llvm-commits
Differential Revision: https://reviews.llvm.org/D26297
llvm-svn: 286709
return types.
This class allows user provided handlers to return either error-wrapped types
or plain types. In the latter case, the plain type is wrapped with a success
value of Error or Expected<T> type to fit it into the rest of the serialization
machinery.
This patch allows us to remove the RPC unit-test workaround added in r286646.
llvm-svn: 286701
This introduces a new type-safe general purpose formatting
library. It provides compile-time type safety, does not require
a format specifier (since the type is deduced), and provides
mechanisms for extending the format capability to user defined
types, and overriding the formatting behavior for existing types.
This patch additionally adds documentation for the API to the
LLVM programmer's manual.
Mailing List Thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/105836.html
Differential Revision: https://reviews.llvm.org/D25587
llvm-svn: 286682
This patch defines a new function to add a SectionContribs stream
to a PDB file. Unlike SectionMap, SectionContribs contains a list
of input sections as opposed to output sections.
Note that this patch needs improving because currently we do not
set Module field in SectionContribs entries. In a follow-up patch,
I'll add Modules and then fix it after that.
Differential Revision: https://reviews.llvm.org/D26210
llvm-svn: 286677
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining insstruction would
be legalized before the using instruction.
This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.
This fixes a bug that was uncovered by r285762.
Reviewers: nhaehnle, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D26303
llvm-svn: 286676
When providing the project directory to the merge script, print it out in the
commit instructions instead of the default project directory.
llvm-svn: 286675
This implements a function annotation that disables TSan checking for the
function at run time. The benefit over attribute((no_sanitize("thread")))
is that the accesses within the callees will also be suppressed.
The motivation for this attribute is a guarantee given by the objective C
language that the calls to the reference count decrement and object
deallocation will be synchronized. To model this properly, we would need
to intercept all ref count decrement calls (which are very common in ObjC
due to use of ARC) and also every single message send. Instead, we propose
to just ignore all accesses made from within dealloc at run time. The main
downside is that this still does not introduce any synchronization, which
means we might still report false positives if the code that relies on this
synchronization is not executed from within dealloc. However, we have not seen
this in practice so far and think these cases will be very rare.
Differential Revision: https://reviews.llvm.org/D25858
llvm-svn: 286663
This is PR28376.
Unfortunately given the current structure of optimization diagnostics we
lack the capability to tell whether the user has
passed -Rpass-analysis=loop-vectorize since this is local to the
front-end (BackendConsumer::OptimizationRemarkHandler).
So rather than printing this even if the user has already
passed -Rpass-analysis, this patch just punts and stops recommending
this option. I don't think that getting this right is worth the
complexity.
Differential Revision: https://reviews.llvm.org/D26563
llvm-svn: 286662
This is a temporary fix: The right solution is to make sure addHandler can
support mutable lambdas. I'll add that in a follow-up patch.
llvm-svn: 286661
The DAG mutators in the scheduler cannot really remove DAG nodes as
additional anlysis information such as ScheduleDAGToplogicalSort are
already computed at this point and rely on a fixed number of DAG nodes.
Alleviate the missing removal with a new flag: Setting the new skip
flag on a node ignores it during scheduling.
llvm-svn: 286655
Push VRegUses/collectVRegUses() down the class hierarchy towards its
only user ScheduleDAGMILive.
NFCI: The initialization of the map happens at a later point but that
should not matter.
This is in preparation to allow DAG mutators to merge nodes, which
relies on this map getting computed later.
llvm-svn: 286654
This is a follow-up on the recent refactoring of the FunctionMerge pass.
It should fix a fail of the new FunctionComparator unittest whe compiling with MSVC.
llvm-svn: 286648
return type.
This should be fixed permanently by having the RPCUtils header recognize the
ErrorSuccess type. I'll commit that in a follow up patch.
llvm-svn: 286646
This patch corresponds to review:
https://reviews.llvm.org/D26480
Adds all the intrinsics used for various permute builtins that will
be added to altivec.h.
llvm-svn: 286638
When a function pointer is replaced with a jumptable pointer, special
case is needed to preserve the semantics of extern_weak functions.
Since a jumptable entry can not be extern_weak, we emulate that
behaviour by replacing all references to F (the extern_weak function)
with the following expression: F != nullptr ? JumpTablePtr : nullptr.
Extra special care is needed for global initializers, since most (or
probably all) backends can not lower an initializer that includes
this kind of constant expression. Initializers like that are replaced
with a global constructor (i.e. a runtime initializer).
llvm-svn: 286636
This is pure refactoring. NFC.
This change moves the FunctionComparator (together with the GlobalNumberState
utility) in to a separate file so that it can be used by other passes.
For example, the SwiftMergeFunctions pass in the Swift compiler:
https://github.com/apple/swift/blob/master/lib/LLVMPasses/LLVMMergeFunctions.cpp
Details of the change:
*) The big part is just moving code out of MergeFunctions.cpp into FunctionComparator.h/cpp
*) Make FunctionComparator member functions protected (instead of private)
so that a derived comparator class can use them.
Following refactoring helps to share code between the base FunctionComparator
class and a derived class:
*) Add a beginCompare() function
*) Move some basic function property comparisons into a separate function compareSignature()
*) Do the GEP comparison inside cmpOperations() which now has a new
needToCmpOperands reference parameter
https://reviews.llvm.org/D25385
llvm-svn: 286632
The functions getBitcodeTargetTriple(), isBitcodeContainingObjCCategory(),
getBitcodeProducerString() and hasGlobalValueSummary() now return errors
via their return value rather than via the diagnostic handler.
To make this work, re-implement these functions using non-member functions
so that they can be used without the LLVMContext required by BitcodeReader.
Differential Revision: https://reviews.llvm.org/D26532
llvm-svn: 286623
(1) Add support for function key negotiation.
The previous version of the RPC required both sides to maintain the same
enumeration for functions in the API. This means that any version skew between
the client and server would result in communication failure.
With this version of the patch functions (and serializable types) are defined
with string names, and the derived function signature strings are used to
negotiate the actual function keys (which are used for efficient call
serialization). This allows clients to connect to any server that supports a
superset of the API (based on the function signatures it supports).
(2) Add a callAsync primitive.
The callAsync primitive can be used to install a return value handler that will
run as soon as the RPC function's return value is sent back from the remote.
(3) Launch policies for RPC function handlers.
The new addHandler method, which installs handlers for RPC functions, takes two
arguments: (1) the handler itself, and (2) an optional "launch policy". When the
RPC function is called, the launch policy (if present) is invoked to actually
launch the handler. This allows the handler to be spawned on a background
thread, or added to a work list. If no launch policy is used, the handler is run
on the server thread itself. This should only be used for short-running
handlers, or entirely synchronous RPC APIs.
(4) Zero cost cross type serialization.
You can now define serialization from any type to a different "wire" type. For
example, this allows you to call an RPC function that's defined to take a
std::string while passing a StringRef argument. If a serializer from StringRef
to std::string has been defined for the channel type this will be used to
serialize the argument without having to construct a std::string instance.
This allows buffer reference types to be used as arguments to RPC calls without
requiring a copy of the buffer to be made.
llvm-svn: 286620
Summary:
Fix off-by-one indexing error in loop checking that inserted value was a
splat vector.
Add code to check that INSERT_VECTOR_ELT nodes constructing the splat
vector have the expected constant index values.
Reviewers: t.p.northover, jmolloy, mcrosier
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: https://reviews.llvm.org/D26409
llvm-svn: 286616
The current implementation is emitting a global constant that happens
to evaluate to the same bytes + relocation as a jump instruction on
X86. This does not work for PIE executables and shared libraries
though, because we end up with a wrong relocation type. And it has no
chance of working on ARM/AArch64 which use different relocation types
for jump instructions (R_ARM_JUMP24) that is never generated for
data.
This change replaces the constant with module-level inline assembly
followed by a hidden declaration of the jump table. Works fine for
ARM/AArch64, but has some drawbacks.
* Extra symbols are added to the static symbol table, which inflate
the size of the unstripped binary a little. Stripped binaries are not
affected. This happens because jump table declarations must be
external (because their body is in the inline asm).
* Original functions that were anonymous are now named
<original name>.cfi, and it affects symbolization sometimes. This is
necessary because the only user of these functions is the (inline
asm) jump table, so they had to be added to @llvm.used, which does
not allow unnamed functions.
llvm-svn: 286611
This is a partial revert of r244615 (http://reviews.llvm.org/D11942),
which caused a major regression in debug info quality.
Turning the artificial __MergedGlobal symbols into private symbols
(l__MergedGlobal) means that the linker will not include them in the
symbol table of the final executable. Without a symbol table entry
dsymutil is not be able to process the debug info for any of the
merged globals and thus drops the debug info for all of them.
This patch is enabling the old behavior for all MachO targets while
leaving all other targets unaffected.
rdar://problem/29160481
https://reviews.llvm.org/D26531
llvm-svn: 286607
https://reviews.llvm.org/D26526
- Fixed DW_FORM_strp to be correctly sized and extracted for DWARF64
- Added some missing strp variants as well
- Fixed comment typo
llvm-svn: 286603
In preparation for a follow on patch that improves DWARF parsing speed, clean up DWARFFormValue so that we have can get the fixed byte size of a form value given a DWARFUnit or given the version, address byte size and dwarf32/64.
This patch cleans up code so that everyone is using one of the new DWARFFormValue functions:
static Optional<uint8_t> DWARFFormValue::getFixedByteSize(dwarf::Form Form, const DWARFUnit *U = nullptr);
static Optional<uint8_t> DWARFFormValue::getFixedByteSize(dwarf::Form Form, uint16_t Version, uint8_t AddrSize, bool Dwarf32);
This patch changes DWARFFormValue::skipValue() to rely on the output of DWARFFormValue::getFixedByteSize(...) instead of duplicating the code in each function. This will reduce the number of changes we need to make to DWARF to fewer places in DWARFFormValue when we add support for new form.
This patch also starts to support DWARF64 so that we can get correct byte sizes for forms that vary according the DWARF 32/64.
To reduce the code duplication a new FormSizeHelper pure virtual class was created that can be created as a FormSizeHelperDWARFUnit when you have a DWARFUnit, or FormSizeHelperManual where you manually specify the DWARF version, address byte size and DWARF32/DWARF64. There is now a single implementation of a function that gets the fixed byte size (instead of two where one took a DWARFUnit and one took the DWARF version, address byte size and DWARFFormat enum) and one function to skip the form values.
https://reviews.llvm.org/D26526
llvm-svn: 286597
This patch corresponds to review:
https://reviews.llvm.org/D26307
Adds all the intrinsics used for various conversion builtins that will
be added to altivec.h. These are type conversions between various types of
vectors.
llvm-svn: 286596
This adds support for the compare logical and trap (memory)
instructions that were added as part of the miscellaneous
instruction extensions feature with zEC12.
llvm-svn: 286587
This adds support for the LZRF/LZRG/LLZRGF instructions that were
added on z13, and uses them for code generation were appropriate.
SystemZDAGToDAGISel::tryRISBGZero is updated again to prefer LLZRGF
over RISBG where both would be possible.
llvm-svn: 286586
This adds support for the 31-to-64-bit zero extension instructions
LLGT and LLGTR and uses them for code generation where appropriate.
Since this operation can also be performed via RISBG, we have to
update SystemZDAGToDAGISel::tryRISBGZero so that we prefer LLGT
over RISBG in case both are possible. The patch includes some
simplification to the tryRISBGZero code; this is not intended
to cause any (further) functional change in codegen.
llvm-svn: 286585
Summary:
Split ReaderWriter.h which contains the APIs into both the BitReader and
BitWriter libraries into BitcodeReader.h and BitcodeWriter.h.
This is to address Chandler's concern about sharing the same API header
between multiple libraries (BitReader and BitWriter). That concern is
why we create a single bitcode library in our downstream build of clang,
which led to r286297 being reverted as it added a dependency that
created a cycle only when there is a single bitcode library (not two as
in upstream).
Reviewers: mehdi_amini
Subscribers: dlj, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26502
llvm-svn: 286566
This is forcing to use Error::success(), which is in a wide majority
of cases a lot more readable.
Differential Revision: https://reviews.llvm.org/D26481
llvm-svn: 286561
This is a replacement to binutils' string tool. It prints strings found in a
binary (object file, executable, or archive library). It is rather bare and
not functionally equivalent, however, it lays the groundwork necessary for the
strings tool, enabling iterative development of features to reach feature
parity.
llvm-svn: 286556
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.
Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.
Differential Revision: https://reviews.llvm.org/D25140
llvm-svn: 286544
When a function is inlined, each instance is optimized in their own
inlining context. This can produce different remarks all pointing to
the same source line.
This adds a new column on the source view to display the inlining
context.
llvm-svn: 286537
There is no need to track dependencies for constant physregs, as they
don't change their value no matter in what order you read/write to them.
Differential Revision: https://reviews.llvm.org/D26221
llvm-svn: 286526
The NamedRegionTimer initializer without a group name puts the Timer
into the "Misc" group and is (nearly) unused. Remove it.
The only user of this constructor appears to be the HexagonGenInsert pass,
which creates a counter without group to count the complete execution
time of that pass, however since every pass gets a counter by the
PassManager anyway this should be unnecessary. Also removed the
pointless TimerGroup there.
Differential Revision: https://reviews.llvm.org/D25582
llvm-svn: 286524
The generic infrastructure to compute the Newton series for reciprocal and
reciprocal square root was conceived to allow a target to compute the series
itself. However, the original code did not properly consider this condition
if returned by a target. This patch addresses the issues to allow a target
to compute the series on its own.
Differential revision: https://reviews.llvm.org/D22975
llvm-svn: 286523
If the inrange keyword is present before any index, loading from or
storing to any pointer derived from the getelementptr has undefined
behavior if the load or store would access memory outside of the bounds of
the element selected by the index marked as inrange.
This can be used, e.g. for alias analysis or to split globals at element
boundaries where beneficial.
As previously proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-July/102472.html
Differential Revision: https://reviews.llvm.org/D22793
llvm-svn: 286514
When copying to/from a constant register interferences can be ignored.
Also update the documentation for isConstantPhysReg() to make it more
obvious that this transformation is valid.
Differential Revision: https://reviews.llvm.org/D26106
llvm-svn: 286503
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.
However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).
This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.
Differential Revision: https://reviews.llvm.org/D25781
llvm-svn: 286502
This makes it possible to indent a binary blob by a certain
number of bytes, and also makes some things more idiomatic.
Finally, it integrates this binary blob formatter into ScopedPrinter
which used to have its own implementation of this algorithm.
Differential Revision: https://reviews.llvm.org/D26477
llvm-svn: 286495
The r283656 did this in the remark arguments. We also need to do this
in the main function attribute as that is written to YAML as well.
llvm-svn: 286482
This restores the part of r286297 that didn't require adding a
dependency from the Analysis to Object library. There are two parts
to the original fix, and this will address the handling for the case
where locals are used in module level asm.
The part that requires functionality in libObject handles local defs
in module level asm, and was reverted because our downstream build
of clang builds lib/Bitcode into a single library, and this new
dependency introduced a cycle there. I am trying to get that fixed
(see D26502), so for now that change isn't being restored
llvm-svn: 286475
Note that the existing metadata checking was re-added by hand because the
script doesn't currently know how to generate checks for lines outside of
functions.
llvm-svn: 286460
We were failing to extract a constant splat shift value if the shifted value was being masked.
The (shl (and (setcc) N01CV) N1CV) -> (and (setcc) N01CV<<N1CV) combine was unnecessarily preventing this.
llvm-svn: 286454
These examples are variations that were inspired from a small subgraph taken
from paper.ll which are interesting as they show certain issues with infinite
loops.
llvm-svn: 286450
The version of this instruction with the .w suffix already correctly accepts
this, but the alias without the .w did not.
Differential Revision: https://reviews.llvm.org/D26499
llvm-svn: 286446
Removing the limitation in visitInsertElementInst() causes several regressions
because we're not prepared to fold sequences of shuffles or inserts and extracts
separated by shuffles. Fixing that appears to be a difficult mission because we
are purposely trying to avoid creating shuffles with arbitrary shuffle masks
because some targets may choke on those.
https://llvm.org/bugs/show_bug.cgi?id=30923
llvm-svn: 286423
Summary: This adds all of the CodeGen tests which currently pass.
Reviewers: arsenm, kparzysz
Subscribers: japaric, wdng
Differential Revision: https://reviews.llvm.org/D26388
llvm-svn: 286418
No testcase included because I can't figure out how to reduce it.
(It's easy to write a testcase where rotation clones an assume,
but that doesn't actually seem to trigger the crash in opt on
its own; maybe an issue with the laziness?)
Differential Revision: https://reviews.llvm.org/D26434
llvm-svn: 286410
Summary:
The change will test the change in r286159.
The idea behind the change: Make the dbg location different between loop header and preheader/exit. Originally, dbg location 21 exists in 3 BBs: preheader, header, critical edge (exit). Update the debug location of inside the loop header from !21 to !22 so that it will reflect the correct location.
Reviewers: probinson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26428
llvm-svn: 286403
../tools/llvm-extract/llvm-extract.cpp: In function ‘int main(int, char**)’:
warning: ISO C++ forbids zero-size array ‘argv’ [-Wpedantic]
GCC reference bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61259
llvm-svn: 286396
Summary:
Unrolled Loop Size calculations moved to a function.
Constant representing number of optimized instructions
when "back edge" becomes "fall through" replaced with
variable.
Some comments added.
Reviewers: mzolotukhin
Differential Revision: http://reviews.llvm.org/D21719
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 286389
Suspected to be the cause of a sanitizer-windows bot failure:
Assertion failed: isImm() && "Wrong MachineOperand accessor", file C:\b\slave\sanitizer-windows\llvm\include\llvm/CodeGen/MachineOperand.h, line 420
llvm-svn: 286385
A relocatable immediate is either an immediate operand or an operand that
can be relocated by the linker to an immediate, such as a regular symbol
in non-PIC code.
Start using relocImm for 32-bit and 64-bit MOV instructions, and for operands
of type "imm32_su". Remove a number of now-redundant patterns.
Differential Revision: https://reviews.llvm.org/D25812
llvm-svn: 286384
For pairs of 32-bit registers: isub_lo, isub_hi.
For pairs of vector registers: vsub_lo, vsub_hi.
Add generic subreg indices: ps_sub_lo, ps_sub_hi, and a function
HexagonRegisterInfo::getHexagonSubRegIndex(RegClass, GenericSubreg)
that returns the appropriate subreg index for RegClass.
llvm-svn: 286377
Summary:
All changes are pretty straight-forward. I chose to use TimePoints with
second precision, as that is all that seems to be required here.
Reviewers: friss, zturner
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25908
llvm-svn: 286358
The name/comment of the third argument to the ScheduleDAGMI constructor
is RemoveKillFlags and not IsPostRA. Only the comments are changed.
Review: A Trick
llvm-svn: 286350
Scalar Evolution asserts when not all the operands of an Add Recurrence
Expression are loop invariants. Loop Strength Reduction should only
create affine Add Recurrences, so that both the start and the step of
the expression are loop invariants.
Differential Revision: https://reviews.llvm.org/D26185
llvm-svn: 286347
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Differential Revision: https://reviews.llvm.org/D26331
llvm-svn: 286345
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
llvm-svn: 286344
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.
Reviewers: delena, zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26322
llvm-svn: 286339
The BitcodeReader no longer produces BitcodeDiagnosticInfo diagnostics.
The only remaining reference was in the gold plugin; the code there has been
dead since we stopped producing InvalidBitcodeSignature error codes in r225562.
While at it remove the InvalidBitcodeSignature error code.
llvm-svn: 286326
Summary: For functions with profile data, we are confident that loop sink will be optimal in sinking code.
Reviewers: davidxl, hfinkel
Subscribers: mehdi_amini, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26155
llvm-svn: 286325
Summary:
This is the initial version of the documentation for how to use XRay as
it stands in LLVM, Clang, and compiler-rt. We leave some room for later
expansion mentioining what is work in progress and what could be
expected moving forward.
We also give a high level overview of future work that's both ongoing
and planned.
Reviewers: echristo, dblaikie, chandlerc
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D26386
llvm-svn: 286319
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
Ie, running the optimizer pessimizes the codegen for this case without this patch:
define <4 x i32> @umax_vec(<4 x i32> %x) {
%cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
%sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
ret <4 x i32> %sel
}
$ ./opt umax.ll -S | ./llc -o - -mattr=avx
vpmaxud LCPI0_0(%rip), %xmm0, %xmm0
$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
Differential Revision: https://reviews.llvm.org/D26096
llvm-svn: 286318
As the test change shows, we can increase the critical path by adding
a 'not' instruction, so make sure that we're actually removing an
instruction if we do this transform.
This transform could also cause us to miss folds of min/max pairs.
llvm-svn: 286315
Previously support had been added for using CodeViewRecordIO
to read (deserialize) CodeView type records. This patch adds
support for writing those same records. With this patch,
reading and writing of CodeView type records finally uses a single
codepath.
Differential Revision: https://reviews.llvm.org/D26253
llvm-svn: 286304
if it is more specific than the one in its DW_AT_specification.
If a static member is an array, the translation unit containing the
member definition may have a more specific type (including its length)
than TUs only seeing the class declaration. This patch adds a
DW_AT_type to the member's DW_TAG_variable in addition to the
DW_AT_specification in these cases. The member type in the
DW_AT_specification still shows the more generic type (without the
length) to avoid defeating type uniquing.
The DWARF standard discourages “duplicating” a DW_AT_type in a member
variable definition but doesn’t explicitly forbid it. Having the more
specific type (with the array length) available is what allows the
debugger to print the contents of a static array member variable.
https://reviews.llvm.org/D26368
rdar://problem/28706946
llvm-svn: 286302
Summary:
There are two variables here that break. This change constrains both of them to
debug builds (via DEBUG() or #ifndef NDEBUG).
Reviewers: bkramer, t.p.northover
Subscribers: mehdi_amini, vkalintiris
Differential Revision: https://reviews.llvm.org/D26421
llvm-svn: 286300
Summary:
This patch uses the same approach added for inline asm in r285513 to
similarly prevent promotion/renaming of locals used or defined in module
level asm.
All static global values defined in normal IR and used in module level asm
should be included on either the llvm.used or llvm.compiler.used global.
The former were already being flagged as NoRename in the summary, and
I've simply added llvm.compiler.used values to this handling.
Module level asm may also contain defs of values. We need to prevent
export of any refs to local values defined in module level asm (e.g. a
ref in normal IR), since that also requires renaming/promotion of the
local. To do that, the summary index builder looks at all values in the
module level asm string that are not marked Weak or Global, which is
exactly the set of locals that are defined. A summary is created for
each of these local defs and flagged as NoRename.
This required adding handling to the BitcodeWriter to look at GV
declarations to see if they have a summary (rather than skipping them
all).
Finally, added an assert to IRObjectFile::CollectAsmUndefinedRefs to
ensure that an MCAsmParser is available, otherwise the module asm parse
would silently fail. Initialized the asm parser in the opt tool for use
in testing this fix.
Fixes PR30610.
Reviewers: mehdi_amini
Subscribers: johanengelen, krasin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26146
llvm-svn: 286297
This addresses PR30746, <https://llvm.org/bugs/show_bug.cgi?id=30746>. The ASan pass iterates over entry-block instructions and checks each alloca whether it's in NonInstrumentedStaticAllocaVec, which is apparently slow. This patch gathers the instructions to move during visitAllocaInst.
Differential Revision: https://reviews.llvm.org/D26380
llvm-svn: 286296
Summary:
We've had support for auto upgrading old style scalar TBAA access
metadata tags into the "new" struct path aware TBAA metadata for 3 years
now. The only way to actually generate old style TBAA was explicitly
through the IRBuilder API. I think this is a good time for dropping
support for old style scalar TBAA.
I'm not removing support for textual or bitcode upgrade -- if you have
IR with the old style scalar TBAA tags that go through the AsmParser orf
the bitcode parser before LLVM sees them, they will keep working as
usual.
Note:
%val = load i32, i32* %ptr, !tbaa !N
!N = < scalar tbaa node >
is equivalent to
%val = load i32, i32* %ptr, !tbaa !M
!N = < scalar tbaa node >
!M = !{!N, !N, 0}
Reviewers: manmanren, chandlerc, sunfish
Subscribers: mcrosier, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D26229
llvm-svn: 286291
After instruction selection we perform some checks on each VReg just before
discarding the type information. These checks were assertions before, but that
breaks the fallback path so this patch moves the logic into the main flow and
reports a better error on failure.
llvm-svn: 286289
This completes assembler / disassembler support for all BFP
instructions provided by the floating-point extensions facility.
The instructions added here are not currently used for codegen.
llvm-svn: 286285
Add several instructions that operate on the program mask
or the addressing mode. These are not really needed for
code generation under Linux, but are provided for completeness
for the assembler/disassembler.
llvm-svn: 286284
Add the 16 access registers as LLVM registers. This allows removing
a lot of special cases in the assembler and disassembler where we
were handling access registers; this can all just use the generic
register code now.
Also add a bunch of instructions to operate on access registers,
for assembler/disassembler use only. No change in code generation
intended.
llvm-svn: 286283
Since IMPLIFIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
llvm-svn: 286274
Erasing reverse_iterators is problematic; iterate manually.
While there, keep track of the range of inserted instructions.
It can miss instructions inserted elsewhere, but those are harder
to track.
Differential Revision: http://reviews.llvm.org/D22924
llvm-svn: 286272
For example, it invalidates the domtree, causing assertions
in later passes which need dominator infos. Make it preserve
GlobalsAA, as suggested by Eli.
Differential Revision: https://reviews.llvm.org/D26381
llvm-svn: 286271
Define a couple of additional semantic classes and use them
throughout the .td files to make them more consistent and
more easily readable.
No functional change.
llvm-svn: 286268
This changes the InstRR (and related) patterns to no longer
automatically add an "r" at the end of the mnemonic. This
makes the .td files more obviously understandable, and also
allows using the patterns for those few instructions that
do not follow the *r scheme.
Also add some more sub-formats of the RRF format class, to
match operand names and sequence from the PoP better.
No functional change.
llvm-svn: 286267
Now that we've added instruction format subclasses like
InstRIb, it makes sense to rename the old InstRI to InstRIa.
Similar for InstRX, InstRXY, InstRS, InstRSY, and InstSS.
No functional change.
llvm-svn: 286266
Rework patterns for branches, call & return instructions,
compare-and-branch, compare-and-trap, and conditional move
instructions.
In particular, simplify creation of patterns for the extended
opcodes of instructions that take a CC mask.
Also, use semantical instruction classes for all the instructions
instead of open-coding them in SystemZInstrInfo.td.
Adds a couple of the basic branch instructions (that are unused
for codegen) for the assembler/disassembler.
llvm-svn: 286263