The 96-bit results need to be widened.
I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are a small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally with the associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
STL Algorithms are usually implemented in a tricky way for performance
reasons, which is too complicated for the analyzer. Furthermore, inlining
them is costly. Instead of inlining we should model their behavior
according to the specifications.
This patch is the first step towards STL Algorithm modeling. It models
all the `find()`-like functions in a simple way: the result is either
found or not. In the future it can be extended to only return success if
container modeling is also extended in a way that it keeps track of
trivial insertions and deletions.
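As a rough illustration of the two outcomes being modeled (this is not the checker's code), a `find()`-like call either returns an iterator into the range or the end iterator. The helper name `classifyFind` and the `FindResult` enum are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The two outcomes a find()-like call can have.
enum class FindResult { Found, NotFound };

// Hypothetical helper: runs std::find and reports which outcome occurred.
// "Found" means the result points into [first, last); "NotFound" means the
// result equals last.
template <typename It, typename T>
FindResult classifyFind(It first, It last, const T &value) {
  It result = std::find(first, last, value);
  return result == last ? FindResult::NotFound : FindResult::Found;
}
```

A model that only distinguishes these two cases does not need to know how the algorithm walks the range.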
Differential Revision: https://reviews.llvm.org/D70818
The adjusted iterator range included the last instruction we just inserted,
which we don't want to process. Figure out the new iterator range before
inserting phis. This was a harmless problem, but added an unnecessary
complication for a future patch.
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how the
s_pack_* instructions really behave.
If we don't have s_pack_* instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.
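A minimal sketch of the bit operations involved, assuming element 0 lands in the low 16 bits of the 32-bit result. The function names `buildVectorTrunc` and `mergeToS32` are illustrative only and do not correspond to LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Models a two-element build_vector that truncates from 32-bit sources:
// each source is truncated to its low 16 bits, then element 0 goes in the
// low half and element 1 in the high half of the 32-bit result.
constexpr uint32_t buildVectorTrunc(uint32_t elt0, uint32_t elt1) {
  return (elt0 & 0xFFFFu) | ((elt1 & 0xFFFFu) << 16);
}

// Models the expansion path: merge two 16-bit values into one 32-bit
// scalar using a zero-extend, a shift, and an or.
constexpr uint32_t mergeToS32(uint16_t lo, uint16_t hi) {
  return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}
```

Both paths produce the same 32-bit pattern; they differ only in whether the truncation is part of the packing operation.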
We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.
This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
When a thread stops, this checks, depending on the platform, whether the top frame is
an abort stack frame. If so, it looks for an assert stack frame in the upper
frames and sets it as the most relevant frame when found.
To do so, the StackFrameRecognizer class holds a "Most Relevant Frame" and a
"cooked" stop reason description. When the thread is about to stop, it checks
if the current frame is recognized, and if so, it fetches the recognized frame's
attributes and applies them.
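The frame search described above can be sketched roughly as follows. This is a simplified model on plain function names, not LLDB's StackFrameRecognizer API; `findMostRelevantFrame`, `kNoRelevantFrame`, and the symbol-name parameters are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returned when no frame qualifies.
const int kNoRelevantFrame = -1;

// frames[0] is the top (innermost) frame. If the top frame is the abort
// function, scan the frames above it for the assert function and return
// that frame's index as the "most relevant" one.
int findMostRelevantFrame(const std::vector<std::string> &frames,
                          const std::string &abortFn,
                          const std::string &assertFn) {
  if (frames.empty() || frames[0] != abortFn)
    return kNoRelevantFrame;
  for (std::size_t i = 1; i < frames.size(); ++i)
    if (frames[i] == assertFn)
      return static_cast<int>(i);
  return kNoRelevantFrame;
}
```

The actual symbol names to match are platform dependent, which is why they are passed in as parameters here.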
rdar://58528686
Differential Revision: https://reviews.llvm.org/D73303
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
Apply the fix of f780e15caf ("[OpenCL] Fix support for
cl_khr_mipmap_image_writes", 2020-01-27) also to the TableGen OpenCL
builtin function definitions.
As detailed in PR43462, the clang static analyzer complains about a null pointer dereference: we provide a 'host' toolchain fallback if the ToolChain pointer is null, but then use that pointer anyway to report the triple.
Tests indicate the ToolChain pointer is always valid and the 'host' code path is redundant.
Differential Revision: https://reviews.llvm.org/D74046
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean up the iteration count
expression of that loop. In D73682, we bailed out of the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.
Differential Revision: https://reviews.llvm.org/D73947
Summary:
To use new/delete in NVPTX code we need to define them. Implementation
copied from CUDA wrappers.
Reviewers: hfinkel, jdoerfert
Subscribers: mgorny, guansong, kkwli0, caomhin, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73128
Summary:
It is often necessary to map entire ranges rather than single values. To avoid
writing the same for loop every time, I have added an overload to the map
method.
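A sketch of what such an overload could look like. The class below is hypothetical (the real class and its signatures are not shown in this message) and uses `std::vector` for the ranges purely for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical mapping class used only to illustrate the overload.
template <typename K, typename V> class ValueMapping {
public:
  // Map a single value.
  void map(const K &from, const V &to) { table[from] = to; }

  // Overload: map two equally sized ranges element by element, so callers
  // don't have to write the loop themselves.
  void map(const std::vector<K> &from, const std::vector<V> &to) {
    assert(from.size() == to.size() && "ranges must have the same length");
    for (std::size_t i = 0; i < from.size(); ++i)
      map(from[i], to[i]);
  }

  // Returns the mapped value, or `fallback` if `from` was never mapped.
  V lookupOrDefault(const K &from, const V &fallback) const {
    auto it = table.find(from);
    return it == table.end() ? fallback : it->second;
  }

private:
  std::unordered_map<K, V> table;
};
```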
Differential Revision: https://reviews.llvm.org/D73894
Field NumMicroOpcodes is currently used by mca to model the number of uOPs
dispatched from the uOp-Queue to the out-of-order backend. From a 'dispatch'
point of view, an instruction with zero uOPs is still valid; it simply
doesn't consume any dispatch group slots.
However, mca doesn't expect an instruction with zero uOPs to consume pipeline
resources because it is seen as a contradiction. In practice, it only makes
sense if such an instruction is eliminated and never really executed. It may be
that mca is being too conservative here. However, I believe that mca is right,
and we should probably check that inconsistency in CodeGenSchedule.cpp (when we
also verify scheduling classes in general).
This patch removes the check for MayLoad and MayStore in mca. That check is
probably too conservative: we are already checking if a zero-uops instruction
consumes any processor resources. Note also that instructions with unmodelled
side-effects also tend to set the MayLoad/MayStore flags even if - theoretically
speaking - they might not even consume any hw resources in practice.
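The resulting rule can be sketched as below, under the assumption that the only remaining condition is resource consumption. `InstrInfo` and `isValidZeroUopsAware` are hypothetical names and do not reflect mca's real data structures:

```cpp
#include <cassert>

// Hypothetical, simplified instruction descriptor.
struct InstrInfo {
  unsigned NumMicroOps;          // uOPs dispatched for this instruction
  unsigned NumConsumedResources; // pipeline resources it consumes
  bool MayLoad;
  bool MayStore;
};

// An instruction with zero uOPs is accepted only if it consumes no
// pipeline resources. MayLoad and MayStore are deliberately not consulted.
bool isValidZeroUopsAware(const InstrInfo &I) {
  if (I.NumMicroOps != 0)
    return true;
  return I.NumConsumedResources == 0;
}
```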
In the future we may want to implement different checks (possibly outside of mca)
and potentially revisit the logic in mca that verifies instructions.
For that reason I have raised PR44797.
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to be addressed later on.
Differential Revision: https://reviews.llvm.org/D74037
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.
Differential Revision: https://reviews.llvm.org/D73616
Changing date2 to a timezone-independent value broke the test, as the data formatter
uses the current time zone for the summary (so changing it to a timezone-independent value
would again break the test in some time zones). We only care about this for date2 anyway,
which will be printed in a timezone-independent summary.
Introduce support for i386 platform that is shared with amd64
in the same plugin. The concept is partially based on the Linux
implementation.
The plugin tries to reuse as much code as possible. As a result, i386
register enums are mapped into amd64 values and those are used in actual
code. The code for accessing FPU and debug registers is shared,
although general-purpose register layouts do not match between the two
kernel APIs and need to be #ifdef-ed.
This layout will also make it possible to add support for debugging
32-bit programs on amd64 with minimal added code.
In order for this to work, I had to add missing data for debug registers
on i386.
Differential Revision: https://reviews.llvm.org/D73802
Summary:
This test creates its dates with `NSDate dateWithNaturalLanguageString`, which is deprecated and uses the current time zone of the machine to
interpret the input string. As a result, the created NSDate has a different value depending on the locale of the machine.
We hardcoded the value for California's time zone (PST), but the data formatter gives out the GMT value as a string.
This just replaces the use with the timezone-independent dateWithTimeIntervalSince1970 (which we also use in the rest of the test)
to make this pass independently of the time zone of the machine running the test.
Reviewers: mib
Reviewed By: mib
Subscribers: lldb-commits, JDevlieghere
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D74038
The disassembler of the AVR backend is incomplete: most instructions do
not correctly disassemble yet.
This patch is the first in a series to add disassembly support to the
AVR backend. It starts with adding disassembler tests for instructions
that already disassemble correctly.
Differential Revision: https://reviews.llvm.org/D73911
Summary:
Currently, having a typedef for ObjC types breaks member access in LLDB:
```
typedef NSString Str;
NSString *s; s.length; // OK
Str *s; s.length; // Causes: member reference base type 'Str *' (aka 'NSString *') is not a structure or union
```
This works for NSString because the type building from `NSString` -> `NSString *` will correctly
build an ObjCObjectPointerType (which is necessary to make member access with a dot possible),
but for the typedef the `Str` -> `Str *` conversion will produce an incorrect PointerType. The reason
for this is that our check in TypeSystemClang::GetPointerType is not desugaring the base type,
so `Str` is not recognised as a type referring to an `ObjCInterface`, as the check only sees the
typedef sugar that was put around it. We therefore fall back to constructing a PointerType
instead, which does not allow member access with the dot operator.
This patch just changes the check to look at the desugared type instead.
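The difference between checking the sugared and the desugared type can be illustrated with a toy type model. All names below (`Type`, `desugar`, `pointerKindWithDesugar`, and so on) are made up for this sketch and are not Clang's or LLDB's API:

```cpp
#include <cassert>

// Toy type model for illustration only.
enum class TypeKind { ObjCInterface, Typedef, Other };

struct Type {
  TypeKind kind;
  const Type *underlying; // only meaningful for Typedef
};

enum class PointerKind { ObjCObjectPointer, PlainPointer };

// Strip typedef sugar until a non-typedef type is reached.
const Type &desugar(const Type &t) {
  const Type *cur = &t;
  while (cur->kind == TypeKind::Typedef && cur->underlying)
    cur = cur->underlying;
  return *cur;
}

// Old behaviour: looks only at the outermost (possibly sugared) type.
PointerKind pointerKindWithoutDesugar(const Type &pointee) {
  return pointee.kind == TypeKind::ObjCInterface
             ? PointerKind::ObjCObjectPointer
             : PointerKind::PlainPointer;
}

// New behaviour: looks through typedefs before deciding.
PointerKind pointerKindWithDesugar(const Type &pointee) {
  return pointerKindWithoutDesugar(desugar(pointee));
}
```

In this model a typedef of an interface type yields a plain pointer with the old check and an object pointer with the new one.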
Fixes rdar://17525603
Reviewers: shafik, mib
Reviewed By: mib
Subscribers: mib, JDevlieghere, lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D73952
scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers.
Differential Revision: https://reviews.llvm.org/D73482
This should lower the amount of used registers for gfx9.
I updated some of the changed tests with the update script because
changing them by hand is tedious.
Differential Revision: https://reviews.llvm.org/D73884
Previously, the description allowed describing symbols using the
`Name` and `Index` keys. This patch removes them. It is still
possible to use either names or symbol indexes, but the code is simpler
and the format is slightly different.
Such a change will be useful for other patches, e.g.:
https://reviews.llvm.org/D73788#inline-671077
Differential revision: https://reviews.llvm.org/D73888
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.
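The bail-out pattern can be sketched as follows. `MemInstr` and `getSingleDef` are hypothetical stand-ins, not the MachineInstr API:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a memory instruction and its defined registers.
struct MemInstr {
  std::vector<unsigned> Defs;
};

// Returns true and sets Reg when the instruction has exactly one def.
// Returns false otherwise, so the caller can skip emitting call site
// parameter debug info instead of hitting an assertion.
bool getSingleDef(const MemInstr &MI, unsigned &Reg) {
  if (MI.Defs.size() != 1)
    return false;
  Reg = MI.Defs.front();
  return true;
}
```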
Differential Revision: https://reviews.llvm.org/D73954
Same for any_extend though we don't have coverage for that.
The test changes are because isel didn't check that the setcc_carry
has only one use. So in isel we would end up with two different
sized setcc_carry instructions. And since it clobbers
the flags we would need to recreate the flags for the second
instruction.
This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
When building the default builtin and runtimes target, set the
CMAKE_SYSTEM_NAME to the current one. This is not necessary on
Linux and Darwin, but it appears to be necessary on Windows,
otherwise CMake fails.
Differential Revision: https://reviews.llvm.org/D73811
XRay builds use llvm-config to obtain the ldflags and libs and then
passes those to CMake. Unfortunately, this breaks on Windows because
CMake tries to interpret backslashes followed by certain characters
as flags. We need to rewrite these into forward slashes that are used
by CMake (even on Windows).
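The rewrite itself is a plain character substitution. The actual change lives in the build scripts and is not shown here; the C++ helper below, with the hypothetical name `toForwardSlashes`, only illustrates the transformation:

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Replace every backslash in a path or flag string with a forward slash.
std::string toForwardSlashes(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}
```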
Differential Revision: https://reviews.llvm.org/D73523
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.
Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
Summary:
For now, this ABI simply expands all possible aggregate arguments and
returns all possible aggregates directly. This ABI will change rapidly
as we prototype and benchmark a new ABI that takes advantage of
multivalue return and possibly other changes from the MVP ABI.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D72972
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.
Depends on D73927.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73928
Explicitly check for a request to attach to a pid that doesn't
exist and for a request to attach to a pid that is already being
debugged, unify the SIP process check, and add an attempt at checking
whether developer mode is enabled on the system (which isn't working in
debugserver, for some reason; I can't get the authorization record,
which should be an unprivileged operation and works in a standalone
program I wrote). I'll debug the developer mode check later, but I
wanted to land it along with everything else; right now it will claim
that developer mode is always enabled, so it's harmless to include as-is.
This binplaces `mlir-translate`, `mlir-cuda-runner`, and `mlir-cpu-runner` when building the CMake install target.
Differential Revision: https://reviews.llvm.org/D73986