Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
This patch supports ds_read_b128 instruction pattern and generation of this instruction.
In the vectorizer, this patch also widen the vector length so that vectorizer generates
128 bit loads for local address-space which gets translated to ds_read_b128.
Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.
Author: FarhanaAleen
Reviewed By: rampitec, arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D44210
llvm-svn: 327153
Added helpers to build G_FCONSTANT, along with matching ConstantFP and
unit tests for the same.
Sample usage.
auto MIB = Builder.buildFConstant(s32, 0.5); // Build IEEESingle
For Matching the above
const ConstantFP* Tmp;
mi_match(DstReg, MRI, m_GFCst(Tmp));
https://reviews.llvm.org/D44128
reviewed by: volkan
llvm-svn: 327152
In r263618, JumpThreading learned to look trough simple cast instructions, but
only if the source of those cast instructions was a phi/cmp i1 (in an effort to
limit compile time effects). I think this condition is too restrictive. For
switches with limited value range, InstCombine will readily introduce an extra
trunc instruction to a smaller integer type (e.g. from i8 to i2), leaving us in
the somewhat perverse situation that jump-threading would work before running
instcombine, but not after. Since instcombine produces this pattern, I think we
need to consider it canonical and support it in JumpThreading. In general,
for limiting recursion, I think the existing restriction to phi and cmp nodes
should be sufficient to avoid looking through unprofitable chains of
instructions.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42262
llvm-svn: 327150
Also, fix the undef vs. UB example to use 'sdiv' because that can trigger div-by-zero UB.
The existing text for the constrained intrinsics says:
"By default, LLVM optimization passes assume that the rounding mode is round-to-nearest
and that floating point exceptions will not be monitored. Constrained FP intrinsics are
used to support non-default rounding modes and accurately preserve exception behavior
without compromising LLVM’s ability to optimize FP code when the default behavior is
used."
...so the additional text with the normal FP opcodes should make the different modes
clear.
Differential Revision: https://reviews.llvm.org/D44216
llvm-svn: 327138
We improved the handling of errors and warnings in dwarfdump's verifier
in rL314498. This patch does the same thing for dsymutil.
Differential revision: https://reviews.llvm.org/D44052
llvm-svn: 327137
lib/WindowsManifest/CMakeLists.txt adds it to LLVM_SYSTEM_LIBS on that
target, but it was never getting picked up in
tools/llvm-config/CMakeLists.txt.
Differential Revision: https://reviews.llvm.org/D44302
llvm-svn: 327135
The code to match and produce more x86 vector blends was enabled for all
architectures even though the transform may pessimize the code for other
architectures that do not provide a vector blend instruction.
Added an aarch64 testcase to check that a VZIP instruction is generated instead
of byte movs.
Differential Revision: https://reviews.llvm.org/D44118
llvm-svn: 327132
Allows capturing a list of concrete instantiated defs.
This can be combined with foreach to create parallel sets of def
instantiations with less repetition in the source. This purpose is
largely also served by multiclasses, but in some cases multiclasses
can't be used.
The motivating example for this change is having a large set of
intrinsics, which are generated from the IntrinsicsBackend.td file
included by Intrinsics.td, and a corresponding set of instruction
selection patterns, which are generated via the backend's .td files.
Multiclasses cannot be used to eliminate the redundancy in this case,
because a multiclass cannot span both LLVM's common .td files and
the backend .td files at the same time.
Change-Id: I879e35042dceea542a5e6776fad23c5e0e69e76b
Differential revision: https://reviews.llvm.org/D44109
llvm-svn: 327121
The changes to FieldInit are required to make field references (Def.field)
work inside a ForeachDeclaration: previously, Def.field wasn't resolved
immediately when Def was already a fully resolved DefInit.
Change-Id: I9875baec2fc5aac8c2b249e45b9cf18c65ae699b
llvm-svn: 327120
Use the default ParseValueMode instead of ParseForeachMode when
parsing the rule
ForeachDeclaration ::= ID '=' '[' ValueList ']'
because the only difference between the two is how an open brace '{'
is handled at the end. In the context of foreach, the 'in' keyword
will appear after the ForeachDeclaration, so this special handling
of '{' is not required.
Change-Id: I4d86bb73bab9ec26752e1273e5213df77cf28d1d
llvm-svn: 327119
Summary:
Even though the getDIEOffset offset function was common for the two
accelerator table implementations, it was doing two different things:
for the Apple tables, it was returning the die offset relative to the
start of the section, whereas for DWARF v5 tables, it was relative to
the start of the CU.
I resolve this by renaming the function to getDIESectionOffset to make
it obvious what the function returns, and change the DWARF
implementation to return the section offset. I also keep the CU-relative
accessor, but only in the DWARF implementation (there is no way to get
this information for the Apple tables). This was not caught by existing
tests because the hand-written inputs also erroneously used section
offsets instead of CU-relative ones.
While looking at this, I noticed that the Apple implementation was not
fully correct either -- the header contains a DIEOffsetBase field, which
should be added to offsets encoded with the DW_FORM_ref*** family, but
this was not being used. This went unnoticed because all current writers
set this field to zero anyway. I fix this as well and add a hand-written
test which demonstrates the issue.
Reviewers: JDevlieghere, dblaikie
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D44202
llvm-svn: 327116
Fixes PR36311.
See more detailed analysis in
https://bugs.llvm.org/show_bug.cgi?id=36311.
isUniform() information is recomputed after LV started transforming the
underlying IR and that triggered an assert in SCEV.
From vectorizer's architectural perspective, such information, while
still useful in vector code gen, should not be recomputed after the
start of transforming the LLVM IR. Instead, we should collect and cache
such information during the analysis phase of LV and use the cached info
during code gen.
From the symptom perspective, this assert as it stands right now is not
very useful. Legality already rejected loops that would trigger the
assert. As such, commenting out the assert is NFC from vectorizer's
functionality perspective. On top of that, just above the assertion, we
check for unit-strided load/store or
gather scatter. Addresses can't be uniform below that check.
From vectorization theory point of view, we don't have to reject all
cases of stores to uniform addresses. Eventually, we should support
safe/profitable cases.
This patch resolves the issue by removing the useless assertion that is
invoking LAA's isUniform() that requires up-to-date DomTree ---- once
vector code gen starts modifying CFG, we don't have an up-to-date
DomTree.
Patch by Hideki Saito <hideki.saito@intel.com>.
llvm-svn: 327109
Move the DWARF syntax highlighting into support. This has several
advantages, most notably that this makes the WithColor RAII wrapper
available outside libDebugInfo. Furthermore, several projects all have
their own code for handling colored output. This provides a place to
centralize it.
Differential revision: https://reviews.llvm.org/D44215
llvm-svn: 327108
This patch starts simplifying the handling of .symver.
For now it just moves the responsibility for creating an alias down to
the streamer. With that the asm streamer can pass a .symver unchanged,
which is nice since gas cannot parse "foo@bar = zed".
In a followup I hope to move the handling down to the writer so that
we don't need special hacks for avoiding breaking names with @@@ on
windows.
llvm-svn: 327101
Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb
Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does.
Differential Revision: https://reviews.llvm.org/D44267
llvm-svn: 327093
There is no point in lowering a dbg.declare describing an alloca that
has volatile loads or stores as users, since the alloca cannot be
elided. Lowering the dbg.declare will result in larger debug info that
may also have worse coverage than just describing the alloca.
rdar://problem/34496278
llvm-svn: 327092
On Windows, if the substitution contains a back reference, it would
removed due to the replacement of the escape character in lit. Create a
helper class to avoid this which will simply ignore the replacement and
mark the substitution as having capture groups being referenced.
llvm-svn: 327082
This patch is a fix for PR36642.
While legalizing long vector types, make sure the smaller types get the
flags of the wider type.
bugzilla link: https://bugs.llvm.org/show_bug.cgi?id=36642
Change-Id: I0c2829639f094c862c10a6b51b342d4c2563e1fa
llvm-svn: 327079
from core files. I tested this against the couple of core files that were
getting errors about unknown thread flavors and it now produce the same output as
the Xcode otool-classic(1) tool. Since the core files are huge I didn’t include
them as test cases.
rdar://38216356
llvm-svn: 327077
We were effectively overriding an explicit '.file' directive with info
for the assembler source. That shouldn't happen.
Fixes PR36636.
Differential Revision: https://reviews.llvm.org/D44265
llvm-svn: 327073
Summary:
This patch adds the DW_AT_byte_size dwarf attribute to vectors.
This fixes PR21924
LLVM will round a vector up to the next alignable address, which can result in
the vector's representation in the object file being larger than what the
debugger will calculate via NumberOfElements * ElementSize. In such a case calling sizeof(MyVec) in the source will result in a different value than what a debugger might present. This situation can occur because LLVM permits non-power of two 'vector_size' attributes.
Reviewers: echristo, dexonsmith, aprantl
Reviewed By: aprantl
Subscribers: probinson, aprantl, llvm-commits, JDevlieghere
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D44048
llvm-svn: 327072