There is an explicit option for the lexer to support this, but we crash
when `-preserve-comments` is enabled because it checks for
`getTok().getString().empty()` to detect the case. This doesn't
work currently because the lexer reports this case as a string of length
1, containing a null byte.
Change the lexer to instead report this case via an empty string, as the
null terminator isn't logically a part of the textual input, and the
check for `.empty()` seems natural and obvious in the calling code.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D92681
Rather than creating a series of associated calls and ensuring that
everything is lined up, use a table driven approach that ensures that
they two always stay in sync.
Avoids spurious newlines showing up in the output when emitting assembly
via MC.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D92690
Pretty sure we meant to be checking signed 32 immediates here
rather than unsigned 32 bit. I suspect I messed this up because
in MathExtras.h we have isIntN and isUIntN so isIntN differs in
signedness depending on whether you're using APInt or plain integers.
This fixes a case where we didn't fold a constant created
by shrinkAndImmediate. Since shrinkAndImmediate doesn't topologically
sort constants it creates, we can fail to convert the Constant
to a TargetConstant. This leads to very strange behavior later.
Fixes PR48458.
Revert part of https://reviews.llvm.org/D92084 to make it simpler to
start consuming the EndOfStatement token within AMDGPU's
ParseInstruction in a future patch. This also brings us back to what
every other target currently does.
A future change to move the position back to the end of the statement
would likely need to audit all of the AMDGPUOperand SMLoc ranges, and
determine the SMLoc for the last character of the last operand.
Reviewed By: dp
Differential Revision: https://reviews.llvm.org/D92960
Add builtins required to implement vcmla and rotated variants from
the ACLE
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92929
*************
* The problem
*************
See motivation examples in compiler-rt/test/dfsan/pair.cpp. The current
DFSan always uses a 16bit shadow value for a variable with any type by
combining all shadow values of all bytes of the variable. So it cannot
distinguish two fields of a struct: each field's shadow value equals the
combined shadow value of all fields. This introduces an overtaint issue.
Consider a parsing function
std::pair<char*, int> get_token(char* p);
where p points to a buffer to parse, the returned pair includes the next
token and the pointer to the position in the buffer after the token.
If the token is tainted, then both the returned pointer and int ar
tainted. If the parser keeps on using get_token for the rest parsing,
all the following outputs are tainted because of the tainted pointer.
The CL is the first change to address the issue.
**************************
* The proposed improvement
**************************
Eventually all fields and indices have their own shadow values in
variables and memory.
For example, variables with type {i1, i3}, [2 x i1], {[2 x i4], i8},
[2 x {i1, i1}] have shadow values with type {i16, i16}, [2 x i16],
{[2 x i16], i16}, [2 x {i16, i16}] correspondingly; variables with
primary type still have shadow values i16.
***************************
* An potential implementation plan
***************************
The idea is to adopt the change incrementially.
1) This CL
Support field-level accuracy at variables/args/ret in TLS mode,
load/store/alloca still use combined shadow values.
After the alloca promotion and SSA construction phases (>=-O1), we
assume alloca and memory operations are reduced. So if struct
variables do not relate to memory, their tracking is accurate at
field level.
2) Support field-level accuracy at alloca
3) Support field-level accuracy at load/store
These two should make O0 and real memory access work.
4) Support vector if necessary.
5) Support Args mode if necessary.
6) Support passing more accurate shadow values via custom functions if
necessary.
***************
* About this CL.
***************
The CL did the following
1) extended TLS arg/ret to work with aggregate types. This is similar
to what MSan does.
2) implemented how to map between an original type/value/zero-const to
its shadow type/value/zero-const.
3) extended (insert|extract)value to use field/index-level progagation.
4) for other instructions, propagation rules are combining inputs by or.
The CL converts between aggragate and primary shadow values at the
cases.
5) Custom function interfaces also need such a conversion because
all existing custom functions use i16. It is unclear whether custome
functions need more accurate shadow propagation yet.
6) Added test cases for aggregate type related cases.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92261
This method previously always recursively checked both the left-hand
side and right-hand side of binary operations for splatted (broadcast)
vector values to determine if the parent DAG node is a splat.
Like several other SelectionDAG methods, limit the recursion depth to
MaxRecursionDepth (6). This prevents stack overflow.
See also https://issuetracker.google.com/173785481
Patch by Nicolas Capens. Thanks!
Differential Revision: https://reviews.llvm.org/D92421
To support llorg builds this patch provides the following changes:
1) Added cmake variable ITTAPI_GIT_REPOSITORY to control the location of ITTAPI repository.
Default value of ITTAPI_GIT_REPOSITORY is github location: https://github.com/intel/ittapi.git
Also, the separate cmake variable ITTAPI_GIT_TAG was added for repo tag.
2) Added cmake variable ITTAPI_SOURCE_DIR to control the place where the repo will be cloned.
Default value of ITTAPI_SOURCE_DIR is build area: PROJECT_BINARY_DIR
Reviewed By: etyurin, bader
Patch by ekovanov.
Differential Revision: https://reviews.llvm.org/D91935
This is an enhancement to load vectorization that is motivated by
a pattern in https://llvm.org/PR16739.
Unfortunately, it's still not enough to make a difference there.
We will have to handle multi-use cases in some better way to avoid
creating multiple overlapping loads.
Differential Revision: https://reviews.llvm.org/D92858
Add vfmk intrinsic instructions, a few pseudo instructions to expand
vfmk intrinsic using VM512 correctly, and regression tests.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D92758
For stores chain vectorization we choose the size of vector
elements to ensure we fit to minimum and maximum vector register
size for the number of elements given. This patch corrects vector
element size choosing the width of value truncated just before
storing instead of the width of value stored.
Fixes PR46983
Differential Revision: https://reviews.llvm.org/D92824
If a function parameter is marked as "undef", prevent creation
of CallSiteInfo for that parameter.
Without this patch, the parameter's call_site_value would be incorrect.
The incorrect call_value case reported in PR39716,
addressed in D85111.
Patch by Nikola Tesic
Differential revision: https://reviews.llvm.org/D92471
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
- fold (and (masked_gather x)) -> (zext_masked_gather x)
- fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)
LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D92230
* Steps are scaled by `vscale`, a runtime value.
* Changes to circumvent the cost-model for now (temporary)
so that the cost-model can be implemented separately.
This can vectorize the following loop [1]:
void loop(int N, double *a, double *b) {
#pragma clang loop vectorize_width(4, scalable)
for (int i = 0; i < N; i++) {
a[i] = b[i] + 1.0;
}
}
[1] This source-level example is based on the pragma proposed
separately in D89031. This patch only implements the LLVM part.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91077
This patch removes a number of asserts that VF is not scalable, even though
the code where this assert lives does nothing that prevents VF being scalable.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D91060
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91084
This commit adds two new intrinsics.
- llvm.experimental.vector.insert: used to insert a vector into another
vector starting at a given index.
- llvm.experimental.vector.extract: used to extract a subvector from a
larger vector starting from a given index.
The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.
Reviewed By: cameron.mcinally
Differential Revision: https://reviews.llvm.org/D91362
The original code was inserting the barrier at the location given by the
caller. Make sure it is always inserted at the end of the loop exit block
instead.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D92849
The register operand was not being marked as a def when it should be. No tests
for this in the main branch as there are not yet any pseudos without a
non-negative VLIndex.
Also change the type of a virtual register operand from unsigned to Register
and adjust formatting.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D92823
This scans through blocks looking for constants used as predicates in
MVE instructions. When two constants are found which are the inverse of
one another, the second can be replaced by a VPNOT of the first,
potentially allowing that not to be folded away into an else predicate
of a vpt block.
Differential Revision: https://reviews.llvm.org/D92470
We defined SubRegIndex for 256/512 regs,
but we did not set the offset for higher part,
so the offset of lower and higher part are the same.
This may cause problem in assessing ranges of SubReg,
it is great that this haven't affected any testcases,
but I think we should fix it to avoid hidden bugs in the future.
Reviewed By: bsaleil, #powerpc
Differential Revision: https://reviews.llvm.org/D92864
The main this this test does is to add the `IsNotPIC` predicate to the
all the atomic instructions pattern that directly refer to
`tglobaladdr`.
This is because in PIC mode we need to generate separate instruction
sequence (either a direct global.get, or __memory_base + offset) for
accessing global addresses.
As part of this change I noticed that many of the `Requires` attributes
added to the instruction in `WebAssemblyInstrAtomics.td` were being
honored. This is because the wrapped in a `let Predicates =
[HasAtomics]` block and it seems that that outer wrapping overrides any
`Requires` on defs within it. As a workaround I removed the outer
`let` and added `HasAtomics` to all the inner `Requires`. I believe
that all the instrucitons that don't have `Requires` explicit bottom out
in `ATOMIC_I` and `ATOMIC_NRI` which have `HasAtomics` so this should
not remove this predicate from any patterns (at least that is the idea).
The alternative to this approach looks like implementing something
like `PredicateControl` in `Mips.td` where we can split the predicates
into groups so they don't clobber each other.
Differential Revision: https://reviews.llvm.org/D92744
Add an overload of `RedirectingFileSystem::create` that builds a
redirecting filesystem off of a simple vector of string pairs. This is
intended to be used to support `clang::arcmt::FileRemapper` and
`clang::PreprocessorOptions::RemappedFiles`.
Differential Revision: https://reviews.llvm.org/D91317
Uniformly return uniquely-owned filesystems from VFS creation APIs. The
one exception is `getRealFileSystem`, which has a single instance and
needs to be shared.
This is almost NFC, except that it fixes a memory leak in
`vfs::collectVFSFromYAML()`.
Depends on https://reviews.llvm.org/D92888
Differential Revision: https://reviews.llvm.org/D92890
MD5 is used.
Currently during sample profile loading, NameTable has to be loaded entirely
up front before any name string is retrieved. That is because NameTable is
stored using ULEB128 encoding and cannot be directly accessed like an array.
However, if MD5 is used to represent name in the NameTable, it has fixed
length. If MD5 names are stored in uint64_t type instead of ULEB128, NameTable
can be accessed like an array then in many cases only part of the NameTable
has to be read. This is helpful for reducing compile time especially when
small source file is compiled. We find that after this change, the elapsed
time to build a large application distributively is reduced by 5% and the
accumulative cpu time used for building is also reduced by 5%. The size of
the profile is slightly reduced with this change by ~0.2%, and that also
indicates encoding MD5 in ULEB128 doesn't save the storage space.
Differential Revision: https://reviews.llvm.org/D92621
This merges the SEW and LMUL enums that each used into singles enums in RISCVBaseInfo.h. The patch also adds a new encoding helper to take SEW, LMUL, tail agnostic, mask agnostic and turn it into a vtype immediate.
I also stopped storing the Encoding in the VTYPE operand in the assembler. It is easy to calculate when adding the operand which should only happen once per instruction.
Differential Revision: https://reviews.llvm.org/D92813
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:
* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
first instruction in the function.
Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D91218
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.
Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
`TryFoldBinOpIntoSelect` didn't have a check for `Optimized`, meaning you could
end up folding twice. (e.g. a select with a G_ADD on the true side, and a G_SUB
on the false side)
Add in the missing `if` and a test.