Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
surface/texture reference support. So far, both the host and device
use the same reference type, which could be revised later when
interface/implementation is stablized.
Reviewers: yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77583
Shuffle combining can insert zero byte sized elements into the shuffle mask, which combineX86ShufflesConstants will attempt to fold without taking into account whether the byte-sized type is legal (e.g. AVX512F only targets).
If we have a full-zeroable vector then we should just return a zero version of the root type, otherwise if the type isn't valid we should bail.
Fixes PR45443
The new format should be equivalent to the old format, and it is now the
default format when running the libc++ tests. This commit changes the
libc++abi tests to use the new format by default too. If unexpected failures
are discovered, it should be fine to revert this commit until they are
addressed.
Also note that it is still possible to use the old format by passing
`--param=use_old_format=True` when running Lit for the time being.
Summary:
Same restrictions apply as in the other direction: macro arguments are
not supported yet, only full macro expansions can be mapped.
Taking over from https://reviews.llvm.org/D72581.
Reviewers: gribozavr2, sammccall
Reviewed By: gribozavr2
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77209
Summary:
Our previous definition of "top-level" was too informal, and didn't
allow for overlapping macros that each directly produce expanded tokens.
See D77507 for previous discussion.
Fixes http://bugs.llvm.org/show_bug.cgi?id=45428
Reviewers: kadircet, vabridgers
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77615
Currently Clang does not respect -fno-unroll-loops during LTO. During
D76916 it was suggested to respect -fno-unroll-loops on a TU basis.
This patch uses the existing llvm.loop.unroll.disable metadata to
disable loop unrolling explicitly for each loop in the TU if
unrolling is disabled. This should ensure that loops from TUs compiled
with -fno-unroll-loops are skipped by the unroller during LTO.
This also means that if a loop from a TU with -fno-unroll-loops
gets inlined into a TU without this option, the loop won't be
unrolled.
Due to the fact that some transforms might drop loop metadata, there
potentially are cases in which we still unroll loops from TUs with
-fno-unroll-loops. I think we should fix those issues rather than
introducing a function attribute to disable loop unrolling during LTO.
Improving the metadata handling will benefit other use cases, like
various loop pragmas, too. And it is an improvement to clang completely
ignoring -fno-unroll-loops during LTO.
If that direction looks good, we can use a similar approach to also
respect -fno-vectorize during LTO, at least for LoopVectorize.
In the future, this might also allow us to remove the UnrollLoops option
LLVM's PassManagerBuilder.
Reviewers: Meinersbur, hfinkel, dexonsmith, tejohnson
Reviewed By: Meinersbur, tejohnson
Differential Revision: https://reviews.llvm.org/D77058
Summary:
The motivation here is fixing https://bugs.llvm.org/show_bug.cgi?id=45428, see
D77507. The fundamental problem is that a "top-level" expansion wasn't precisely
defined. Repairing this concept means that TokenBuffer's "top-level expansion"
may not correspond to a single macro expansion. Example:
```
M(2); // expands to 1+2
```
The expansions overlap, but neither expansion alone yields all the tokens.
We need a TokenBuffer::Mapping that corresponds to their union.
This is fairly easy to fix in CollectPPExpansions, but the current design of
TokenCollector::Builder needs a fix too as it relies on the macro's expansion
range rather than the captured expansion bounds. This fix is hard to make due
to the way code is reused within Builder. And honestly, I found that code pretty
hard to reason about too.
The new approach doesn't use the expansion range, but only the expansion
location: it assumes an expansion is the contiguous set of expanded tokens with
the same expansion location, which seems like a reasonable formalization of
the "top-level" notion.
And hopefully the control flow is easier to follow too, it's considerably
shorter even with more documentation.
Reviewers: kadircet
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77614
Otherwise, files don't link when using a GNU linker, which is more
sensitive on the order of the source file relative to the various
linked libraries. See http://c-faq.com/lib/libsearch.html for an
explanation of the problem.
Summary:
FileInputs are only written by ASTWorker thread, therefore it is safe
to read them without the lock inside that thread. It can still be read by other
threads through ASTWorker::getCurrentCompileCommand though.
This patch also gets rid of the smart pointer wrapping FileInputs as there is
never mutliple owners.
Reviewers: sammccall
Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77309
Currently we have no dedicated warnings, but we return error message instead of a result.
It is generally not consistent with another warnings we have.
This change was suggested and discussed here:
https://reviews.llvm.org/D77216#1954873
This change refines error messages we report and also I had to update the API
to implement it.
Differential revision: https://reviews.llvm.org/D77399
Summary: Removes the `static` token when defining a function out of line if the function is a `CXXMethodDecl`
Reviewers: sammccall, kadircet, hokein
Reviewed By: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang, #clang-tools-extra
Differential Revision: https://reviews.llvm.org/D77534
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.
Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.
As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.
Differential Revision: https://reviews.llvm.org/D76849
Warnings in emmintrin.h and xmmintrin.h are reported by
-fsanitize=implicit-integer-sign-change.
Reviewed By: RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D77393
Introduce the alloca op for stack memory allocation. When converting to the
LLVM dialect, this is lowered to an llvm.alloca. Refactor the std to
llvm conversion for alloc op to reuse with alloca. Drop useAlloca option
with alloc op lowering.
Differential Revision: https://reviews.llvm.org/D76602
This patch updates the code that deals with conditions from predicate
info to make use of constant ranges.
For ssa_copy instructions inserted by PredicateInfo, we have 2 ranges:
1. The range of the original value.
2. The range imposed by the linked condition.
1. is known, 2. can be determined using makeAllowedICmpRegion. The
intersection of those ranges is the range for the copy.
With this patch, we get a nice increase in the number of instructions
eliminated by both SCCP and IPSCCP for some benchmarks:
For MultiSource, SPEC2000 & SPEC2006:
Tests: 237
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.NumInstRemoved
Program base patch diff
test-suite...Source/Benchmarks/sim/sim.test 10.00 71.00 610.0%
test-suite...CFP2000/177.mesa/177.mesa.test 361.00 1626.00 350.4%
test-suite...encode/alacconvert-encode.test 141.00 602.00 327.0%
test-suite...decode/alacconvert-decode.test 141.00 602.00 327.0%
test-suite...CI_Purple/SMG2000/smg2000.test 1639.00 4093.00 149.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 75.00 163.00 117.3%
test-suite...T2006/401.bzip2/401.bzip2.test 358.00 513.00 43.3%
test-suite...rks/FreeBench/pifft/pifft.test 11.00 15.00 36.4%
test-suite...langs-C/unix-tbl/unix-tbl.test 4.00 5.00 25.0%
test-suite...lications/sqlite3/sqlite3.test 541.00 667.00 23.3%
test-suite.../CINT2000/254.gap/254.gap.test 243.00 299.00 23.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 25.00 29.00 16.0%
test-suite...marks/7zip/7zip-benchmark.test 1135.00 1304.00 14.9%
test-suite...lications/ClamAV/clamscan.test 1105.00 1268.00 14.8%
test-suite...urce/Applications/lua/lua.test 398.00 436.00 9.5%
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...C/CFP2000/179.art/179.art.test 1.00 3.00 200.0%
test-suite...006/447.dealII/447.dealII.test 429.00 1056.00 146.2%
test-suite...nch/fourinarow/fourinarow.test 3.00 7.00 133.3%
test-suite...CI_Purple/SMG2000/smg2000.test 818.00 1748.00 113.7%
test-suite...ks/McCat/04-bisect/bisect.test 3.00 5.00 66.7%
test-suite...CFP2000/177.mesa/177.mesa.test 165.00 255.00 54.5%
test-suite...ediabench/gsm/toast/toast.test 18.00 27.00 50.0%
test-suite...telecomm-gsm/telecomm-gsm.test 18.00 27.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 24.00 35.00 45.8%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 62.00 44.2%
test-suite...encode/alacconvert-encode.test 46.00 66.00 43.5%
test-suite...decode/alacconvert-decode.test 46.00 66.00 43.5%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 17.00 41.7%
test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 41.00 32.3%
test-suite.../CINT2000/254.gap/254.gap.test 117.00 154.00 31.6%
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76611
Move the logic whether lowering of deopt value requires a spill slot in
a separate lambda.
Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77629
From Arm v8 Architecture Reference Manual F5.1.84 LDREXD
The ldrexd instruction in Arm state has the following conditions:
t = UInt(Rt); t2 = t + 1; n = UInt(Rn);
if Rt<0> == '1' || t2 == 15 || n == 15 then UNPREDICTABLE;
In when Rt is odd or if Rt is 14 (making t2 15).
In the implementation when the pair is the UNPREDICTABLE R14_R15 we
would ideally return SOFT_FAIL. We can't because there is no R14_R15
value for us to return so we fail early returning FAIL.
The early return for registers outside the bounds of the table means
the check for Rt == 14 (0xE) redundant which causes a static analyzer
to flag the condition as never being true.
To fix the warning I've removed the check and replaced with a comment
explaining the difference with the specification.
Fixes pr41660
Differential Revision: https://reviews.llvm.org/D77463
Fix point-wise copy generation to work with bounds that have max/min.
Change structure of copy loop nest to use absolute loop indices and
subtracting base from the indexes of the fast buffers. Update supporting
utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at
equalities as well and for a missing division. Update unionBoundingBox
to not discard common constraints (leads to a tighter system). Update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints. Run removeTrivialRedundancy at the end of
MemRefRegion::compute. Run single iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
Differential Revision: https://reviews.llvm.org/D77320
Summary:
In `Unix/Process.inc`, we seed a random number generator from
`/dev/urandom` if possible, but if not, we're happy to fall back to
ordinary pseudorandom strategies, like the current time and PID.
The corresponding function on Windows calls `CryptGenRandom`, but it
//doesn't// have a fallback if that strategy fails. But `CryptGenRandom`
//can// fail, if a cryptography provider isn't properly initialized, or
occasionally (by our observation) simply intermittently.
If it's reasonable on Unix to implement traditional pseudorandom-number
seeding as a fallback, then it's surely reasonable to do the same on
Windows. So this patch adds a last-ditch use of ordinary rand(), using
much the same strategy as the Unix fallback code.
Reviewers: hans, sammccall
Reviewed By: hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77553
My main work directory is on a separate partition, but I usually access
it through a symlink in my home directory. When running the tests,
either Clang or make resolves the symlink, and the real path of the
test directory ends up in the debug information.
This confuses this test as LLDB is trying to remap the real path, but
the remapping description uses the path with the symlink in
it. Calling realpath on the source path when constructing the
remapping description fixes it.
constructor with default arguments.
We used to try to rebuild the call as a call to the faked-up inherited
constructor, which is only a placeholder and lacks (for example) default
arguments. Instead, build the call by reference to the original
constructor.
In passing, add a note to say where a call that recursively uses a
default argument from within itself occurs. This is usually pretty
obvious, but still at least somewhat useful, and would have saved
significant debugging time for this particular bug.
Do not commit the llvm/test/ExecutionEngine/MCJIT/cet-code-model-lager.ll because it will
cause build bot fail(not suitable for window 32 target).
Summary:
This patch comes from H.J.'s 2bd54ce7fa
**This patch fix the failed llvm unit tests which running on CET machine. **(e.g. ExecutionEngine/MCJIT/MCJITTests)
The reason we enable IBT at "JIT compiled with CET" is mainly that: the JIT don't know the its caller program is CET enable or not.
If JIT's caller program is non-CET, it is no problem JIT generate CET code or not.
But if JIT's caller program is CET enabled, JIT must generate CET code or it will cause Control protection exceptions.
I have test the patch at llvm-unit-test and llvm-test-suite at CET machine. It passed.
and H.J. also test it at building and running VNCserver(Virtual Network Console), it works too.
(if not apply this patch, VNCserver will crash at CET machine.)
Reviewers: hjl.tools, craig.topper, LuoYuanke, annita.zhang, pengfei
Reviewed By: LuoYuanke
Subscribers: tstellar, efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76900
We would return `LLDB_INVALID_IMAGE_TOKEN` for the address rather than
the correct value of `LLDB_IMAGE_ADDRESS`. This would result in the
check for the return value to silently pass on x64 as the invalid
address and invalid token are of different sizes (`size_t` vs
`uintprr_t`). This corrects the return value to `LLDB_INVALID_ADDRESS`
and addresses the rest to reset the mapped address to the invalid value.
This was found by inspection when trying to implement module support for
Windows.
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
instead of recursing on the stack.
This doesn't actually resolve PR45333, because we now hit stack overflow
somewhere else, but it does get us further. I've not found any way of
testing this that doesn't still crash elsewhere.