Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.
Recommitted from 0e98659ea1 with a fix for APInt comparison.
Differential Revision: https://reviews.llvm.org/D114380
This allows JITDylibs to be removed from the ExecutionSession. Calling
ExecutionSession::removeJITDylib will disconnect the JITDylib from the
ExecutionSession and clear it (removing all trackers associated with it). The
JITDylib object will then be destroyed as soon as the last JITDylibSP pointing
at it is destroyed.
GeneratorsMutex should prevent lookups from proceeding through the
generators of a single JITDylib concurrently (since this could
result in redundant attempts to generate definitions). Mutation of
the generators list itself should be done under the session lock.
This keeps the tracker alive for the lifetime of the MR. This is needed so that
we can check whether the tracker has become defunct before posting results (or
failure) for the MR.
Using a BufferSize of one for memory ProcResources will result in better
ILP since it more accurately models the dependencies between memory ops
and their consumers on an in-order processor. After this change, the
scheduler will treat the data edges from loads as blocking so that
stalls are guaranteed when waiting for data to be retreaved from memory.
Since we don't actually track waitcnt here, this should do a better job
at modeling their behavior.
Practically, this means that the scheduler will trigger the 'STALL'
heuristic more often.
This type of change needs to be evaluated experimentally. Preliminary
results are positive.
Fixes: SWDEV-282962
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114777
We have reasonable fast sqrt and accurate rsqrt for half type due to the
limited fractions. So neither do we need multi steps refinement for
rsqrt nor replace sqrt by rsqrt.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114844
When a comdat symbol is defined in both bitcode and regular object
files, which are contained in the same archive, the linker could lose
the flag that the symbol is used in the regular object file and allow
LTO to internalize it, which led to "error: undefined symbol".
The issue was introduced in D79300.
Differential Revision: https://reviews.llvm.org/D114801
When an attribute is optional & is given an additional constraint in
rewrite pattern that could lead to dereferencing null Attribute. Avoid
cases where the constraints checks attribute but has no check if null.
This should be improved to be more uniformly guarded.
Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.
Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. It would be ok as long
as they remain unallocatable.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D110053
Implement P1989R2 which adds a range constructor for `string_view`.
Adjust `operator/=` in `path` to avoid atomic constraints caching issue
getting provoked from this PR.
Add defaulted template argument to `string_view`'s "sufficient
overloads" to avoid mangling issues in `clang-cl` builds. It is a
MSVC mangling bug that this works around.
Differential Revision: https://reviews.llvm.org/D113161
This is a lightweight operation, useful for writing unit tests. It will be utilized for testing in subsequent commits.
Differential Revision: https://reviews.llvm.org/D114693
5c77aa2b91 [unroll] Use early return in shouldFullUnroll [nfc]
wasn't quite NFC since !(x <= y) is x > y rather than x >= y
Credit to Justin Bogner for spotting the bug
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D114894
Split TestCxxChar8_t into two parts: one that check reading variables
without a process and another part with. This allows us to skip the
former on Apple Silicon, where lack of support for chained fix-ups
causes the test to fail.
Differential revision: https://reviews.llvm.org/D114819
This patch implements a small HTTP client library consisting primarily of the `HTTPRequest`, `HTTPResponseHandler`, and `BufferedHTTPResponseHandler` classes. Unit tests of the `HTTPResponseHandler` and `BufferedHTTPResponseHandler` are included.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D112751
Previously we missed cloning metadata on function declarations because
we don't call CloneFunctionInto() on declarations in CloneModule().
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D113812
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.
UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
Add missing tests for std::vector funcionality to improve code coverage:
- Rewrote access tests to check modification of the container using
the reference returned by the non-const overload
- Added tests for reverse iterators: rbegin, rend, etc.
- Added exception test for vector::reserve
- Extended test cases for vector copy assignment
- Fixed insert_iter_value.pass.cpp to use insert overload with const
value_type& (not with value_type&& which is tested in
iter_rvalue.pass.cpp test)
Reviewed By: Quuxplusone, rarutyun, #libc
Differential Revision: https://reviews.llvm.org/D112438
`__wrap_iter` is currently only constexpr if it's not a debug built, but it isn't used in a constexpr context currently. Making it always constexpr and disabling the debugging utilities at constant evaluation is more usful since it has to be always constexpr to be used in a constexpr context.
Reviewed By: ldionne, #libc
Spies: libcxx-commits
Differential Revision: https://reviews.llvm.org/D114733
This patch fixes the build by removing
extractVectorTypeFromShapedValue. The last use was removed Dec 1,
2021 in commit extractVectorTypeFromShapedValue.
STOP statement output was sometimes failing to appear because
the runtime flushes and shuts down open Fortran units beforehand.
But when file descriptor 2 was closed, the STOP statement output
was suppressed. The fix is to not actually close file descriptors
0-2 if they are connected to Fortran units being closed. This was
already the policy when an OPEN statement was (re-)opening such a
unit, so that logic has been pulled out into a member function and
shared with CLOSE processing.
Differential Revision: https://reviews.llvm.org/D114897
This reverts commit f02c5f3478 and
addresses the issue mentioned in D114619 differently.
Repeating the issue here:
Currently, during symbol simplification we remove the original member
symbol from the equivalence class (`ClassMembers` trait). However, we
keep the reverse link (`ClassMap` trait), in order to be able the query
the related constraints even for the old member. This asymmetry can lead
to a problem when we merge equivalence classes:
```
ClassA: [a, b] // ClassMembers trait,
a->a, b->a // ClassMap trait, a is the representative symbol
```
Now let,s delete `a`:
```
ClassA: [b]
a->a, b->a
```
Let's merge ClassA into the trivial class `c`:
```
ClassA: [c, b]
c->c, b->c, a->a
```
Now, after the merge operation, `c` and `a` are actually in different
equivalence classes, which is inconsistent.
This issue manifests in a test case (added in D103317):
```
void recurring_symbol(int b) {
if (b * b != b)
if ((b * b) * b * b != (b * b) * b)
if (b * b == 1)
}
```
Before the simplification we have these equivalence classes:
```
trivial EQ1: [b * b != b]
trivial EQ2: [(b * b) * b * b != (b * b) * b]
```
During the simplification with `b * b == 1`, EQ1 is merged with `1 != b`
`EQ1: [b * b != b, 1 != b]` and we remove the complex symbol, so
`EQ1: [1 != b]`
Then we start to simplify the only symbol in EQ2:
`(b * b) * b * b != (b * b) * b --> 1 * b * b != 1 * b --> b * b != b`
But `b * b != b` is such a symbol that had been removed previously from
EQ1, thus we reach the above mentioned inconsistency.
This patch addresses the issue by making it impossible to synthesise a
symbol that had been simplified before. We achieve this by simplifying
the given symbol to the absolute simplest form.
Differential Revision: https://reviews.llvm.org/D114887
Detected on targets older then gfx10 (e.g. gfx9) for constants that are
too large to be inlined (constant are sgpr by default).
In med3 combine it is expected that regbankselect maps all operands of
min/max we try to match to vgpr. However constants are mapped to sgpr
and there will be a sgpr-to-vgpr copy. Matchers look through sgpr-to-vgpr
copies and return sgpr and these break constant bus restriction.
Build med3 with all vgpr operands. Use existing sgpr-to-vgpr copies for
matched sgprs. If there is no such copy (not expected) build one.
Differential Revision: https://reviews.llvm.org/D114700
Ignore undefined symbols; other minor code cleanup.
Replace test objects and their asm source with a yaml equivalent.
Differential Revision: https://reviews.llvm.org/D114478
Fixes PR#47614. Deduction guides, implicit or user-defined, look like
function declarations in the AST. They aren't really functions, though,
and they always have a trailing return type, so it doesn't make sense
to issue this warning for them.
This bases the CleanupConstantGlobalUsers() implementation around
the ConstantFoldLoadFromConst() API. The general approach is that
we discover all users while looking through casts, and then
constant fold loads and drop stores and memintrinsics.
This avoids special cases and limitations in the previous
implementation, which is also incompatible with opaque pointers.
The result is a bit more powerful than before, because we now use
more general load folding logic which can for example look through
pointer bitcasts between different sizes. This is where the test
changes come from, as we now fold more loads and can thus remove
more globals.
Differential Revision: https://reviews.llvm.org/D114889
Reviewed as part of D114658.
Ultimately this will probably have to be flipped around and renamed
`TEST_IS_RUNTIME`, and extended with `TEST_IS_RUNTIME_OR_CXX20` (once
constexpr std::string support is added) and so on for every new C++
version. But we don't need that flexibility yet, so we're not adding it.
Disable the constructors taking `(size_type, const value_type&,
allocator_type)` if `allocator_type` is not a valid allocator.
Otherwise, these constructors are considered when resolving e.g.
`(int*, int*, NotAnAllocator())`, leading to a hard error during
instantiation. A hard error makes the Standard's requirement to not
consider deduction guides of the form `(Iterator, Iterator,
BadAllocator)` during overload resolution essentially non-functional.
The previous approach was to SFINAE away `allocator_traits`. This patch
SFINAEs away the specific constructors instead, for consistency with
`basic_string` -- see [LWG3076](wg21.link/lwg3076) which describes
a very similar problem for strings (note, however, that unlike LWG3076,
no valid constructor call is affected by the bad instantiation).
Differential Revision: https://reviews.llvm.org/D114311
Instead of using `DenseMap::find()` and `DenseMap::insert()`, use
`DenseMap::operator[]` to get a reference to the profile data and update
the reference. This simplifies the changes in D114565.
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D114828