Use MachineOperand::ChangeToImmediate rather than reassigning
MachineOperands to new values created from MachineOperand::CreateImm,
so that their parent pointers are preserved.
This fixes "Instruction has operand with wrong parent set" errors
reported by the MachineVerifier.
llvm-svn: 341389
Summary:
Control height reduction merges conditional blocks of code and reduces the
number of conditional branches in the hot path based on profiles.
if (hot_cond1) { // Likely true.
do_stg_hot1();
}
if (hot_cond2) { // Likely true.
do_stg_hot2();
}
->
if (hot_cond1 && hot_cond2) { // Hot path.
do_stg_hot1();
do_stg_hot2();
} else { // Cold path.
if (hot_cond1) {
do_stg_hot1();
}
if (hot_cond2) {
do_stg_hot2();
}
}
This speeds up some internal benchmarks up to ~30%.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D50591
llvm-svn: 341386
The -diff option makes it easy to diff dwarf by hiding addresses and
offsets. However not all of them were hidden, which should be fixed by
this patch.
Differential revision: https://reviews.llvm.org/D51593
llvm-svn: 341377
One of the tests is failing 50% of the time when expensive checks are
enabled. Not sure how deep the problem is so just reverting while the
author can investigate so that the bots stop repeatedly failing and
blaming things incorrectly. Will respond with details on the original
commit.
llvm-svn: 341365
Check for definedness of the __cpp_sized_deallocation and
__cpp_aligned_new feature test macros. These will not be defined
when the feature is not available, and that prevents any code that
includes this header from compiling with -Wundef -Werror.
Differential Revision: https://reviews.llvm.org/D51171
llvm-svn: 341364
Load Hardening.
Wires up the existing pass to work with a proper IR attribute rather
than just a hidden/internal flag. The internal flag continues to work
for now, but I'll likely remove it soon.
Most of the churn here is adding the IR attribute. I talked about this
Kristof Beyls and he seemed at least initially OK with this direction.
The idea of using a full attribute here is that we *do* expect at least
some forms of this for other architectures. There isn't anything
*inherently* x86-specific about this technique, just that we only have
an implementation for x86 at the moment.
While we could potentially expose this as a Clang-level attribute as
well, that seems like a good question to defer for the moment as it
isn't 100% clear whether that or some other programmer interface (or
both?) would be best. We'll defer the programmer interface side of this
for now, but at least get to the point where the feature can be enabled
without relying on implementation details.
This also allows us to do something that was really hard before: we can
enable *just* the indirect call retpolines when using SLH. For x86, we
don't have any other way to mitigate indirect calls. Other architectures
may take a different approach of course, and none of this is surfaced to
user-level flags.
Differential Revision: https://reviews.llvm.org/D51157
llvm-svn: 341363
GCC triggers false positives if a nothrow function is called through a template argument. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80985 for details. The LLVM libraries have no stable C++ API, so the warning is not useful.
llvm-svn: 341361
Also reverts follow-up commits r341343 and r341344.
The primary commit continues to break some build bots even after the
fixes in r341343 for UBSan issues:
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-full/builds/5823
It is also failing for me locally (linux, x86-64).
llvm-svn: 341360
implementing the proposed mitigation technique described in the original
design document.
The idea is to check after calls that the return address used to arrive
at that location is in fact the correct address. In the event of
a mis-predicted return which reaches a *valid* return but not the
*correct* return, this will detect the mismatch much like it would
a mispredicted conditional branch.
This is the last published attack vector that I am aware of in the
Spectre v1 space which is not mitigated by SLH+retpolines. However,
don't read *too* much into that: this is an area of ongoing research
where we expect more issues to be discovered in the future, and it also
makes no attempt to mitigate Spectre v4. Still, this is an important
completeness bar for SLH.
The change here is of course delightfully simple. It was predicated on
cutting support for post-instruction symbols into LLVM which was not at
all simple. Many thanks to Hal Finkel, Reid Kleckner, and Justin Bogner
who helped me figure out how to do a bunch of the complex changes
involved there.
Differential Revision: https://reviews.llvm.org/D50837
llvm-svn: 341358
retpolines.
This implements the core design of tracing the intended target into the
target, checking it, and using that to update the predicate state. It
takes advantage of a few interesting aspects of SLH to make it a bit
easier to implement:
- We already split critical edges with conditional branches, so we can
assume those are gone.
- We already unfolded any memory access in the indirect branch
instruction itself.
I've left hard errors in place to catch if any of these somewhat subtle
invariants get violated.
There is some code that I can factor out and share with D50837 when it
lands, but I didn't want to couple landing the two patches, so I'll do
that in a follow-up cleanup commit if alright.
Factoring out the code to handle different scenarios of materializing an
address remains frustratingly hard. In a bunch of cases you want to fold
one of the cases into an immediate operand of some other instruction,
and you also have both symbols and basic blocks being used which require
different methods on the MI builder (and different operand kinds).
Still, I'll take a stab at sharing at least some of this code in
a follow-up if I can figure out how.
Differential Revision: https://reviews.llvm.org/D51083
llvm-svn: 341356
Summary:
Refactoring done by rL340872 accidentally appeared to be non-NFC, changing the way how
multiple instances of the same pass are handled - aggregation of results by PassName
forced data for multiple instances to be merged together and reported as one line.
Getting back to creating/reporting timers per pass instance.
Reporting was a bit enhanced by counting pass instances and adding #<num> suffix
to the pass description. Note that it is instances that are being counted,
not invocations of them.
time-passes test updated to account for multiple passes being run.
Reviewers: paquette, jhenderson, MatzeB, skatkov
Reviewed By: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51535
llvm-svn: 341346
This patch removes the function `expandSCEVIfNeeded` which behaves not as
it was intended. This function tries to make a lookup for exact existing expansion
and only goes to normal expansion via `expandCodeFor` if this lookup hasn't found
anything. As a result of this, if some instruction above the loop has a `SCEVConstant`
SCEV, this logic will return this instruction when asked for this `SCEVConstant` rather
than return a constant value. This is both non-profitable and in some cases leads to
breach of LCSSA form (as in PR38674).
Whether or not it is possible to break LCSSA with this algorithm and with some
non-constant SCEVs is still in question, this is still being investigated. I wasn't
able to construct such a test so far, so maybe this situation is impossible. If it is,
it will go as a separate fix.
Rather than do it, it is always correct to just invoke `expandCodeFor` unconditionally:
it behaves smarter about insertion points, and as side effect of this it will choose a
constant value for SCEVConstants. For other SCEVs it may end up finding a better insertion
point. So it should not be worse in any case.
NOTE: So far the only known case for which this transform may break LCSSA is mapping
of SCEVConstant to an instruction. However there is a suspicion that the entire algorithm
can compromise LCSSA form for other cases as well (yet not proved).
Differential Revision: https://reviews.llvm.org/D51286
Reviewed By: etherzhhb
llvm-svn: 341345
Usage:
llvm-objcopy --compress-debug-sections=zlib foo.o
llvm-objcopy --compress-debug-sections=zlib-gnu foo.o
In both cases the debug section contents is compressed with zlib. In the GNU
style case the header is the "ZLIB" magic string followed by the uint64 big-
endian decompressed size. In the non-GNU mode the header is the
Elf(32|64)_Chdr.
Decompression support is coming soon.
Differential Revision: https://reviews.llvm.org/D49678
llvm-svn: 341342
This patch modifies hasStandardEncoding() / inMicroMipsMode() /
inMips16Mode() methods of the MipsSubtarget class so only one can be
true at any one time. That prevents the selection of microMIPS and MIPS
instructions and patterns that are defined in TableGen files at the same
time. A few new patterns and instruction definitions hae been added to
keep test cases passed.
Differential revision: https://reviews.llvm.org/D51483
llvm-svn: 341338
Summary:
Original changeset (https://reviews.llvm.org/D46776) by @modocache. It was
reverted after the PS4 bot failed.
The issue has been determined to be with the way the PS4 SDK handles this
particular option. https://reviews.llvm.org/D50410 removes this test, so we
can push this again.
Patch by Arnaud Coomans!
Reviewers: cfe-commits, modocache
Reviewed By: modocache
Differential Revision: https://reviews.llvm.org/D50515
llvm-svn: 341329
A ReadAdvance was incorrectly added to the SchedReadWrite list associated with
the following SSE instructions:
sqrtss
sqrtsd
rsqrtss
rcpss
As a consequence, a wrong operand latency was computed for the register operand
used as the base address of the folded load operand.
This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases.
There is still a problem with correctly modeling partial register writes on XMM
registers This other problem is currently tracked here:
https://bugs.llvm.org/show_bug.cgi?id=38813
Differential Revision: https://reviews.llvm.org/D51542
llvm-svn: 341326
Also adjust some of dsymutil's headers to put the header guards at the top,
otherwise the compiler will not recognize them as header guards.
llvm-svn: 341323
According to the standard, for the .debug_names (the "dwarf accelerator
tables"):
> If a subprogram or inlined subroutine is included, and has a
> DW_AT_linkage_name attribute, there will be an additional index entry
> for the linkage name.
For Swift we generate DW_structure_types with a linkage name and the
verifier was incorrectly rejecting this. This patch fixes that by only
considering the linkage name in those particular cases. The test is the
"reduced" debug info of the failing swift test on swift.org.
Differential revision: https://reviews.llvm.org/D51420
llvm-svn: 341311
When initial support for dllimport was added for aarch64 in
SVN r316555, ClassifyGlobalReference didn't set the MO_DLLIMPORT
flag - that was only completed in SVN r323810. Reuse the return
value from ClassifyGlobalReference for this purpose as well.
llvm-svn: 341310
A condition in isSpillInstruction() updates a small vector rather
than the 'FI' by-ref parameter, which was used in a subsequent
call to 'isSpillSlotObjectIndex()'. This patch fixes the condition
to check the FIs in the vector instead.
llvm-svn: 341305
For instructions that spill/fill to and from multiple frame-indices
in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot
should return an array of accesses, rather than just the first encounter
of such an access.
This better describes FI accesses for AArch64 (paired) LDP/STP
instructions.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D51537
llvm-svn: 341301
Remove braces around two, single statement "if" blocks in line with rest
of the file and the general LLVM code style. NFC, testing commit access.
llvm-svn: 341294
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch by jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D51506
llvm-svn: 341293
The fold was implemented for the general case but use-limitation,
but the later constant version which didn't check uses was only
matching splat constants.
llvm-svn: 341292
In lib/CodeGen/LiveDebugVariables.cpp, it uses std::prev(MBBI) to
get DebugValue's SlotIndex. However, the previous instruction may be
also a debug instruction. It could not use a debug instruction to query
SlotIndex in mi2iMap.
Scan all debug instructions and use the first debug instruction to query
SlotIndex for following debug instructions. Only handle DBG_VALUE in
handleDebugValue().
Differential Revision: https://reviews.llvm.org/D50621
llvm-svn: 341289
If we have a pair of binops feeding another pair of binops, rearrange the operands so
the matching pair are together because that allows easy factorization folds to happen
in instcombine:
((X << S) & Y) & (Z << S) --> ((X << S) & (Z << S)) & Y (reassociation)
--> ((X & Z) << S) & Y (factorize shift from 'and' ops optimization)
This is part of solving PR37098:
https://bugs.llvm.org/show_bug.cgi?id=37098
Note that there's an instcombine version of this patch attached there, but we're trying
to make instcombine have less responsibility to improve compile-time efficiency.
For reasons I still don't completely understand, reassociate does this kind of transform
sometimes, but misses everything in my motivating cases.
This patch on its own is gluing an independent cleanup chunk to the end of the existing
RewriteExprTree() loop. We can build on it and do something stronger to better order the
full expression tree like D40049. That might be an alternative to the proposal to add a
separate reassociation pass like D41574.
Differential Revision: https://reviews.llvm.org/D45842
llvm-svn: 341288
Summary:
A follow-up for D49266 / rL337166 + D49497 / rL338044.
This is still the same pattern to check for the [lack of]
signed truncation, but in this case the constants and the predicate
are negated.
https://rise4fun.com/Alive/BDVhttps://rise4fun.com/Alive/n7Z
Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51532
llvm-svn: 341287
Removes the implicit conversion to the underlying type for
JITSymbolFlags::FlagNames and replaces it with some bitwise and comparison
operators.
llvm-svn: 341282
This is no-outwardly-visible-change intended, so no test.
But the code is smaller and more efficient. The check for
a 'not' op is intended to avoid the expensive value tracking
call when it should not be necessary, and it might prevent
infinite looping when we resurrect:
rL300977
llvm-svn: 341280
The 'rol Rd' instruction is equivalent to 'adc Rd'.
This caused compile warnings from tablegen because of conflicting bits
shared between each instruction.
llvm-svn: 341275
Summary:
Reid suggested making HasWinCFI a plain bool defaulting to false in D50288.
It's needed in order to add HasWinCFI to MIRPrinter. Otherwise, we'll get the
assertion:
HasWinCFI.hasValue() && "HasWinCFI not set yet!"'
Also, a few ARM64 Windows test cases will fail with the same assert if the ARM64
MCLayer part of EH work (D50166) goes in before the frame lowering part that
sets HasWinCFI (D50288 as of now).
Reviewers: rnk, mstorsjo, hans, javed.absar
Reviewed By: rnk
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51560
llvm-svn: 341270
Leverage existing logic in constant hoisting pass to transform constant GEP
expressions sharing the same base global variable. Multi-dimensional GEPs are
rewritten into single-dimensional GEPs.
Differential Revision: https://reviews.llvm.org/D51396
llvm-svn: 341269
ModuleCount = InstrCount was incorrect. It should have been
InstrCount = ModuleCount. This was making it emit an extra, incorrect remark
for Print Module IR.
The test didn't catch this, because it didn't ensure that the only remark
output was from the desired pass. So, it was possible to have an extra remark
come through and not fail. Updated the test so that we ensure that the last
remark that's output comes from the desired pass. This is done by ensuring
that whatever is being read after the last remark is YAML output rather than
some incorrect garbage.
llvm-svn: 341267
- Remove duplication: Both TestingGuide and TestSuiteMakefileGuide
would give a similar overview over the test-suite.
- Present cmake/lit as the default/normal way of running the test-suite:
- Move information about the cmake/lit testsuite into the new
TestSuiteGuide.rst file. Mark the remaining information in
TestSuiteMakefilesGuide.rst as deprecated.
- General simplification and shorting of language.
- Remove paragraphs about tests known to fail as everything should pass
nowadays.
- Remove paragraph about zlib requirement; it's not required anymore
since we copied a zlib source snapshot into the test-suite.
- Remove paragraph about comparison with "native compiler". Correctness is
always checked against reference outputs nowadays.
- Change cmake/lit quickstart section to recommend `pip` for installing
lit and use `CMAKE_C_COMPILER` and a cache file in the example as that
is what most people will end up doing anyway. Also a section about
compare.py to quickstart.
- Document `Bitcode` and `MicroBenchmarks` directories.
- Add section with commonly used cmake configuration options.
- Add section about showing and comparing result files via compare.py.
- Add section about using external benchmark suites.
- Add section about using custom benchmark suites.
- Add section about profile guided optimization.
- Add section about cross-compilation and running on external devices.
Differential Revision: https://reviews.llvm.org/D51465
llvm-svn: 341260
These intrinsics use the same implementation as PTEST intrinsics, but use vXi1 vectors.
New clang builtins will be accompanying them shortly.
llvm-svn: 341259
In basic block, loop, and function passes, we already have a function that
we can use to emit optimization remarks. We can use that instead of searching
the module for the first suitable function (that is, one that contains at
least one basic block.)
llvm-svn: 341253
Instead of counting the size of the entire module every time we run a pass,
pass along a delta instead and use that to emit the remark.
This means we only have to use (on average) smaller IR units to calculate
instruction counts. E.g, in a BB pass, we only need to look at the delta of
the BB instead of the delta of the entire module.
6/6
(This improved compile time for size remarks on sqlite3 + O2 significantly)
llvm-svn: 341250
Same vein as the previous commits. Pre-calculate the size of
the module and use that to decide if we're going to emit a
remark.
This one comes with a FIXME and TODO. First off, CallGraphSCC
and CallGraphNode don't have a getInstructionCount function. So,
for now, we do the same thing as in a module pass.
Second off, we're not really saving anything here yet, because
as before, I need to change emitInstrCountChangedRemark to take
in a delta. Keeping the patches small though, so that's coming up
next.
5/6
llvm-svn: 341249
Same as the previous NFC commits in the same vein.
This one introduces a TODO. I'm going to change emitInstrCountChangedRemark
so that it takes in a delta. Since the delta isn't necessary yet, it's not
there. For now, this means that we're calculating the size of the module
twice.
Just done separately to keep the patches small.
4/6
llvm-svn: 341248
Another commit reducing compile time in size remarks.
Cache the size of the module and loop, and update values based
off of deltas instead. Avoid recalculating the size of the
whole module whenever possible.
3/6
llvm-svn: 341247
Size remarks are slow due to lots of recalculation of the module.
This is similar to the previous commit. Cache the size of the module and
update counts in basic block passes based off a less-expensive delta.
2/6
llvm-svn: 341246
Size remarks are slow due to lots of recalculation of the module.
Pre-calculate the module size and initial function size for a remark. Use
deltas calculated using the less-expensive function IR count to update the
module counts for Function passes.
1/6
llvm-svn: 341245
Summary:
The python executable may not exist on all systems so use sys.executable
instead.
Reviewers: ddunbar, stella.stamenova
Subscribers: delcypher, llvm-commits
Differential Revision: https://reviews.llvm.org/D51511
llvm-svn: 341244
Since we changed the storage for the PID in PIDRecord instances, we need
to also update the way we load the data from a DataExtractor through the
RecordInitializer.
llvm-svn: 341243
Previously we've been reading and writing the wrong types which only
worked in little endian implementations. This time we're writing the
same typed values the runtime is using, and reading them appropriately
as well.
llvm-svn: 341241
I changed the seed slightly, but forgot to run the tests on a 32-bit system, so
tests which hard-code a specific hash value started breaking.
llvm-svn: 341240
This change allows us to let the compiler do the right thing for when
handling big-endian and little-endian records for FDR mode function
records.
Previously, we assumed that the encoding was little-endian that reading
the first byte to look for the function id and function record types was
ordered in a little-endian manner. This change allows us to better
handle function records where the first four bytes may actually be
encoded in big-endian thus giving us the wrong bytes where we're seeking
the function information from.
This is a follow-up to D51210 and D51289.
llvm-svn: 341236
This change makes the writer implementation more consistent with the way
fields are written down to avoid assumptions on bitfield order and
padding. We also fix an inconsistency between the type returned by the
`delta()` accessor to match the data member it's returning.
This is a follow-up to D51289 and D51210.
llvm-svn: 341230
Following D50807, and heading towards D50664, this intermediary change does the following:
1. Upgrade all custom Error types in llvm/trunk/lib/DebugInfo/ to use the new StringError behavior (D50807).
2. Implement std::is_error_code_enum and make_error_code() for DebugInfo error enumerations.
3. Rename GenericError -> PDBError (the file will be renamed in a subsequent commit)
4. Update custom error messages to follow the same formatting: (\w\s*)+\.
5. Keep generic "file not found" (ENOENT) errors as they are in PDB code. Previously, there used to be a custom enumeration for that purpose.
6. Remove a few extraneous LF in log() implementations. Printing LF is a responsability at a higher level, not at the error level.
Differential Revision: https://reviews.llvm.org/D51499
llvm-svn: 341228
This patch recognizes shuffles that shift elements and fill with zeros. I've copied and modified the shift matching code we use for normal vector registers to do this. I'm not sure if there's a good way to share more of this code without making the existing function more complex than it already is.
This will be used to enable kshift intrinsics in clang.
Differential Revision: https://reviews.llvm.org/D51401
llvm-svn: 341227
This change makes the XRay Trace loading functions first use a
little-endian data extractor, then on failures try a big-endian data
extractor. Without this change, the trace loading facility will not work
with data written from a big-endian machine.
Follow-up to D51210 and D51289.
llvm-svn: 341226
Before this patch, the FDRTraceWriter would not take endianness into
account when writing data into the output stream.
This is a follow-up to D51289 and D51210.
llvm-svn: 341223
The presence of a ReadAdvance for input operand #0 is problematic
because it changes the input latency of the register used as the base address
for the folded load.
A broadcast cannot start executing if the load address hasn't been computed yet.
In the llvm-mca example, the VBROADCASTSS is dependent on the address generated
by the LEAQ. That means, it cannot start until LEAQ reaches the write-back
stage. If we apply ReadAdvance, then we wrongly assume that the load can start 3
cycles in advance.
Differential Revision: https://reviews.llvm.org/D51534
llvm-svn: 341222
The `mtc1` and `mfc1` definitions in the MipsInstrFPU.td have MMRel,
but do not have StdMMR6Rel tags. When these instructions are emitted
for microMIPS R6 targets, `Mips::MipsR62MicroMipsR6` nor
`Mips::Std2MicroMipsR6` cannot find correct op-codes and as a result the
backend uses mips32 variant of the instructions encoding.
The patch fixes this problem by adding the StdMMR6Rel tag and check
instructions encoding in the test case.
Differential revision: https://reviews.llvm.org/D51482
llvm-svn: 341221
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.
Handle any type of load since in principle we should do the
same combine for the various load intrinsics.
llvm-svn: 341219
According to the timeline view, sqrtss/sd/rcpss start executing before the load
address for the memory operand is available.
This problem is caused by the presence of a ReadAfterLd (a ReadAdvance). Those
unary operations should not specify a ReadAdvance at all.
llvm-svn: 341213
When using -g and -dsym, llvm-objdump opens the dsym file and keeps the
MachOObjectFile alive, while the memory buffer that the MachOObjectFile
was based on gets destroyed.
Differential Revision: https://reviews.llvm.org/D51365
llvm-svn: 341209
This simplifies the implementation of the metadata lookup by using
scoped enums, rather than using enum classes. This way we can get the
number-name mapping without having to resort to comments.
Follow-up to D51289.
llvm-svn: 341205
/build/llvm/unittests/XRay/FDRProducerConsumerTest.cpp:90:27: error: declaration of ‘std::unique_ptr<llvm::xray::Record> llvm::xray::{anonymous}::RoundTripTest<T>::Record’ [-fpermissive]
std::unique_ptr<Record> Record;
^~~~~~
In file included from /build/llvm/include/llvm/XRay/FDRLogBuilder.h:12,
from /build/llvm/unittests/XRay/FDRProducerConsumerTest.cpp:15:
/build/llvm/include/llvm/XRay/FDRRecords.h:28:7: error: changes meaning of ‘Record’ from ‘class llvm::xray::Record’ [-fpermissive]
class Record {
^~~~~~
llvm-svn: 341189
This patch fixes the number of micro opcodes, and processor resource cycles for
the following AVX instructions:
vinsertf128rr/rm
vperm2f128rr/rm
vbroadcastf128
Tests have been regenerated using the usual scripts in the llvm/utils directory.
Differential Revision: https://reviews.llvm.org/D51492
llvm-svn: 341185
Summary:
This patch defines two new base types called `RecordProducer` and
`RecordConsumer` which have default implementations for convenience
(particularly for testing).
A `RecordProducer` implementation has one member function called
`produce()` which serves as a factory constructor for `Record`
instances. This code exercises the `RecordInitializer` code path in the
implementation for `FileBasedRecordProducer`.
A `RecordConsumer` has a single member function called `consume(...)`
which, as the name implies, consumes instances of
`std::unique_ptr<Record>`. We have two implementations, one of which is
used in the test to generate a vector of `std::unique_ptr<Record>`
similar to how the `LogBuilder` implementation works.
We introduce a test in `FDRProducerConsumerTest` which ensures that
records we write through the `FDRTraceWriter` can be loaded by the
`FileBasedRecordProducer`. The record(s) loaded this way are written
again through the `FDRTraceWriter` into a separate string, which we then
compare. This ensures that the read-in bytes to create the `Record`
instances in memory can be replicated when written out through the
`FDRTraceWriter`.
This change depends on D51210 and is part of the refactoring of D50441
into smaller, more focused changes.
Reviewers: eizan, kpw
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51289
llvm-svn: 341180
These stubs should never be emitted for internal symbols, and
nothing in AsmPrinter ever actually use this value when producing
the stubs for COFF anyway.
llvm-svn: 341177
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.
Differential Revision: https://reviews.llvm.org/D51450
llvm-svn: 341176
This will cover the (v2i32 (setcc v2f32)) case in replaceNodeResults. That code shouldn't be needed at all in this mode. A future patch will skip it.
llvm-svn: 341171
management and materialization responsibility registration.
The setOverrideObjectFlagsWithResponsibilityFlags method instructs
RTDyldObjectlinkingLayer2 to override the symbol flags produced by RuntimeDyld with
the flags provided by the MaterializationResponsibility instance. This can be used
to enable symbol visibility (hidden/exported) for COFF object files, which do not
currently support the SF_Exported flag.
The setAutoClaimResponsibilityForObjectSymbols method instructs
RTDyldObjectLinkingLayer2 to claim responsibility for any symbols provided by a
given object file that were not already in the MaterializationResponsibility
instance. Setting this flag allows higher-level program representations (e.g.
LLVM IR) to be added based on only a subset of the symbols they provide, without
having to write intervening layers to scan and add the additional symbols. This
trades diagnostic quality for convenience however: If all symbols are enumerated
up-front then clashes can be detected and reported early. If this option is set,
clashes for the additional symbols may not be detected until late, and detection
may depend on the flow of control through JIT'd code.
llvm-svn: 341154
It has essentially the same benefit it has on 64-bit ARM: it
substantially reduces the number of constants used by large GEP
operations. Seems to be generally helpful across a few different
codebases I've tried.
Differential Revision: https://reviews.llvm.org/D51462
llvm-svn: 341136
It's always replaced with the same (short) static string, so just put that
there directly.
No intended behavior change.
https://reviews.llvm.org/D51357
llvm-svn: 341135
forward declarations.
Especially with template instantiations, there are legitimate reasons
why for declarations might be emitted into a DW_TAG_module skeleton /
forward-declaration sub-tree, that are not forward declarations in the
sense of that there is a more complete definition over in a .pcm file.
The example in the testcase is a constant DW_TAG_member of a
DW_TAG_class template instatiation.
rdar://problem/43623196
llvm-svn: 341123
$$Z appears between adjacent expanded parameter packs in the
same template instantiation. We don't need to print it, it's
only there to disambiguate between manglings that would otherwise
be ambiguous. So we just need to parse it and throw it away.
llvm-svn: 341119
Summary:
Currently, the SafeStack analysis disallows out-of-bounds writes but not
out-of-bounds reads for mem intrinsics like llvm.memcpy. This could
cause leaks of pointers to the safe stack by leaking spilled registers/
frame pointers. Check for allocas used as source or destination pointers
to mem intrinsics.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: pcc, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D51334
llvm-svn: 341116
get_execution_seed returns a size_t which varies across platforms, but its
users actually always feed it into a uint64_t role so it makes sense to be
consistent.
Mostly this is just a tidy-up, but it also apparently allows PCH files to be
shared between Clang compilers built for 32-bit and 64-bit hosts.
llvm-svn: 341113
Summary:
RISCVAsmParser needs to handle the case the error message is of specific type, other than the generic Match_InvalidOperand, and the corresponding
operand is missing.
This bug was uncovered by a LLVM MC Assembler Protocol Buffer Fuzzer for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: llvm-commits, jocewei, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX
Differential Revision: https://reviews.llvm.org/D50790
llvm-svn: 341104
This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded.
Fixes PR38771
llvm-svn: 341102
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D49273
llvm-svn: 341095
Splitting an alloca can decrease the alignment of GEPs into the
partition. Normally, rewriting accounts for this, but the code was
missing for uses of PHI nodes and select instructions.
Fixes https://bugs.llvm.org/show_bug.cgi?id=38707 .
Differential Revision: https://reviews.llvm.org/D51335
llvm-svn: 341094
With r295105, some 'noreturn' blocks (those that don't return and have no
successors) may be merged.
If such blocks' predecessors have different outgoing offset or register, don't
report an error in CFIInstrInserter verify().
Thanks to Vlad Tsyrklevich for reporting the issue.
Differential Revision: https://reviews.llvm.org/D51161
llvm-svn: 341087
Summary: Add a new type for named metadata nodes. Use this to implement iterators and accessors for NamedMDNodes and extend the echo test to use them to copy module-level debug information.
Reviewers: whitequark, deadalnix, aprantl, dexonsmith
Reviewed By: whitequark
Subscribers: Wallbraker, JDevlieghere, llvm-commits, harlanhaskins
Differential Revision: https://reviews.llvm.org/D47179
llvm-svn: 341085
Summary:
Port libFuzzer to windows-msvc.
This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well.
It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch.
It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile, fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until them.
Patch By: metzman
Reviewers: morehouse, rnk
Reviewed By: morehouse, rnk
Subscribers: #sanitizers, delcypher, morehouse, kcc, eraman
Differential Revision: https://reviews.llvm.org/D51022
llvm-svn: 341082
Summary:
Now uses the StackBased bit from the tablegen defs to identify
stack instructions (and ignore register based or non-wasm instructions).
Also changed how we store operands, since we now have up to 16 of them
per instruction. To not cause static data bloat, these are compressed
into a tiny table.
+ a few other cleanups.
Tested:
- MCTest
- llvm-lit -v `find test -name WebAssembly`
Reviewers: dschuff, jgravelle-google, sunfish, tlively
Subscribers: sbc100, aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D51320
llvm-svn: 341081
This was one of the potential follow-ups suggested in D48236,
and these will be used to make matching the patterns in PR38691 cleaner:
https://bugs.llvm.org/show_bug.cgi?id=38691
About the vocabulary: in the DAG, these would be concat_vector with an
undef operand or extract_subvector. Alternate names are discussed in the
review, but I think these are familiar/good enough to proceed. Once we
have uses of them in code, we might adjust if there are better options.
https://reviews.llvm.org/D51392
llvm-svn: 341075
..Move all target-dependent checks into new isCopyInstrImpl method.
This change allows us to treat MoveReg-type instructions and generic
COPY instruction in the same way
Differential Revision: https://reviews.llvm.org/D49913
llvm-svn: 341072
Summary:
This is patch 1 of the new DivergenceAnalysis (https://reviews.llvm.org/D50433).
The purpose of this patch is to free up the name DivergenceAnalysis for the new generic
implementation. The generic implementation class will be shared by specialized
divergence analysis classes.
Patch by: Simon Moll
Reviewed By: nhaehnle
Subscribers: jvesely, jholewinski, arsenm, nhaehnle, mgorny, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50434
Change-Id: Ie8146b11be2c50d5312f30e11c7a3036a15b48cb
llvm-svn: 341071
Summary:
In the case of (and reg, constant) or (or reg, constant), it can be
beneficial to use a ANDNrr/ORNrr instruction instead of ANDrr/ORrr,
if the complement of the constant can be encoded using a single SETHI
instruction instead of a SETHI/ORri pair.
If the constant has more than one use, it is probably better to keep it
in its original form.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D50964
llvm-svn: 341069
FileError is meant to encapsulate both an Error and a file name/path. It should be used in cases where an Error occurs deep down the call chain, and we want to return it to the caller along with the file name.
StringError was updated to display the error messages in different ways. These can be:
1. display the error_code message, and convert to the same error_code (ECError behavior)
2. display an arbitrary string, and convert to a provided error_code (current StringError behavior)
3. display both an error_code message and a string, in this order; and convert to the same error_code
These behaviors can be triggered depending on the constructor. The goal is to use StringError as a base class, when a library needs to provide a explicit Error type.
Differential Revision: https://reviews.llvm.org/D50807
llvm-svn: 341064
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
extern int bar(int[]);
int foo(int i) {
int a[i]; // VLA
asm volatile(
"mov r7, #1"
:
:
: "r7"
);
return 1 + bar(a);
}
Compiled for thumb, this gives:
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r5, r6, r7, lr}
push {r4, r5, r6, r7, lr}
.setfp r7, sp, #12
add r7, sp, #12
.pad #4
sub sp, #4
movs r1, #7
add.w r0, r1, r0, lsl #2
bic r0, r0, #7
sub.w r0, sp, r0
mov sp, r0
@APP
mov.w r7, #1
@NO_APP
bl bar
adds r0, #1
sub.w r4, r7, #12
mov sp, r4
pop {r4, r5, r6, r7, pc}
...
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
"mov r7, #1"
^
Reviewers: efriedma, olista01, javed.absar
Reviewed By: efriedma
Subscribers: eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51165
llvm-svn: 341062
Providing that the load is known to be 4 byte aligned, we can optimise a
ldr(adr address) to just ldr address.
Differential Revision: https://reviews.llvm.org/D51030
llvm-svn: 341058
This patch introduces the following changes to the DispatchStatistics view:
* DispatchStatistics now reports the number of dispatched opcodes instead of
the number of dispatched instructions.
* The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of
stall cycles against the total simulated cycles.
This change allows users to easily compare dispatch group sizes with the
processor DispatchWidth.
Before this change, it was difficult to correlate the two numbers, since
DispatchStatistics view reported numbers of instructions (instead of opcodes).
DispatchWidth defines the maximum size of a dispatch group in terms of number of
micro opcodes.
The other change introduced by this patch is related to how DispatchStage
generates "instruction dispatch" events.
In particular:
* There can be multiple dispatch events associated with a same instruction
* Each dispatch event now encapsulates the number of dispatched micro opcodes.
The number of micro opcodes declared by an instruction may exceed the processor
DispatchWidth. Therefore, we cannot assume that instructions are always fully
dispatched in a single cycle.
DispatchStage knows already how to handle instructions declaring a number of
opcodes bigger that DispatchWidth. However, DispatchStage always emitted a
single instruction dispatch event (during the first simulated dispatch cycle)
for instructions dispatched.
With this patch, DispatchStage now correctly notifies multiple dispatch events
for instructions that cannot be dispatched in a single cycle.
A few views had to be modified. Views can no longer assume that there can only
be one dispatch event per instruction.
Tests (and docs) have been updated.
Differential Revision: https://reviews.llvm.org/D51430
llvm-svn: 341055
Previously CHECK prefixes weren't defined that can be used to check _only_ the
InstPrinter output when generating .s from llvm-mc, or that check _only_ the
output after passing the generated object through objdump. This means we can't
write useful checks for instructions that reference symbols.
Instead, use:
CHECK-S Match the .s output with aliases enabled
CHECK-S-NOALIAS Match the .s output with aliases disabled
CHECK-OBJ Match the objdumped object output with aliases enabled
CHECK-OBJ-NOALIAS Match the objdumped object output with aliases enabled
CHECK-S-OBJ Match both the .s and objdumped object output with
aliases enabled
CHECK-S-OBJ-NOALIAS Match both the .s and objdumped object output with
aliases disabled
While we're at it, use whitespace consistently within this file.
llvm-svn: 341050
Bots are unhappy:
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/test/CodeGen/Hexagon/swp-const-tc2.ll:10:14: error: CHECK-NOT: excluded string found in input
; CHECK-NOT: = mpy
^
<stdin>:22:6: note: found here
r5 += mpyi(r2,r3)
^~~~~
This reverts commit r341046.
llvm-svn: 341049
Hacker's Delight 10-17: when C is constant,
the result of X % C == 0 can be computed more cheaply
without actually calculating the remainder.
The motivation is discussed here:
https://bugs.llvm.org/show_bug.cgi?id=35479.
Patch by: hermord (Dmytro Shynkevych)!
For https://reviews.llvm.org/D50222
llvm-svn: 341047
Summary:
As suggested in D50222, this has been refactored into a separate patch.
The undef and the infinite loop at the end cause this test to be translated
unpredictably. In particular, the checked-for `mpy` disappears under
certain legal optimizations (e.g. the one in D50222).
Since the use of these constructs is not relevant to the behavior tested,
according to the header comment, this change, suggested by @kparzysz,
eliminates them.
Patch by: hermord (Dmytro Shynkevych)!
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits, kparzysz
Differential Revision: https://reviews.llvm.org/D50944
llvm-svn: 341046
That resulted in the check-llvm-* targets not being avaliable
in the QtCreator-configured build directories.
Moreover, that was a clearly non-NFC change, and i can't find any review
for it.
This reverts commit rL340435.
llvm-svn: 341045
This reverts commit r340997.
This change turned out not to be NFC after all, but e.g. causes
clang to crash when building the linux kernel for aarch64.
llvm-svn: 341031
Summary:
This is the first step in the larger refactoring and reduction of
D50441.
This step in the process does the following:
- Introduces more granular types of `Record`s representing the many
kinds of records written/read by the Flight Data Recorder (FDR) mode
`Trace` loading function(s).
- Introduces an abstract `RecordVisitor` type meant to handle the
processing of the various `Record` derived types. This `RecordVisitor`
has two implementations in this patch: `RecordInitializer` and
`FDRTraceWriter`.
- We also introduce a convenience interface for building a collection of
`Record` instances called a `LogBuilder`. This allows us to generate
sequences of `Record` instances manually (used in unit tests but
useful otherwise).
- The`FDRTraceWriter` class implements the `RecordVisitor` interface and
handles the writing of metadata records to a `raw_ostream`. We
demonstrate that in the unit test, we can generate in-memory FDR mode
traces using the specific `Record` derived types, which we load
through the `loadTrace(...)` function yielding valid `Trace` objects.
This patch introduces the required types and concepts for us to start
replacing the logic implemented in the `loadFDRLog` function to use the
more granular types. In subsequent patches, we will introduce more
visitor implementations which isolate the verification, printing,
indexing, production/consumption, and finally the conversion of the FDR
mode logs.
The overarching goal of these changes is to make handling FDR mode logs
better tested, more understandable, more extensible, and more
systematic. This will also allow us to better represent the execution
trace, as we improve the fidelity of the events we represent in an XRay
`Trace` object, which we intend to do after FDR mode log processing is
in better shape.
Reviewers: eizan
Reviewed By: eizan
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51210
llvm-svn: 341029
In computeRegisterLiveness, the max instructions to search
was counting dbg_value instructions, which could potentially
cause an observable codegen change from the presence of debug
info.
llvm-svn: 341028
If there is an unused def, this would previously
report that the register was live. Check for uses
first so that it is reported as dead if never used.
llvm-svn: 341027
If the end of the block is reached during the scan, check
the live ins of the successors. This was already done in the
other direction if the block entry was reached.
llvm-svn: 341026
Check that Machine CSE correctly handles during the transformation, the
debug location information for local variables.
Differential Revision: https://reviews.llvm.org/D50887
llvm-svn: 341025
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.
I've changed the assert to a report_fatal_error so it's not lost in Release builds.
The test updates are to fix things that tripped the new error.
Differential Revision: https://reviews.llvm.org/D51231
llvm-svn: 341022
If an ABI-like value is used in a different block,
the type split used is not necessarily the same as
the call's ABI. The value is used through an intermediate
copy virtual registers from the other block. This
resulted in copies with inconsistent sizes later.
Fixes regressions since r338197 when AMDGPU started
splitting vector types for calls.
llvm-svn: 341018
These classes don't make any changes to IR and have no reason to be in
Transform/Utils. This patch moves them to Analysis folder. This will allow
us reusing these classes in some analyzes, like MustExecute.
llvm-svn: 341015
rL340921 has been reverted by rL340923 due to linkage dependency
from Transform/Utils to Analysis which is not allowed. In this patch
this has been fixed, a new utility function moved to Analysis.
Differential Revision: https://reviews.llvm.org/D51152
llvm-svn: 341014
Summary:
This change implements the profile loading functionality in LLVM to
support XRay's profiling mode in compiler-rt.
We introduce a type named `llvm::xray::Profile` which allows building a
profile representation. We can load an XRay profile from a file to build
Profile instances, or do it manually through the Profile type's API.
The intent is to get the `llvm-xray` tool to generate `Profile`
instances and use that as the common abstraction through which all
conversion and analysis can be done. In the future we can generate
`Profile` instances from `Trace` instances as well, through conversion
functions.
Some of the key operations supported by the `Profile` API are:
- Path interning (`Profile::internPath(...)`) which returns a unique path
identifier.
- Block appending (`Profile::addBlock(...)`) to add thread-associated
profile information.
- Path ID to Path lookup (`Profile::expandPath(...)`) to look up a
PathID and return the original interned path.
- Block iteration.
A 'Path' in this context represents the function call stack in
leaf-to-root order. This is represented as a path in an internally
managed prefix tree in the `Profile` instance. Having a handle (PathID)
to identify the unique Paths we encounter for a particular Profile
allows us to reduce the amount of memory required to associate profile
data to a particular Path.
This is the first of a series of patches to migrate the `llvm-stacks`
tool towards using a single profile representation.
Depends on D48653.
Reviewers: kpw, eizan
Reviewed By: kpw
Subscribers: kpw, thakis, mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D48370
llvm-svn: 341012
We don't have enough information to know if struct types being
bitcast will cause validation failures or not, so be conservative
and allow such cases to persist (fot now).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38711
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51460
llvm-svn: 341010
Summary:
Global variables that are external and zero initialized are
supposed to be merged with global variables in the bss section
rather than the data section.
Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D51379
llvm-svn: 341008
The cost modeling was not accounting for the fact we were duplicating the instruction once per predecessor. With a default threshold of 1, this meant we were actually creating #pred copies.
Adding to the fun, there is *absolutely no* test coverage for this. Simply bailing for more than one predecessor passes all checked in tests.
llvm-svn: 341001
These bugs were found by writing a Python script which spidered
the entire Chromium build directory tree demangling every symbol
in every object file. At the start, the tool printed:
Processed 27443 object files.
2926377/2936108 symbols successfully demangled (99.6686%)
9731 symbols could not be demangled (0.3314%)
14589 files crashed while demangling (53.1611%)
After this patch, it prints:
Processed 27443 object files.
41295518/41295617 symbols successfully demangled (99.9998%)
99 symbols could not be demangled (0.0002%)
0 files crashed while demangling (0.0000%)
The issues fixed in this patch are:
* Ignore empty parameter packs. Previously we would encounter
a mangling for an empty parameter pack and add a null node
to the AST. Since we don't print these anyway, we now just
don't add anything to the AST and ignore it entirely. This
fixes some of the crashes.
* Account for "incorrect" string literal demanglings. Apparently
an older version of clang would not truncate mangled string
literals to 32 bytes of encoded character data. The demangling
code however would allocate a 32 byte buffer thinking that it
would not encounter more than this, and overrun the buffer.
We now demangle up to 128 bytes of data, since the buggy
clang would encode up to 32 *characters* of data.
* Extended support for demangling init-fini stubs. If you had
something like
struct Foo {
static vector<string> S;
};
this would generate a dynamic atexit initializer *for the
variable*. We didn't handle this, but now we print something
nice. This is actually an improvement over undname, which will
fail to demangle this at all.
* Fixed one case of static this adjustment. We weren't handling
several thunk codes so we didn't recognize the mangling. These
are now handled.
* Fixed a back-referencing problem. Member pointer templates
should have their components considered for back-referencing
The remaining 99 symbols which can't be demangled are all symbols
which are compiler-generated and undname can't demangle either.
llvm-svn: 341000
This should more accurately reflect what the AsmPrinter will actually
do.
This is NFC, as far as I can tell; all the places that might be affected
already have an extra check to avoid using the result of
getPreferredAlignment in this situation.
Differential Revision: https://reviews.llvm.org/D51377
llvm-svn: 340999
The restoreDateOnFile() method used to preserve dates uses sys::fs::openFileForWrite(). That method defaults to opening files with CD_CreateAlways, which truncates the output file if it exists. Use CD_OpenExisting instead to open it and *not* truncate it, which also has the side benefit of erroring if the file does not exist (it should always exist, because we just wrote it out).
Also, fix the test case to make sure the output is a valid output file, and not empty. The extra test assertions are enough to catch this regression.
llvm-svn: 340996
On AMDGPU we have 70 register classes, so iterating over all 70
each time and exiting is costly on the CPU, this flips the loop
around so that it loops over the 70 register classes first,
and exits without doing the inner loop if needed.
On my test just starting radv this takes
RegUsageInfoCollector::runOnMachineFunction
from 6.0% of total time to 2.7% of total time,
and reduces the startup from 2.24s to 2.19s
Patch by David Airlie!
Differential Revision: https://reviews.llvm.org/D48582
llvm-svn: 340993
Teach LICM to hoist stores out of loops when the store writes to a location otherwise unused in the loop, writes a value which is invariant, and is guaranteed to execute if the loop is entered.
Worth noting is that this transformation is partially overlapping with the existing promotion transformation. Reasons this is worthwhile anyway include:
* For multi-exit loops, this doesn't require duplication of the store.
* It kicks in for case where we can't prove we exit through a normal exit (i.e. we may throw), but can prove the store executes before that possible side exit.
Differential Revision: https://reviews.llvm.org/D50925
llvm-svn: 340974
Summary:
Assert from PR38737 happens on the dead block inside the parent loop
after unswitching nontrivial switch in the inner loop.
deleteDeadBlocksFromLoop now takes extra care to detect/remove dead
blocks in all the parent loops in addition to the blocks from original
loop being unswitched.
Reviewers: asbirlea, chandlerc
Reviewed By: asbirlea
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51415
llvm-svn: 340955
This is a follow-up to rL339604 which did the same transform
for a sin libcall. The handling of intrinsics vs. libcalls
is unfortunately scattered, so I'm just adding this next to
the existing transform for llvm.cos for now.
This should resolve PR38458:
https://bugs.llvm.org/show_bug.cgi?id=38458
If the call was already negated, the negates will cancel
each other out.
llvm-svn: 340952
Summary:
Port libFuzzer to windows-msvc.
This patch allows libFuzzer targets to be built and run on Windows, using -fsanitize=fuzzer and/or fsanitize=fuzzer-no-link. It allows these forms of coverage instrumentation to work on Windows as well.
It does not fix all issues, such as those with -fsanitize-coverage=stack-depth, which is not usable on Windows as of this patch.
It also does not fix any libFuzzer integration tests. Nearly all of them fail to compile, fixing them will come in a later patch, so libFuzzer tests are disabled on Windows until them.
Reviewers: morehouse, rnk
Reviewed By: morehouse, rnk
Subscribers: #sanitizers, delcypher, morehouse, kcc, eraman
Differential Revision: https://reviews.llvm.org/D51022
llvm-svn: 340949
Expand the simplification of `pow(exp{,2}(x), y)` to all FP types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D51195
llvm-svn: 340948
Generalize the simplification of `pow(2.0, y)` to `pow(2.0 ** n, y)` for all
scalar and vector types.
This improvement helps some benchmarks in SPEC CPU2000 and CPU2006, such as
252.eon, 447.dealII, 453.povray. Otherwise, no significant regressions on
x86-64 or A64.
Differential revision: https://reviews.llvm.org/D49273
llvm-svn: 340947
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.
Example:
```
Iterations: 100
Instructions: 300
Total Cycles: 414
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.69
IPC: 0.72
Block RThroughput: 4.0
```
This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.
llvm-svn: 340946
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.
For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.
Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.
Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.
Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.
GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.
Differential Revision: https://reviews.llvm.org/D51288
llvm-svn: 340942
We were calling getNumUses to check for 1 or 2 uses. But getNumUses is linear in the number of uses. We can instead use !hasNUsesOrMore(3) which will stop the linear scan as soon as it determines there are at least 3 uses even if there are more.
llvm-svn: 340939
Previously, the DebugCounterTest was failing because CommandLineTest.GetCommandLineArguments was clearing all the global singletons.
Differential Revision: https://reviews.llvm.org/D51423
llvm-svn: 340935
MipsSEInstrInfo class defines for internal purpose unconditional
branches as Mips::B nad Mips:J even in case of microMIPS code
generation. Under some conditions that leads to the bug - for rather long
branch which fits to Mips jump instruction offset size, but does not fit
to microMIPS jump offset size, we generate 'short' branch and later show
an error 'out of range PC16 fixup' after check in the isBranchOffsetInRange
routine.
Differential revision: https://reviews.llvm.org/D50615
llvm-svn: 340932
Involves microMIPS's jump in the analyzable branch set to reduce some
code patterns.
Differential revision: https://reviews.llvm.org/D50613
llvm-svn: 340931
For a certain combination of options, BuildPairF64_{64}, ExtractElementF64{_64}
may be expanded into instructions using stack.
Add implicit operand $sp for such cases so that ShrinkWrapping doesn't move
prologue setup below them.
Fixes MultiSource/Benchmarks/MallocBench/cfrac for
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfpxx -mnan=2008'
and
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfp64 -mnan=2008 -mno-odd-spreg'.
Differential Revision: https://reviews.llvm.org/D50986
llvm-svn: 340927
Rebase rL338240 since the excessive memory usage observed when using
GVNHoist with UBSan has been fixed by rL340818.
Differential Revision: https://reviews.llvm.org/D49858
llvm-svn: 340922
We have multiple places in code where we try to identify whether or not
some instruction is a guard. This patch factors out this logic into a separate
utility function which works uniformly in all places.
Differential Revision: https://reviews.llvm.org/D51152
Reviewed By: fedor.sergeev
llvm-svn: 340921
Adjust missed test to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340917
Adjust tests to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340916
This patch creates file GuardUtils which will contain logic for work with guards
that can be shared across different passes.
Differential Revision: https://reviews.llvm.org/D51151
Reviewed By: fedor.sergeev
llvm-svn: 340914
Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead.
TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD.
Differential Revision: https://reviews.llvm.org/D50074
llvm-svn: 340913
Summary:
Per clang-tidy:
function 'llvm::MCStreamer::checkCVLocSection' has a definition with different parameter names
.../llvm/lib/MC/MCStreamer.cpp:275:18: the definition seen here
.../llvm/include/llvm/MC/MCStreamer.h:235:8: differing parameters are named here: ('FuncId'), in definition: ('FunctionId')
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51406
llvm-svn: 340912
The problems with benchmark build should be fixed now, but Windows
buildbots still run into errors seemingly because of the bug in
clang-cl. Because of that, benchmark shouldn't be built on Windows at
this point.
llvm-svn: 340905
I am experimenting with a single split dwarf (.dwo sections in .o files).
I want to make linker to ignore .dwo sections in .o, for that I am trying to add
SHF_EXCLUDE flag ("E") for them in my asm sample.
I found that currently, it is impossible to add any flag for debug sections using llvm-mc.
That happens because we have a set of predefined unique sections created early with default flags:
https://github.com/llvm-mirror/llvm/blob/master/lib/MC/MCObjectFileInfo.cpp#L391
This patch allows a user to add any flags he wants.
I had to edit TargetLoweringObjectFileImpl.cpp to set MetaData type for debug sections.
Their kind was Data by default (so they were allocatable) and so after changes introduced by
this patch the SHF_ALLOC flag was applied for them, what does not make sense for debug sections.
One of OrcJITTests tests failed because of that.
Differential revision: https://reviews.llvm.org/D51361
llvm-svn: 340904
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:
- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
fixes an issue where a 32-bit value of 0xffffffff was recorded
as unsigned. This is equal to -1 when restricting to a 32-bit
comparison, and an inline constant can be used.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50629
Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
llvm-svn: 340903
In the PR, LoopSink was trying to sink into a catchswitch block, which
doesn't have a valid insertion point.
Differential Revision: https://reviews.llvm.org/D51307
llvm-svn: 340900
These aren't used in tree and the number of operands in the type profile is wrong. X86 uses its own ISD opcode and type profile after op legalization.
llvm-svn: 340899
Mostly this includes <auto> and <decltype-auto> return values.
Additionally, this fixes a fairly obscure back-referencing bug
that was encountered in one of the C++14 tests, which is that
if you have something like Foo<&bar, &bar> then the `bar`
forms a backreference.
llvm-svn: 340896