This change tries to fix a place missing `moveAndDanglePseudoProbes `. In FoldValueComparisonIntoPredecessors, it folds the BB into predecessors and then marked the BB unreachable. However, the original logic from the BB is still alive, deleting the probe will mislead the SampleLoader mark it as zero count sample.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D102721
These will be used for error propagation and handling in the ORC runtime.
The implementations of these types are cut-down versions of the error
support in llvm/Support/Error.h. Most advice on llvm::Error and llvm::Expected
(e.g. from the LLVM Programmer's manual) applies equally to __orc_rt::Error
and __orc_rt::Expected. The primary difference is the mechanism for testing
and handling error types: The ORC runtime uses a new 'error_cast' operation
to replace the handleErrors family of functions. See error_cast comments in
error.h.
Reland Note: This was accidentally reverted in 80d981eda6, but is an important improvement even outside of the driving motivator in D102567.
We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102788
These symbols are long, and they tend to cause the PDB file size to
overflow. They are generally not necessary when debugging problems in
user code.
This change reduces the size of chrome.dll.pdb with coverage from
6,937,108,480 bytes to 4,690,210,816 bytes.
Differential Revision: https://reviews.llvm.org/D102719
No verifier changes needed, the verifier currently doesn't check that
the pointer operand's pointee type matches the GEP type. There is a
similar check in GetElementPtrInst::Create() though.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102744
The added DumpDataExtractorTest uncovered that this is lshifting a negative
integer which upsets ubsan and breaks the sanitizer bot. This patch just
changes the variable we shift to be unsigned and adds a bunch of tests to make
sure this function does what it promises.
when implementing an optional protocol requirement
When an Objective-C method implements an optional protocol requirement,
allow the method to use a newer introduced or older obsoleted
availability version than what's specified on the method in the protocol
itself. This allows SDK adopters to adopt an optional method from a
protocol later than when the method is introduced in the protocol. The users
that call an optional method on an object that conforms to this protocol
are supposed to check whether the object implements the method or not,
so a lack of appropriate `if (@available)` check for a new OS version
is not a cause of concern as there's already another runtime check that's required.
Differential Revision: https://reviews.llvm.org/D102459
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.
Reviewed By: jdoerfert thegameg
Differential Revision: https://reviews.llvm.org/D102784
This is another FMF gap exposed by D90901, but I don't see a way
to show the difference in a regression test as with:
f66ba4c6025663
We will see an asm difference if we add a test as part of D90901.
* adds `sized_range` and conformance tests
* moves `disable_sized_range` into namespace `std::ranges`
* removes explicit type parameter
Implements part of P0896 'The One Ranges Proposal'.
Differential Revision: https://reviews.llvm.org/D102434
Similar to 8854b27 -
All of the CHECK lines should be identical to before,
but without any of the x86-specific calls that were
replaced with generic FMA long ago.
The file still has value because it shows a miscompile
as demonstrated in D90901, but we probably need to
add tests with FMF to make that explicit without
losing coverage.
vector.transfer_read and vector.transfer_write operations are converted
to llvm intrinsics with specific alignment information, however there
doesn't seem to be a way in llvm to take information from llvm.assume
intrinsics and change this alignment information. In any
event, due the to the structure of the llvm.assume instrinsic, applying
this information at the llvm level is more cumbersome. Instead, let's
generate the masked vector load and store instrinsic with the right
alignment information from MLIR in the first place. Since
we're bothering to do this, lets just emit the proper alignment for
loads, stores, scatter, and gather ops too.
Differential Revision: https://reviews.llvm.org/D100444
For source-based coverage, the frontend sets the counter IDs and the
constraints of counter IDs is not defined. For e.g., the Rust frontend
until recently had a reserved counter #0
(https://github.com/rust-lang/rust/pull/83774). Rust coverage
instrumentation also creates counters on edges in addition to basic
blocks. Some functions may have more counters than regions.
This breaks an assumption in CoverageMapping.cpp where the number of
counters in a function is assumed to be bounded by the number of
regions:
Counts.assign(Record.MappingRegions.size(), 0);
This assumption causes CounterMappingContext::evaluate() to fail since
there are not enough counter values created in the above call to
`Counts.assign`. Consequently, some uncovered functions are not
reported in coverage reports.
This change walks a Function's CoverageMappingRecord to find the maximum
counter ID, and uses it to initialize the counter array when instrprof
records are missing for a function in sparse profiles.
Differential Revision: https://reviews.llvm.org/D101780
In order to create the code regions for llvm-mca to analyze, llvm-mca creates an
AsmCodeRegionGenerator and calls AsmCodeRegionGenerator::parseCodeRegions().
Within this function, both an MCAsmParser and MCTargetAsmParser are created so
that MCAsmParser::Run() can be used to create the code regions for us.
These parser classes were created for llvm-mc so they are designed to emit code
with an MCStreamer and MCTargetStreamer that are expected to be setup and passed
into the MCAsmParser constructor. Because llvm-mca doesn’t want to emit any
code, an MCStreamerWrapper class gets created instead and passed into the
MCAsmParser constructor. This wrapper inherits from MCStreamer and overrides
many of the emit methods to just do nothing. The exception is the
emitInstruction() method which calls Regions.addInstruction(Inst).
This works well and allows llvm-mca to utilize llvm-mc’s MCAsmParser to build
our code regions, however there are a few directives which rely on the
MCTargetStreamer. llvm-mc assumes that the MCStreamer that gets passed into the
MCAsmParser’s constructor has a valid pointer to an MCTargetStreamer. Because
llvm-mca doesn’t setup an MCTargetStreamer, when the parser encounters one of
those directives, a segfault will occur.
In x86, each one of these 7 directives will cause this segfault if they exist in
the input assembly to llvm-mca:
.cv_fpo_proc
.cv_fpo_setframe
.cv_fpo_pushreg
.cv_fpo_stackalloc
.cv_fpo_stackalign
.cv_fpo_endprologue
.cv_fpo_endproc
I haven’t looked at other targets, but I wouldn’t be surprised if some of the
other ones also have certain directives which could result in this same
segfault.
My proposed solution is to simply initialize an MCTargetStreamer after we
initialize the MCStreamerWrapper. The MCTargetStreamer requires an ostream
object, but we don’t actually want any of these directives to be emitted
anywhere, so I use an ostream created with the nulls() function. Since this
needs to happen after the MCStreamerWrapper has been initialized, it needs to
happen within the AsmCodeRegionGenerator::parseCodeRegions() function. The
MCTargetStreamer also needs an MCInstPrinter which is easiest to initialize
within the main() function of llvm-mca. So this MCInstPrinter gets constructed
within main() then passed into the parseCodeRegions() function as a parameter.
(If you feel like it would be appropriate and possible to create the
MCInstPrinter within the parseCodeRegions() function, then feel free to modify
my solution. That would stop us from having to pass it into the function and
would limit its scope / lifetime.)
My solution stops the segfault from happening and still passes all of the
current (expected) llvm-mca tests. I also added a new test for x86 that checks
for this segfault on an input that includes one of the .cv_fpo directives (this
test fails without my solution, but passes with it).
As far as I can tell, all of the functions that I modified are only called from
within llvm-mca so there shouldn’t be any worries about breaking other tools.
Differential Revision: https://reviews.llvm.org/D102709
Turns out simplifyLoopIVs sometimes returns a non-dead instruction in it's DeadInsts out param. I had done a bit of NFC cleanup which was only NFC if simplifyLoopIVs obeyed it's documentation. I'm simplfy dropping that part of the change.
Commit message from try 3:
Recommitting after fixing a bug found post commit. Amusingly, try 1 had been correct, and by reverting to incorporate last minute review feedback, I introduce the bug. Oops. :)
Original commit message:
The problem was that recursively deleting an instruction can delete instructions beyond the current iterator (via a dead phi), thus invalidating iteration. Test case added in LoopUnroll/dce.ll to cover this case.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
If there are no counters, an mmap() of the counters section would fail
due to the size argument being too small (EINVAL).
rdar://78175925
Differential Revision: https://reviews.llvm.org/D102735
Sample profile loader can be run in both LTO prelink and postlink. Currently the counts annoation in postilnk doesn't fully overwrite what's done in prelink. I'm adding a switch (`-overwrite-existing-weights=1`) to enable a full overwrite, which includes:
1. Clear old metadata for calls when their parent block has a zero count. This could be caused by prelink code duplication.
2. Clear indirect call metadata if somehow all the rest targets have a sum of zero count.
3. Overwrite branch weight for basic blocks.
With a CS profile, I was seeing #1 and #2 help reduce code size by preventing post-sample ICP and CGSCC inliner working on obsolete metadata, which come from a partial global inlining in prelink. It's not expected to work well for non-CS case with a less-accurate post-inline count quality.
It's worth calling out that some prelink optimizations can damage counts quality in an irreversible way. One example is the loop rotate optimization. Due to lack of exact loop entry count (profiling can only give loop iteration count and loop exit count), moving one iteration out of the loop body leaves the rest iteration count unknown. We had to turn off prelink loop rotate to achieve a better postlink counts quality. A even better postlink counts quality can be archived by turning off prelink CGSCC inlining which is not context-sensitive.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D102537
lld/MachO/Driver.cpp and lld/MachO/SyntheticSections.cpp include
llvm/Config/config.h which doesn't exist when building standalone lld.
This patch replaces llvm/Config/config.h include with llvm/Config/llvm-config.h
just like it is in lld/ELF/Driver.cpp and HAVE_LIBXAR with LLVM_HAVE_LIXAR and
moves LLVM_HAVE_LIBXAR from config.h to llvm-config.h
Also it adds LLVM_HAVE_LIBXAR to LLVMConfig.cmake and links liblldMachO2.so
with XAR_LIB if LLVM_HAVE_LIBXAR is set.
Differential Revision: https://reviews.llvm.org/D102084
The operation of some VP intrinsics do/will not map to regular
instruction opcodes. Returning 'None' seems more intuitive here than
'Instruction::Call'.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D102778
- This patch (is one in a series of patches) which introduces HLASM Parser support (for the first parameter of inline asm statements) to LLVM ([[ https://lists.llvm.org/pipermail/llvm-dev/2021-January/147686.html | main RFC here ]])
- This patch in particular introduces HLASM Parser support for Z machine instructions.
- The approach taken here was to subclass `AsmParser`, and make various functions and variables as "protected" wherever appropriate.
- The `HLASMAsmParser` class overrides the `parseStatement` function. Two new private functions `parseAsHLASMLabel` and `parseAsMachineInstruction` are introduced as well.
The general syntax is laid out as follows (more information available in [[ https://www.ibm.com/support/knowledgecenter/SSENW6_1.6.0/com.ibm.hlasm.v1r6.asm/asmr1023.pdf | HLASM V1R6 Language Reference Manual ]] - Chapter 2 - Instruction Statement Format):
```
<TokA><spaces.*><TokB><spaces.*><TokC><spaces.*><TokD>
```
1. TokA is referred to as the Name Entry. This token is optional
2. TokB is referred to as the Operation Entry. This token is mandatory.
3. TokC is referred to as the Operand Entry. This token is mandatory
4. TokD is referred to as the Remarks Entry. This token is optional
- If TokA is provided, then we either parse TokA as a possible comment or as a label (Name Entry), Tok B as the Operation Entry and so on.
- If TokA is not provided (i.e. we have one or more spaces and then the first token), then we will parse the first token (i.e TokB) as a possible Z machine instruction, TokC as the operands to the Z machine instruction and TokD as a possible Remark field
- TokC (Operand Entry), no spaces are allowed between OperandEntries. If a space occurs it is classified as an error.
- TokD if provided is taken as is, and emitted as a comment.
The following additional approach was examined, but not taken:
- Adding custom private only functions to base AsmParser class, and only invoking them for z/OS. While this would eliminate the need for another child class, these private functions would be of non-use to every other target. Similarly, adding any pure virtual functions to the base MCAsmParser class and overriding them in AsmParser would also have the same disadvantage.
Testing:
- This patch doesn't have tests added with it, for the sole reason that MCStreamer Support and Object File support hasn't been added for the z/OS target (yet). Hence, it's not possible generate code outright for the z/OS target. They are in the process of being committed / process of being worked on.
- Any comments / feedback on how to combat this "lack of testing" due to other missing required features is appreciated.
Reviewed By: Kai, uweigand
Differential Revision: https://reviews.llvm.org/D98276
The current implementation assumes the destination type of shuffle is the same as the decomposed ones. Add the check to avoid crush when the condition is not satisfied.
This fixes PR37616.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D102751
DW_AT_ranges can use DW_FORM_sec_offset (instead of DW_FORM_rnglistx).
In such case DW_AT_rnglists_base does not need to be present.
DWARF-5 spec:
"If the offset_entry_count is zero, then DW_FORM_rnglistx cannot
be used to access a range list; DW_FORM_sec_offset must be used
instead. If the offset_entry_count is non-zero, then
DW_FORM_rnglistx may be used to access a range list;"
This fix is for TestTypeCompletion.py category `dwarf` using GCC with DWARF-5.
The fix just provides GetRnglist() lazy getter for `m_rnglist_table`.
The testcase is easier to review by:
diff -u lldb/test/Shell/SymbolFile/DWARF/DW_AT_low_pc-addrx.s \
lldb/test/Shell/SymbolFile/DWARF/DW_AT_range-DW_FORM_sec_offset.s
Differential Revision: https://reviews.llvm.org/D98289
Generalize the fix from rGd0902a8665b1 by ensuring we widen/narrow the indices subvector first and then perform the ZERO_EXTEND_VECTOR_INREG (if necessary), which should allow us to perform the variable permutes with source/destination/indices vectors of any widths.
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - these are all Port0 only.
Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
The patch extends the yaml code generation to support the following new OpDSL constructs:
- captures
- constants
- iteration index accesses
- predefined types
These changes have been introduced by revision
https://reviews.llvm.org/D101364.
Differential Revision: https://reviews.llvm.org/D102075
`bool` is considered to be unsigned according to `std::is_unsigned<bool>::value` (and `Type::GetTypeInfo`). Encoding it as signed int works fine for normal variables and fields, but breaks when reading the values of boolean bitfields. If the field is declared as `bool b : 1` and has a value of `0b1`, the call to `SBValue::GetValueAsSigned()` will return `-1`.
Reviewed By: teemperor
Differential Revision: https://reviews.llvm.org/D102685