llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Clegg	e4b2f3054a	[WebAssembly][libObject] Avoid re-use of Section object during parsing The re-use of this struct across iterations of the loop was causing fields (specifically Name) to be incorrectly shared between multiple sections. Differential Revision: https://reviews.llvm.org/D108984	2021-09-10 09:30:50 -04:00
Serge Bazanski	231bfaab31	[Lanai] fix MC / objdump D78776 removed is{Call,Branch,UnconditionalBranch} guards in objdump before calling MCInstrAnalysis::evaluateBranch. This is fine for other architectures as they gracefully handle evaluateBranch being called on non-branches. However, the Lanai MCInstrAnalysis implementation didn't and that change caused it to crash. This inserts the same guards back into Lanai's evaluateBranch implementation and adds a smoke test that exercises `llc \| objdump` so this kind of regression is hopefully caught next time. Reviewed By: jpienaar, MaskRay Differential Revision: https://reviews.llvm.org/D107593	2021-09-10 10:46:13 +00:00
Alfonso Sánchez-Beato	b25ab4f313	[llvm-objcopy][COFF] Fix test for debug dir presence If the number of directories was 6 (equal to the DEBUG_DIRECTORY index), patchDebugDirectory() was run even though the debug directory is actually the 7th entry. Use <= in the comparison to fix that. This fixes https://llvm.org/PR51243 Differential Revision: https://reviews.llvm.org/D106940 Reviewed by: jhenderson	2021-09-10 09:57:18 +01:00
Alfonso Sánchez-Beato	b33fd31772	[yaml2obj][COFF] Allow variable number of directories Allow variable number of directories, as allowed by the specification. NumberOfRvaAndSize will default to 16 if not specified, as in the past. Reviewed by: jhenderson Differential Revision: https://reviews.llvm.org/D108825	2021-09-09 11:16:56 +01:00
Wei Mi	8eb617d719	[SampleFDO] Allow forward compatibility when adding a new section for extbinary format. Currently when we add a new section in the profile format and generate a profile containing the new section, older compiler which reads the new profile will issue an error. The forward incompatibility can cause unnecessary churn when extending the profile. This patch removes the incompatibility when adding a new section for extbinary format. Differential Revision: https://reviews.llvm.org/D109398	2021-09-07 19:38:43 -07:00
Maksim Panchenko	6300e4ac58	[llvm-objdump] Fix 'llvm-objdump -dr' for executables with relocations Print relocations interleaved with disassembled instructions for executables with relocatable sections, e.g. those built with "-Wl,-q". Differential Revision: https://reviews.llvm.org/D109016	2021-09-07 11:24:24 -07:00
Roman Lebedev	e030f808ec	[Exegesis] Native clusterization: sub-partition by sched class id Currently native clusterization simply groups all benchmarks by the opcode of key instruction, but that is suboptimal in certain cases, e.g. where we can already tell that the particular instructions already resolve into different sched classes.	2021-09-07 17:54:37 +03:00
Roman Lebedev	b3b9b297a0	[NFC][exegesis] Add test for the following patch	2021-09-07 17:54:36 +03:00
Simon Pilgrim	056b409ceb	[llvm-exegesis][x86] Limit llvm-exegesis analysis tests to x86_64 triple hosts Attempting to fix an issue with test failures on arm m1 apple macintoshes reported on D109353	2021-09-07 14:35:52 +01:00
Simon Pilgrim	6a9e2764f6	[llvm-exegesis] Analysis tests should run even without libpfm (PR51687) Move inverse_throughput, latency and uops to sub-directories (like we already do for lbr), which require libpfm, so we can relax the lit limits for analysis tests in the x86 root directory. Differential Revision: https://reviews.llvm.org/D109353	2021-09-07 13:58:05 +01:00
Andrew Litteken	bd4b1b5f6d	[IRSim] Adding support for recognizing branch similarity The current IRSimilarityIdentifier does not try to find similarity across blocks, this patch provides a mechanism to compare two branches against one another, to find similarity across basic blocks, rather than just within them. This adds a step in the similarity identification process that labels all of the basic blocks so that we can identify the relative branching locations. Within an IRSimilarityCandidate we use these relative locations to determine whether if the branching to other relative locations in the same region is the same between branches. If they are, we consider them similar. We do not consider the relative location of the branch if the target branch is outside of the region. In this case, both branches must exit to a location outside the region, but the exact relative location does not matter. Reviewers: paquette, yroux Differential Revision: https://reviews.llvm.org/D106989	2021-09-06 11:55:38 -07:00
Simon Pilgrim	2005ae15a6	[X86][SLM] WriteVecIMul instructions only take 1uop (REAPPLIED) The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop. I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD. But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.	2021-09-04 15:03:56 +01:00
Simon Pilgrim	ac51d69208	Revert rG994da657076900f5ad7fe593c3b5e5f89ab3d53d "[X86][SLM] WriteVecIMul instructions only take 1uop" This changed some codegen tests that I forgot about in my rebase, I'll recommit shortly with a fix.	2021-09-04 13:39:10 +01:00
Simon Pilgrim	994da65707	[X86][SLM] WriteVecIMul instructions only take 1uop The xmm variant have half the throughput (and +1cy latency) of the mmx variants, but are still 1uop. I still need to do more thorough testing of SLM on test-suite before fixing the obvious bad numbers for WritePMULLD. But this helps the D103695 helper script get to more accurate numbers for vXi32 multiplies of extended operands (i.e. we can use PMADDWD, PMULLW/PMULHW etc). Matches what Intel AoM / Agner / llvm-exegesis reports.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	c6371020a8	[X86][SLM] RMW instructions don't require an extra uop For RMW instructions, the load and store hold the MEC for an extra cycle, but within the same single uop. This is alluded to in the Intel AOM: "The MEC also owns the MEC RSV, which is responsible for scheduling of all loads and stores. Load and store instructions go through addresses generation phase in program order to avoid on-the-fly memory ordering later in the pipeline. Therefore, an unknown address will stall younger memory instructions." Noticed while trying to get a cheap SLM test box up and running with llvm-exegesis - RMW arithmetic is always 1uop - and matches what Agner / InstLatX64 report as well.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	da965a77d5	[X86][SLM] Fix MUL uops, latency and throughput These were all set to the same best case mul i32 values (which seems to be the only version of MUL that SLM actually performs well with). Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM / Agner / InstLatX64.	2021-09-04 13:21:34 +01:00
Simon Pilgrim	7d062d2c47	[X86][Atom] MUL/DIV instructions require both ports, not either. Noticed while trying to improve multiplication costs for vectorization via the D103695 helper script. Confirmed with Intel AoM.	2021-09-04 11:58:09 +01:00
Richard Smith	02fe58d628	DebugInfo: additional fix missed in `bc066e2`.	2021-09-03 15:28:00 -07:00
David Blaikie	bc066e26c9	DebugInfo: Fix a few bot failures for type dumping fixes	2021-09-03 14:08:58 -07:00
David Blaikie	40f1593558	DebugInfo: Correct/improve type formatting (pointers to function types especially) This does add some extra superfluous whitespace (eg: "int *") intended to make the Simplified Template Names work easier - this makes the DIE-based names match more exactly the clang-generated names, so it's easier to identify cases that don't generate matching names. (arguably we could change clang to skip that whitespace or add some fuzzy matching to accommodate differences in certain whitespace - but this seemed easier and fairly low-impact)	2021-09-03 12:22:28 -07:00
Jinsong Ji	343a72a24d	[NFC][CSSPGO] Add end of file newline to test input On some platform (eg: AIX), diff will complain about newline. diff: Missing newline at the end of file .../llvm/test/tools/llvm-profdata/Inputs/cs-sample.proftext.	2021-09-03 17:42:32 +00:00
Simon Pilgrim	6ba0b9f68a	[X86][SLM] Fix PBLENDVB uops and throughput SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64. Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).	2021-09-03 11:31:29 +01:00
gbreynoo	e28cd75a50	[OptTable] Reapply Improve error message output for grouped short options This reapplies `71d7fed3bc` which was reverted by `3e2bd82f02`. This change includes the fix for breaking the sanitizer bots. As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current implementation for parsing grouped short options can return unclear error messages. This change fixes the example given in the ticket in which a flag is incorrectly given an argument. Also when parsing a group we now keep reading past the first incorrect option and output errors for all incorrect options in the group. Differential Revision: https://reviews.llvm.org/D108770	2021-09-03 11:13:52 +01:00
Hongtao Yu	7ca8030030	[CSSPGO] Enable loading MD5 CS profile. Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster. There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D108342	2021-09-01 09:19:47 -07:00
Kevin Athey	3e2bd82f02	Revert "[OptTable] Improve error message output for grouped short options" This reverts commit `71d7fed3bc`. Reason: broke sanitizer bots more info: https://reviews.llvm.org/D108770	2021-08-31 14:06:11 -07:00
wlei	964053d56f	[llvm-profgen] Support LBR only perf script This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation. A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info. An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`) ``` 40062f 0x40062f/0x4005b0/P/-/-/9 0x400645/0x4005ff/P/-/-/1 0x400637/0x400645/P/-/-/1 ... 4005d7 0x4005d7/0x4005e5/P/-/-/8 0x40062f/0x4005b0/P/-/-/6 0x400645/0x4005ff/P/-/-/1 ... ... ``` For implementation: - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer. - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded. - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key. - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output. Profile generation part will come soon. Differential Revision: https://reviews.llvm.org/D108153	2021-08-31 13:28:17 -07:00
gbreynoo	71d7fed3bc	[OptTable] Improve error message output for grouped short options As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current implementation for parsing grouped short options can return unclear error messages. This change fixes the example given in the ticket in which a flag is incorrectly given an argument. Also when parsing a group we now keep reading past the first incorrect option and output errors for all incorrect options in the group. Differential Revision: https://reviews.llvm.org/D108770	2021-08-31 16:41:08 +01:00
Simon Pilgrim	7ec7272b80	[MCA][X86] Add basic coverage for icelake arch Copy the skylake-avx512 tests for icelake-server coverage. Add icelake/rocketlake/tigerlake test coverage to the relevent generic tests as well.	2021-08-31 12:20:09 +01:00
Hongtao Yu	b9db70369b	[CSSPGO] Split context string to deduplicate function name used in the context. Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context. A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing. When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`. Testing This is no-diff change in terms of code quality and profile content (for text profile). For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated. The compile time of ads is dropped by 25%. Differential Revision: https://reviews.llvm.org/D107299	2021-08-30 20:09:29 -07:00
Keith Smiley	b5da3120b8	[llvm-cov][NFC] Add test for coverage-prefix-map remappings This test covers acts as a regression test for these fixes: `c75a0a1e9d` `dd388ba3e0` Differential Revision: https://reviews.llvm.org/D108805	2021-08-30 17:19:57 -07:00
Haowei Wu	31e61c58b0	[ifs] Add option to hide undefined symbols This change add an option to llvm-ifs to hide undefined symbols from its output. Differential Revision: https://reviews.llvm.org/D108428	2021-08-27 11:15:56 -07:00
Roman Lebedev	d4d459e747	[X86] AMD Zen 3: MULX w/ mem operand has the same throughput as with reg op Exegesis is faulty and sometimes when measuring throughput^-1 produces snippets that have loop-carried dependencies, which must be what caused me to incorrectly measure it originally. After looking much more carefully, the inverse throughput should match that of the MULX w/ reg op. As per llvm-exegesis measurements.	2021-08-27 13:27:05 +03:00
Roman Lebedev	0f04936a2d	[X86] AMD Zen 3: MULX produces low part of the result in 3cy, +1cy for high part As per llvm-exegesis measurements.	2021-08-27 13:27:05 +03:00
Roman Lebedev	db2c6cd99c	[NFC][X86][MCA] AMD Zen 3: improve MULX test coverage Latency for MULX isn't right	2021-08-27 13:27:05 +03:00
Andrea Di Biagio	4a5b191703	[X86][MCA] Address the latest issues with MULX reported in PR51495. It turns out that SchedWrite WriteIMulH was always assigned to the low half of the result of a MULX (rather than to the high half). To avoid confusion, this patch swaps the two MULX writes in the tablegen definition of MULX32/64. That way, write names better describe what they actually refer to; this also avoids further complications if in future we decide to reuse the same MulH writes to also model other scalar integer multiply instructions. I also had to swap the latency values for the two MULX writes to make sure that the change is effectively an NFC. In fact, none of the existing x86 tests were affected by this small refactoring. This patch also fixes a bug in MCA: a wrong latency value was propagated for instructions that perform multiple writes to a same register. This last issue was found by Roman while testing MULX on targets that define a different latency for the Low/High part of the result. Differential Revision: https://reviews.llvm.org/D108727	2021-08-26 12:08:20 +01:00
David Green	6ffc6951a3	[AArch64] Remove unpredictable from narrowing instructions. Like other similar instructions the xtn2 family do not have side effects, and explicitly marking them as such can help improve scheduling freedom.	2021-08-26 09:43:44 +01:00
David Green	9474b03d41	[AArch64] Add a Cortex-A55 NEON scheduler test case.	2021-08-26 09:43:44 +01:00
Esme-Yi	b21ed75e10	[llvm-readobj][XCOFF] Add support for `--needed-libs` option. Summary: This patch is trying to add support for llvm-readobj --needed-libs option under XCOFF. For XCOFF, the needed libraries can be found from the Import File ID Name Table of the Loader Section. Currently, I am using binary inputs in the test since yaml2obj does not yet support for writing the Loader Section and the import file table. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D106643	2021-08-26 07:17:06 +00:00
Fangrui Song	4a66a11286	[LLVMgold.so][test] Make comdat-nodeduplicate.ll work with binutils<2.27	2021-08-25 16:59:06 -07:00
Andrea Di Biagio	6181427bb9	[X86][MCA] Add more tests for MULX (PR51495). llvm-mca still reports a wrong latency for the case where the two destination registers of MULX are the same.	2021-08-25 21:28:21 +01:00
Alfonso Sánchez-Beato	cdd407286a	[llvm-objcopy] [COFF] Consider section flags when adding section The --set-section-flags option was being ignored when adding a new section. Take it into account if present. Fixes https://llvm.org/PR51244 Reviewed By: jhenderson, MaskRay Differential Revision: https://reviews.llvm.org/D106942	2021-08-25 23:11:41 +03:00
Rong Xu	24201b6437	[SampleFDO] Set ProfileIsFS bit properly from the internal option We have "-profile-isfs" internal option for text, binary, and compactbinary format (mostly for debug and test purpose). We need to set the related flag in FunctionSamples so that ProfileIsFS is written to the header in extbinary format. Differential Revision: https://reviews.llvm.org/D108707	2021-08-25 09:07:34 -07:00
Wenlei He	a6f15e9a49	[CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen. Differential Revision: https://reviews.llvm.org/D108350	2021-08-25 09:01:11 -07:00
Andrea Di Biagio	5f848b311f	[X86][SchedModel] Fix latency the Hi register write of MULX (PR51495). Before this patch, WriteIMulH reported a latency value which is correct for the RR variant of MULX, but not for the RM variant. This patch fixes the issue by introducing a new WriteIMulHLd, which is meant to be used only by the RM variant of MULX. Differential Revision: https://reviews.llvm.org/D108701	2021-08-25 16:12:09 +01:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Andrea Di Biagio	fe13b81ed9	[X86][NFC] Pre-commit llvm-mca tests for PR51495. WriteIMulH reports an incorrect latency for RM variants of MULX.	2021-08-25 14:17:17 +01:00
Fangrui Song	9b96b0865d	llvm-xray {convert,extract}: Add --demangle No demangling may be a better default in the future. Add `--demangle` for migration convenience. Reviewed By: Enna1 Differential Revision: https://reviews.llvm.org/D108100	2021-08-24 13:35:19 -07:00
Patrick Holland	e4ebfb5786	[MCA] Adding an AMDGPUCustomBehaviour implementation. This implementation allows mca to model the desired behaviour of the s_waitcnt instruction. This patch also adds the RetireOOO flag to the AMDGPU instructions within the scheduling model. This flag is only used by mca and allows instructions to finish out-of-order which helps mca's simulations more closely model the actual device. Differential Revision: https://reviews.llvm.org/D104730	2021-08-24 13:33:58 -07:00
Arthur Eubanks	d2e103644b	[llvm-reduce] Remove various module data This removes the data layout, target triple, source filename, and module identifier when possible. Reviewed By: swamulism Differential Revision: https://reviews.llvm.org/D108568	2021-08-24 09:45:31 -07:00
David Green	50f4ae58eb	[AArch64] Correct store ReadAdrBase operand It appears that the Read operand for stores was being placed on the first operand (the stored value) not the address base. This adds a ReadST for the stored value operand, allowing the ReadAdrBase to correctly act upon the address. Differential Revision: https://reviews.llvm.org/D108287	2021-08-23 21:07:55 +01:00

1 2 3 4 5 ...

5336 Commits