llvm-project

Commit Graph

Author	SHA1	Message	Date
Jonas Paulsson	e77cb4ae63	[SystemZ] Return true from preferZeroCompareBranch(). Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D103057	2021-05-25 10:24:14 -05:00
Yonghong Song	6a2ea84600	BPF: Add more relocation kinds Currently, BPF only contains three relocations: R_BPF_NONE for no relocation R_BPF_64_64 for LD_imm64 and normal 64-bit data relocation R_BPF_64_32 for call insn and normal 32-bit data relocation Also .BTF and .BTF.ext sections contain symbols in allocated program and data sections. These two sections reserved 32bit space to hold the offset relative to the symbol's section. When LLVM JIT is used, the LLVM ExecutionEngine RuntimeDyld may attempt to resolve relocations for .BTF and .BTF.ext, which we want to prevent. So we used R_BPF_NONE for such relocations. This all works fine until when we try to do linking of multiple objects. . R_BPF_64_64 handling of LD_imm64 vs. normal 64-bit data is different, so lld target->relocate() needs more context to do a correct job. . The same for R_BPF_64_32. More context is needed for lld target->relocate() to differentiate call insn vs. normal 32-bit data relocation. . Since relocations in .BTF and .BTF.ext are set to R_BPF_NONE, they will not be relocated properly when multiple .BTF/.BTF.ext sections are merged by lld. This patch intends to address this issue by adding additional relocation kinds: R_BPF_64_ABS64 for normal 64-bit data relocation R_BPF_64_ABS32 for normal 32-bit data relocation R_BPF_64_NODYLD32 for .BTF and .BTF.ext style relocations. The old R_BPF_64_{64,32} semantics: R_BPF_64_64 for LD_imm64 relocation R_BPF_64_32 for call insn relocation The existing R_BPF_64_64/R_BPF_64_32 mapping to numeric values is maintained. They are the most common use cases for bpf programs and we want to maintain backward compatibility as much as possible. ExecutionEngine RuntimeDyld BPF relocations are adjusted as well. R_BPF_64_{ABS64,ABS32} relocations will be resolved properly and other relocations will be ignored. Two tests are added for RuntimeDyld. Not handling R_BPF_64_NODYLD32 in RuntimeDyldELF.cpp will result in "Relocation type not implemented yet!" fatal error. 
FK_SecRel_4 usages in BPFAsmBackend.cpp and BPFELFObjectWriter.cpp are removed as they are not triggered in BPF backend. BPF backend used FK_SecRel_8 for LD_imm64 instruction operands. Differential Revision: https://reviews.llvm.org/D102712	2021-05-25 08:19:13 -07:00
Anirudh Prasad	993f38d0a7	[SystemZ][z/OS] Implement getHostCPUName for z/OS - Currently, the host cpu information is not easily available on z/OS as in other platforms. - This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping) Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D102793	2021-05-25 11:18:12 -04:00
Arthur O'Dwyer	bb523cc82b	[libc++] [test] Make iter_difference_t.pass.cpp into a .compile.pass.cpp. NFCI.	2021-05-25 11:12:42 -04:00
Arthur O'Dwyer	148c19a5b5	[libc++] [test] Format some C++20 iterator_traits tests. NFCI. cxx20_iterator_traits.compile.pass.cpp actually depends on implementation details of libc++, which is not great; but I just left a comment and moved on.	2021-05-25 11:12:36 -04:00
Joe Nash	b67ea3d0c9	[AMDGPU] Allow no-modifier operands in cvtDPP NFC, since no instructions have their AsmMatchConverter changed, but prepares for that to happen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D103046 Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa	2021-05-25 10:58:06 -04:00
Simon Pilgrim	c909ddddda	[CostModel][X86] Improve accuracy of vXi64 vector non-uniform shift costs on AVX2+ targets rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were slow like vXi32 were - but llvm-mca (+Agner) both confirm that Haswell/Broadwell are full rate.	2021-05-25 15:58:23 +01:00
Simon Pilgrim	e02a4f6bda	[X86][SSE] Regenerate vector shift codegen tests. NFCI.	2021-05-25 15:58:22 +01:00
Momchil Velikov	21aa107eb7	Reland "Do not create LLVM IR `constant`s for objects with dynamic initialisation" This relands commit `13dd65b3a1`. The original commit contained a test, which failed when compiled for a MACH-O target. This patch changes the test to run for x86_64-linux instead of `%itanium_abi_triple`, to avoid having invalid syntax for MACH-O sections. The patch itself does not care about section attribute syntax and an x86 backend does not even need to be included in the build. Differential Revision: https://reviews.llvm.org/D102693	2021-05-25 15:54:40 +01:00
David Spickett	8427053f81	[clang][ARM] When handling multiple -mimplicit-it mark all as used Since `4468e5b899` clang will prefer the last one it finds of "-mimplicit-it" or "-Wa,-mimplicit-it". Due to a mistake in that patch the compiler argument "-mimplicit-it" was never marked as used, even if it was the last one and was passed to llvm. Move the Claim call back to the start of the loop and update the testing to check we don't get any unused argument warnings. Reviewed By: mstorsjo Differential Revision: https://reviews.llvm.org/D103086	2021-05-25 14:53:07 +00:00
Joe Nash	67c3707b31	[AMDGPU] More accurate names for dpp operand types NFC. Renames the variable in the dpp input operand generators from DstRC to OldRC, because that is what it actually sets. Also documents the importance of setting HasModifiers = 0 in the dpp8 asm string. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D103047 Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1	2021-05-25 10:35:25 -04:00
David Goldblatt	8607a02357	[InstSimplify] Transform X * Y % Y --> 0 simplifyDiv already handles the case X * Y / Y --> X (barring overflow). This adds the equivalent handling to simplifyRem. Correctness: https://alive2.llvm.org/ce/z/J2cUbS https://alive2.llvm.org/ce/z/us9NUM https://alive2.llvm.org/ce/z/AvaDGJ https://alive2.llvm.org/ce/z/kq9ige Extending the situations in which we apply this transform would not be correct: https://alive2.llvm.org/ce/z/Lf9V63 https://alive2.llvm.org/ce/z/6RPQK3 https://alive2.llvm.org/ce/z/p9UdxC https://alive2.llvm.org/ce/z/A2zlhE https://alive2.llvm.org/ce/z/vHTtLw https://alive2.llvm.org/ce/z/lvpH42 Differential Revision: https://reviews.llvm.org/D102864	2021-05-25 10:16:04 -04:00
Florian Hahn	a92376d297	[VectorCombine] Add test that combines load & store scalarization.	2021-05-25 14:28:37 +01:00
Sanjay Patel	16e78ec0b4	[Headers][WASM] adjust test that runs the optimizer; NFC This broke with the LLVM change in `0bab0f6161`	2021-05-25 09:17:10 -04:00
Florian Hahn	575e2aff55	[VectorCombine] Use constant range info for index scalarization legality. We can only scalarize memory accesses if we know the index is valid. This patch adjusts canScalarizeAccess to fall back to computeConstantRange to check if the index is known to be valid. Reviewed By: nlopes Differential Revision: https://reviews.llvm.org/D102476	2021-05-25 13:58:42 +01:00
Markus Böck	9b99336d5d	[mlir][doc] Fix links and references in documentation of Dialects This patch is the first in a series of patches fixing markdown links and references inside the mlir documentation. I chose to split it in a few reviews to be able to iterate quicker and to ease review. This patch addresses all broken references to other markdown files and sections inside the Dialects folder. One change that was also done was to insert '/' between the markdown files and section: Example: Builtin.md#integertype was changed to: Builtin.md/#integertype After compilation, hugo then translates the latter to jump directly to the integer type section, but not the former. Not inserting the slash would simply jump to just the Builtin page, instead of the integertype section. I therefore changed occurrences of the former version to the latter as well. Differential Revision: https://reviews.llvm.org/D103011	2021-05-25 14:51:15 +02:00
Tres Popp	6054bfa813	[mlir] Support buffer hoisting on allocas This adds support for hoisting allocas in both BufferHoisting and BufferLoopHoisting. Differential Revision: https://reviews.llvm.org/D102681	2021-05-25 14:50:01 +02:00
Markus Böck	5e2a302e37	[mlir][doc] Fix links and references in documentation of Rationale This patch is the second in a series of patches fixing markdown links and references inside the mlir documentation. This patch addresses all broken references to other markdown files and sections inside the Rationale folder. In addition to fixing the links and references like in the previous patch, I also changed references which are URLs to the mlir.llvm.org/docs website, to proper relative markdown references instead. Differential Revision: https://reviews.llvm.org/D103013	2021-05-25 14:48:07 +02:00
Sanjay Patel	0bab0f6161	[InstCombine] canonicalize cast before unary shuffle We could go either direction on this transform. VectorCombine already goes this way for bitcasts (and handles more complicated cases using the cost model), so let's try cast-first. Deferring completely to VectorCombine is another possibility. But the backend should be able to invert this easily when the vectors have the same shape, so it doesn't seem like a transform that we need to avoid. The motivating example from https://llvm.org/PR49081 has an int-to-float sandwiched between 2 shuffles, and the backend currently does not reduce that, so on x86, we get something like: pshufd $249, %xmm0, %xmm0 cvtdq2ps %xmm0, %xmm0 shufps $144, %xmm0, %xmm0 ...instead of just a single conversion instruction. Differential Revision: https://reviews.llvm.org/D103038	2021-05-25 08:43:09 -04:00
Sanjay Patel	06eae35689	[InstCombine] add tests for cast-of-shuffle; NFC	2021-05-25 08:43:09 -04:00
Matthias Springer	f718a53d7e	[mlir] Disallow certain transfer ops in VectorToSCF Disallow transfer ops that change the element type of the transfer. Such transfers could be supported in the future, if needed. Differential Revision: https://reviews.llvm.org/D102746	2021-05-25 21:39:43 +09:00
Tom Weaver	fc0acd10c0	[Dexter] Remove erroneously added diff file Delete d.diff from debuginfo-tests/dexter directory.	2021-05-25 13:36:11 +01:00
Chuanqi Xu	400a9d3501	[NFC] [Coroutines] Remove unused variable: UnreachableCache	2021-05-25 20:33:46 +08:00
OCHyams	4b55102aff	[dexter] Change --source-root-dir and add --debugger-use-relative-paths We want to use `DexDeclareFile` to specify paths relative to a project root directory. The option `--source-root-dir`, prior to this patch, causes dexter to strip the path prefix from commands before passing them to a debugger, and appends the prefix to file paths returned from a debugger. This patch changes the behaviour of `--source-root-dir`. Relative paths in commands, made possible with `DexDeclareFile(relative/path)`, are appended to the `--source-root-dir` directory. A new option, `--debugger-use-relative-paths`, can be used alongside `--source-root-dir` to reproduce the old behaviour: all paths passed to the debugger will be made relative to `--source-root-dir`. I've added a regression test source_root_dir.dex for this new behaviour, and modified the existing `--source-root-dir` regression and unit tests to use `--debugger-use-relative-paths`. Reviewed By: jmorse Differential Revision: https://reviews.llvm.org/D100307	2021-05-25 13:28:06 +01:00
Roman Lebedev	8f4db14d1c	[LoopIdiom] Support 'left-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countBits(unsigned val) { int cnt = 0; for( ; (val << cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countBits(unsigned val, int start, int off) { int cnt; for (cnt = start; val << (cnt + off); cnt++) ; return cnt; } ``` alive2 is happy with all the tests there. Note that, again, much like with the right-shift cases, we don't require the `val != 0` guard. This is the last pattern that was supported by `detectShiftUntilZeroIdiom()`, which now becomes obsolete.	2021-05-25 15:26:35 +03:00
Roman Lebedev	980e0107a1	[NFC][LoopIdiom] Add tests for 'left-shift until zero' idiom	2021-05-25 15:26:34 +03:00
Pushpinder Singh	b0d68c7141	[AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL Reason: Missing printf definition Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103078	2021-05-25 12:16:54 +00:00
Bradley Smith	f3c577ed38	[AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP Depends on D102607 Differential Revision: https://reviews.llvm.org/D102777	2021-05-25 12:54:55 +01:00
Tom Weaver	c2c2be44ed	[Dexter] Add DexDeclareFile command to Dexter DexDeclareFile allows test producers to write test files with .dex extensions that contain pure dexter commands. .dex file commands do not need to be commented out like they do when written inline within test source files. DexDeclareFile commands are declarative in behaviour, they state that any Dexter command seen from this point on will have its path attribute set to the path declared in the DexDeclareFile command. Differential Revision: https://reviews.llvm.org/D99651	2021-05-25 12:47:16 +01:00
Raphael Isemann	ae58cf5f45	[lldb] Fix that LLDB doesn't print NaN's sign on Darwin It seems std::ostringstream ignores NaN signs on Darwin while it prints them on Linux. This causes LLDB to behave differently on those platforms, which is both confusing for users and means we have to deal with that in our tests. This patch manually implements the NaN/Inf printing (which are apparently implementation defined) to make LLDB print the same thing on all platforms. The only output difference in practice seems to be that we now print negative NaNs as `-nan`, but this potentially also changes the output on other systems I haven't tested this on. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D102845	2021-05-25 13:33:28 +02:00
Roman Lebedev	f1c5f78d38	[LoopIdiom] Support 'arithmetic right-shift until zero' idiom This adds support for the "count active bits" pattern, i.e.: ``` int countActiveBits(signed val) { int cnt = 0; for( ; (val >> cnt) != 0; ++cnt) ; return cnt; } ``` but a somewhat more general one: ``` int countActiveBits(signed val, int start, int off) { int cnt; for (cnt = start; val >> (cnt + off); cnt++) ; return cnt; } ``` This directly matches the existing 'logical right-shift until zero' idiom. alive2 is happy with all the tests there. Note that, again, much like with the original unsigned case, we don't require the `val != 0` guard. The old `detectShiftUntilZeroIdiom()` already supports this pattern, the idea here is that the `val` must be positive (have at least one leading zero), because otherwise the loop is non-terminating, but since it is not `while(1)`, that would have been UB.	2021-05-25 14:30:49 +03:00
Roman Lebedev	8a0e4ae772	[NFC][LoopIdiom] Add tests for 'arithmetic right-shift until zero' idiom	2021-05-25 14:30:49 +03:00
Raphael Isemann	1dee479ff6	[lldb][NFC] Remove misleading ModulePass base class for IRForTarget IRForTarget is never used by a pass manager or any other interface that requires this class to inherit from `Pass`. Also IRForTarget doesn't implement the current interface correctly because it uses the `runOnModule` return value to indicate success/failure instead of changed/not-changed, so if this ever ends up being used as a pass it would most likely not work as intended. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D102677	2021-05-25 13:27:07 +02:00
Raphael Isemann	a3a95286a7	[lldb] X-FAIL TestCPPStaticMembers on Windows This originally failed because of llvm.org/pr21765, which describes that LLDB can't call a debuggee's functions, but I removed the (unnecessary) function call in the rewrite. It seems that the actual bug here is that we can't look up static members at all, so let's X-FAIL the test for the right reason.	2021-05-25 13:10:19 +02:00
Marco Elver	280333021e	[SanitizeCoverage] Add support for NoSanitizeCoverage function attribute We really ought to support no_sanitize("coverage") in line with other sanitizers. This came up again in discussions on the Linux-kernel mailing lists, because we currently do workarounds using objtool to remove coverage instrumentation. Since that support is only on x86, to continue supporting coverage instrumentation on other architectures, we must support selectively disabling coverage instrumentation via function attributes. Unfortunately, for SanitizeCoverage, it has not been implemented as a sanitizer via fsanitize= and associated options in Sanitizers.def, but rolls its own option fsanitize-coverage. This meant that we never got "automatic" no_sanitize attribute support. Implement no_sanitize attribute support by special-casing the string "coverage" in the NoSanitizeAttr implementation. To keep the feature as unintrusive to existing IR generation as possible, define a new negative function attribute NoSanitizeCoverage to propagate the information through to the instrumentation pass. Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035 Reviewed By: vitalybuka, morehouse Differential Revision: https://reviews.llvm.org/D102772	2021-05-25 12:57:14 +02:00
Marco Elver	85feebf5a3	[NFC][SanitizeCoverage] Test always_inline functions work Test that always_inline functions are instrumented as expected. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102929	2021-05-25 12:57:14 +02:00
Marco Elver	ca6df73406	[NFC][CodeGenOptions] Refactor checking SanitizeCoverage options Refactor checking SanitizeCoverage options into CodeGenOptions::hasSanitizeCoverage(). Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D102927	2021-05-25 12:57:14 +02:00
Simon Pilgrim	ed14062be0	Fix MSVC "truncation of constant value" warning. NFCI.	2021-05-25 11:35:57 +01:00
Simon Pilgrim	68ef68f8ac	[CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.	2021-05-25 11:35:57 +01:00
Christudasan Devadasan	e3b8e6d482	[AMDGPU] Remove dead declaration (NFC).	2021-05-25 16:04:04 +05:30
Vinayaka Bandishti	eff269fc9f	[MLIR][Affine][LICM] Mark users of `iter_args` variant Prevent users of `iter_args` of an affine for loop from being hoisted out of it. Otherwise, LICM leads to a violation of the SSA dominance (as demonstrated in the added test case). Fixes: https://bugs.llvm.org/show_bug.cgi?id=50103 Reviewed By: bondhugula, ayzhuang Differential Revision: https://reviews.llvm.org/D102984	2021-05-25 15:56:52 +05:30
Tres Popp	9ccdc2e23b	[mlir] Fold memref.dim of OffsetSizeAndStrideOpInterface outputs This previously handled memref::SubviewOp, but this can be extended to all ops implementing the interface. Differential Revision: https://reviews.llvm.org/D103076	2021-05-25 12:16:10 +02:00
Florian Hahn	536447eb20	[AArch64] Add tests for lowering of vector load + single extract. Currently the vector load + extract gets lowered to a single scalar store, not accounting for the fact that the index could be out-of-bounds, which is poison, not UB. See PR50382.	2021-05-25 11:09:15 +01:00
Raphael Isemann	3bf96b0329	[lldb] Disable minimal import mode for RecordDecls that back FieldDecls Clang adds a Decl in two phases to a DeclContext. First it adds it invisible and then it makes it visible (which will add it to the lookup data structures). It's important that we can't do lookups into the DeclContext we are currently adding the Decl to during this process as once the Decl has been added, any lookup will automatically build a new lookup map and add the added Decl to it. The second step would then add the Decl a second time to the lookup which will lead to weird errors later on. I made adding a Decl twice to a lookup an assertion error in D84827. In the first step Clang also does some computations on the added Decl if it's for example a FieldDecl that is added to a RecordDecl. One of these computations is checking if the FieldDecl is of a record type and the record type has a deleted constexpr destructor which will delete the constexpr destructor of the record that got the FieldDecl. This can lead to a bug with the way we implement MinimalImport in LLDB and the following code: ``` struct Outer { typedef int HookToOuter; struct NestedClass { HookToOuter RefToOuter; } NestedClassMember; // We are adding this. }; ``` 1. We just imported `Outer` minimally so far. 2. We are now asked to add `NestedClassMember` as a FieldDecl. 3. We import `NestedClass` minimally. 4. We add `NestedClassMember` and clang does a lookup for a constexpr dtor in `NestedClass`. `NestedClassMember` hasn't been added to the lookup. 5. The lookup into `NestedClass` will now load the members of `NestedClass`. 6. We try to import the type of `RefToOuter` which will try to import the `HookToOuter` typedef. 7. We import the typedef and while importing we check for conflicts in `Outer` via a lookup. 8. The lookup into `Outer` will cause the invisible `NestedClassMember` to be added to the lookup. 9. We continue normally until we get back to the `addDecl` call in step 2. 10. We now add `NestedClassMember` to the lookup even though we already did that in step 8. The fix here is disabling the minimal import for RecordTypes from FieldDecls. We actually already did this, but so far we only force the definition of the type to be imported after we imported the FieldDecl. This just moves that code before we import the FieldDecl to prevent the issue above. Reviewed By: shafik, aprantl Differential Revision: https://reviews.llvm.org/D102993	2021-05-25 12:08:50 +02:00
Raphael Isemann	8b656b8846	[lldb] Re-enable and rewrite TestCPPStaticMembers It's not clear why the whole test got disabled, but the linked bug report has since been fixed and the only part of it that still fails is the test for the too permissive lookup. This re-enables the test, rewrites it to use the modern test functions we have and splits the failing part into its own test that we can skip without disabling the rest.	2021-05-25 11:52:28 +02:00
Stanislav Mekhanoshin	8f681d5b27	[IR] Allow Value::replaceUsesWithIf() to process constants The change is currently NFC, but exploited by the depending D102954. Code to handle constants is borrowed from the general implementation of Value::doRAUW(). Differential Revision: https://reviews.llvm.org/D103051	2021-05-25 02:12:01 -07:00
Roman Lebedev	78eaff2ef8	[llvm-exegesis] Loop unrolling for loop snippet repetitor mode I really needed this, like, factually, yesterday, when verifying dependency breaking idioms for AMD Zen 3 scheduler model. Consider the following example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 } error: '' info: '' assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3 ... ``` What does it tell us? So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle? That doesn't seem right. That's even less than there are pipes supporting this type of op. Now, second example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` Now that's just worse. Due to the looping, the throughput completely plummeted, and now we can only do a single instruction/cycle!? That's not great. 
And final example: ``` $ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000 Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o --- mode: inverse_throughput key: instructions: - 'VPXORYrr YMM0 YMM0 YMM0' config: '' register_initial_values: [] cpu_name: znver3 llvm_triple: x86_64-unknown-linux-gnu num_repetitions: 1000000 measurements: - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 } error: '' info: '' assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3 ... ``` So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x (loop-body-size/instruction count in snippet), and run a loop with 1000 iterations over that duplicated/unrolled snippet, the measured throughput goes through the roof, up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle! Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D102522	2021-05-25 12:08:27 +03:00
Kristina Bessonova	44843e2a04	[ARM][NEON] Combine base address updates for vld1x intrinsics Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D102855	2021-05-25 11:06:39 +02:00
David Spickett	de7729d47a	[clang][ARM] Remove non-existent arm9312 CPU I cannot find documentation on this CPU, and it is not supported by the Arm Compiler 5 product either. It was likely a mistake or a different name for the "ep9312", which is an Arm based Cirrus Logic chip. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103024	2021-05-25 08:58:24 +00:00
David Spickett	0cd2629d97	[llvm][ARM] Remove non-existent arm1176j-s CPU This was removed in https://reviews.llvm.org/D52594 for clang. The one test using it has been updated to use the mpcore CPU as the linked clang change does. This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454. Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D103022	2021-05-25 08:56:55 +00:00
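
Commit 8607a02357 above folds `X * Y % Y` to `0`, with the same "barring overflow" caveat the message gives for the existing `X * Y / Y` fold. The following is a minimal sketch of why that caveat matters, not the InstSimplify code itself. It models the arithmetic with wrapping 8-bit unsigned values; the helper names `mulFitsInU8`, `remOfProduct`, and `foldHoldsWithoutOverflow` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when x * y fits in 8 bits, i.e. the
// multiplication does not wrap.
bool mulFitsInU8(uint8_t x, uint8_t y) {
  return static_cast<unsigned>(x) * y <= 0xFFu;
}

// Hypothetical helper: (x * y) % y evaluated in wrapping 8-bit
// arithmetic, which is what unsimplified code would compute.
uint8_t remOfProduct(uint8_t x, uint8_t y) {
  assert(y != 0 && "remainder by zero is undefined");
  uint8_t product = static_cast<uint8_t>(static_cast<unsigned>(x) * y);
  return static_cast<uint8_t>(product % y);
}

// Exhaustively checks every 8-bit pair: whenever the multiply does not
// wrap, the remainder is 0, so replacing it with 0 is safe.
bool foldHoldsWithoutOverflow() {
  for (unsigned x = 0; x <= 0xFF; ++x)
    for (unsigned y = 1; y <= 0xFF; ++y) {
      uint8_t a = static_cast<uint8_t>(x);
      uint8_t b = static_cast<uint8_t>(y);
      if (mulFitsInU8(a, b) && remOfProduct(a, b) != 0)
        return false;
    }
  return true;
}
```

With a wrapping multiply the fold would be wrong: `remOfProduct(100, 3)` computes 300 mod 256 = 44 and then 44 % 3 = 2, not 0. That is consistent with the message's note that extending the transform to further situations would not be correct.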

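
Commit f1c5f78d38 above teaches LoopIdiom to recognize the "count active bits" loop over an arithmetic right shift. As a rough sketch of what replacing such a loop means, the code below compares the simple loop from the commit message with a loop-free bit-width computation. It assumes a 32-bit `val` that is non-negative, matching the message's remark that a negative `val` would make the loop non-terminating. The names `bitWidth`, `countActiveBitsLoop`, and `countActiveBitsClosedForm` are hypothetical, and the closed form is an illustration rather than the code LLVM emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bits needed to represent v
// (0 for v == 0), found by a branchy binary search instead of a loop.
int bitWidth(uint32_t v) {
  int n = 0;
  if (v >> 16) { v >>= 16; n += 16; }
  if (v >> 8)  { v >>= 8;  n += 8; }
  if (v >> 4)  { v >>= 4;  n += 4; }
  if (v >> 2)  { v >>= 2;  n += 2; }
  if (v >> 1)  { v >>= 1;  n += 1; }
  return n + static_cast<int>(v); // v is 0 or 1 at this point
}

// The simple loop from the commit message, restricted to val >= 0 so
// the shift amount never reaches the bit width.
int countActiveBitsLoop(int32_t val) {
  assert(val >= 0 && "negative input would never shift down to zero");
  int cnt = 0;
  for (; (val >> cnt) != 0; ++cnt)
    ;
  return cnt;
}

// Loop-free equivalent for val >= 0: the count of active bits is the
// bit width of the value.
int countActiveBitsClosedForm(int32_t val) {
  assert(val >= 0 && "only non-negative inputs are modelled here");
  return bitWidth(static_cast<uint32_t>(val));
}
```

For example, `countActiveBitsLoop(8)` and `countActiveBitsClosedForm(8)` both return 4, since 8 is `0b1000`.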

389380 Commits, All Branches