llvm-project

Commit Graph

Author	SHA1	Message	Date
Johannes Doerfert	982053e85e	[Attributor][NFC] Improve debug code and comments	2022-06-09 13:41:23 +02:00
Johannes Doerfert	0ece283f03	[Attributor] Add checks needed as we strengthen value simplify	2022-06-09 13:41:23 +02:00
Johannes Doerfert	393be12b74	[Attributor] Look at base values for align, nonnull, and deref Stripping bitcasts and 0-geps helps normalization and minimizes the impact of a follow up change.	2022-06-09 13:41:23 +02:00
Johannes Doerfert	cb8adf76f7	[Attributor] Simplify loads from constant globals If a global is constant and the initializer is known we can simplify loads from it as the value has to be the initializer.	2022-06-09 13:41:23 +02:00
Martin Storsjö	39c4ac140d	[lldb] Silence a GCC warning about missing returns after a fully covered switch. NFC.	2022-06-09 14:39:33 +03:00
Alvin Wong	c8daf4a707	[lldb] Add gnu-debuglink support for Windows PE/COFF The specification of gnu-debuglink can be found at: https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html The file CRC or the CRC value from the .gnu_debuglink section is now used to calculate the module UUID as a fallback, to allow verifying that the debug object does match the executable. Note that if a CodeView build id exists, it still takes precedence. This works even for MinGW builds because LLD writes a synthetic CodeView build id which does not get stripped from the debug object. The `Minidump/Windows/find-module` test also needs a fix by adding a CodeView record to the exe to match the one in the minidump, otherwise it fails due to the new UUID calculated from the file CRC. Fixes https://github.com/llvm/llvm-project/issues/54344 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D126367	2022-06-09 14:39:33 +03:00
Nicolai Hähnle	264d1136f9	AMDGPU/GISel: Introduce custom legalization of G_MUL The generic legalizer framework is still used to reduce the problem to scalar multiplication with the bit size a multiple of 32. Generating optimal code sequences for big integer multiplication is somewhat tricky and has a number of target-specific intricacies: - The target has V_MAD_U64_U32 instructions that multiply two 32-bit factors and add a 64-bit accumulator. Most partial products should use this instruction. - The accumulator is mapped to consecutive 32-bit GPRs, and partial- product multiply-adds can feed the accumulator into each other directly. (The register allocator's support for that is somewhat limited, but that only matters for 128-bit integers and larger.) - OTOH, on some hardware, V_MAD_U64_U32 requires the accumulator to be stored in an even-aligned pair of GPRs. To avoid excessive register copies, it makes sense to compute odd partial products separately from even partial products (where a partial product src0[j0] * src1[j1] is "odd" if j0 + j1 is odd) and add both halves together as a final step. - We can combine G_MUL+G_ADD into a single cascade of multiply-adds. - The target can keep many carry-bits in flight simultaneously, so combining carries using G_UADDE is preferable over G_ZEXT + G_ADD. - Not addressed by this patch: When the factors are sign-extended, the V_MAD_I64_I32 instruction (signed version!) can be used. It is difficult to address these points generically: 1) Finding matching pairs of G_MUL and G_UMULH to find a wide multiply is expensive. We could add a G_UMUL_LOHI generic instruction and conditionally use that in the generic legalizer, but by itself this wouldn't allow us to use the accumulation capability of V_MAD_U64_U32. One could attempt to find matching G_ADD + G_UADDE post-legalization, but this is also expensive. 2) Similarly, making sense of the legalization outcome of a wide pre-legalization G_MUL+G_ADD pair is extremely expensive. 3) How could the generic legalizer possibly deal with the particular idiosyncracy of "odd" vs. "even" partial products. All this points in the direction of directly emitting an ideal code sequence during legalization, but the generic legalizer should not be burdened with such overly target-specific concerns. Hence, a custom legalization. Note that the implemented approach is different from that used by SelectionDAG because narrowing of scalars works differently in general. SelectionDAG iteratively cuts wide scalars into low and high halves until a legal size is reached. By contrast, GlobalISel does the narrowing in a single shot, which should be better for compile-time and for the quality of the generated code. This patch leaves three gaps open: 1. When the factors are uniform, we should execute the multiplication on the SALU. Register bank mapping already ensures this. However, the resulting code sequence is not optimal because it doesn't fully use the carry-in capabilities of S_ADDC_U32. (V_MAD_U64_U32 doesn't have a carry-in.) It is very difficult to fix this after the fact, so we should really use a different legalization sequence in this case. Unfortunately, we don't have a divergence analysis and so cannot make that choice. (This only matters for 128-bit integers and larger.) 2. Avoid unnecessary multiplies when sources are known to be zero- or sign-extended. The challenge is that the legalizer does not currently have access to GISelKnownBits. 3. When the G_MUL is followed by a G_ADD, we should consider combining the two instructions into a single multiply-add sequence, to utilize the accumulator of V_MAD_U64_U32 fully. (Unless the multiply has multiple uses and the implied duplication of the multiply is an overall negative). However, this is also not true when the factors are uniform: in that case, it is generally better to not combine the two operations, so that the multiply can be done on the SALU. Again, we don't have a divergence analysis available and so cannot make an informed choice. Differential Revision: https://reviews.llvm.org/D124844	2022-06-09 13:38:56 +02:00
Simon Pilgrim	1a02db9882	[X86] canonicalizeShuffleWithBinOps - add TODO for X86ISD::ANDNP bitwise handling Its just as safe to move shuffles across X86ISD::ANDNP as any other logical bitop, they just tend to appear too late to matter. Noticed while triaging D127115 regressions.	2022-06-09 12:18:26 +01:00
Benjamin Kramer	abcf1496ad	Fix complex.conj integration test - It doesn't actually print the fractional part if the result is a whole number - One of the expectations was just wrong	2022-06-09 13:11:10 +02:00
Florian Hahn	85983ca42e	[VPlan] Replace remaining use of needsScalarIV. All information is already available in VPlan. Note that there are some test changes, because we now can correctly look through instructions like truncates to analyze the actual users. Reviewed By: Ayal Differential Revision: https://reviews.llvm.org/D123541	2022-06-09 12:05:37 +01:00
Jose Manuel Monsalve Diaz	84e020a061	Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info" This reverts commit `d16a0877d8`.	2022-06-09 10:46:03 +00:00
Matheus Izvekov	51608515fa	cmake: use llvm dir variables for clang/utils/hmaptool Copy hmaptool using the paths for CURRENT_TOOLS_DIR, so everything goes in the right place in case llvm is included from a top level CMakeLists.txt. Signed-off-by: Matheus Izvekov <mizvekov@gmail.com> Reviewed By: stephenneuendorffer Differential Revision: https://reviews.llvm.org/D126308	2022-06-09 12:25:38 +02:00
Paul Pluzhnikov	afbe3aed49	[clangd] Minor refactor of CanonicalIncludes::addSystemHeadersMapping. Before commit `b3a991df3c` SystemHeaderMap used to be a vector. Commit `b3a991df3c` changed it into a map, but neglected to remove duplicate keys (e.g. "bits/typesizes.h", "include/stdint.h", etc.). To prevent confusion, remove all duplicates, build HeaderMapping one pair at a time and assert() that no duplicates are found. Change by Paul Pluzhnikov (ppluzhnikov)! Reviewed By: ilya-biryukov Differential Revision: https://reviews.llvm.org/D125742	2022-06-09 12:18:39 +02:00
Kiran Chandramohan	08407255b2	[Flang] Temporary fix for conversion materialization Simply add a source and target materialization handler that do nothing and that override the default handlers that would add illegal LLVM::DialectCastOp otherwise. This is the simplest workaround, but not an actual fix, something may be inconsistent after D82831 (most likely fir lowering to llvm happens in a way that mlir infrastructure is not expecting in D82831). Here is a minimal reproducer of what the issue was: ``` func @foop(%a : !fir.real<4>) -> () func @bar(%a : !fir.real<2>) { %1 = fir.convert %a : (!fir.real<2>) -> !fir.real<4> call @foop(%1) : (!fir.real<4>) -> () return } ``` tco -o - output was: ``` error: 'llvm.mlir.cast' op type must be non-index integer types, float types, or vector of mentioned types. llvm.func @foop(!llvm.float) llvm.func @bar(%arg0: !llvm.half) { %0 = llvm.fpext %arg0 : !llvm.half to !llvm.float %1 = llvm.mlir.cast %0 : !llvm.float to !fir.real<4> llvm.call @foop(%1) : (!fir.real<4>) -> () llvm.return } ``` This patch disable the introduction of the llvm.mlir.cast and preserve the previous behavior. Also fixes https://github.com/llvm/llvm-project/issues/55210. Note: This is part of upstreaming from the fir-dev branch of https://github.com/flang-compiler/f18-llvm-project. Reviewed By: awarzynski Differential Revision: https://reviews.llvm.org/D127212 Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>	2022-06-09 10:10:56 +00:00
Haojian Wu	f1ac00c9b0	[pseudo] Add grammar annotations support. Add annotation handling ([key=value]) in the BNF grammar parser, which will be used in the conditional reduction, and error recovery. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D126536	2022-06-09 12:06:22 +02:00
Denis Antrushin	e99e821ce8	[FixupStatepoints] Precommit test for D127308. NFC	2022-06-09 17:04:48 +07:00
Johannes Doerfert	14899bc43d	[Attributor] Generalize interface from ConstantInt to Constant We can use constant to allow undef and there is no need to force integers in the API anyway. The user can decide if a non integer constant is fine or not.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	7a07b88f37	[Attributor][FIX] Replace call site argument uses, not values We need to be careful replacing values as call site arguments (IRPosition::IRP_CALL_SITE_ARGUMENT) is representing a use and not a value. This patch replaces the interface to take a IR position instead making it harder to misuse accidentally. It does not change our tests right now but a follow up exposed the potential footgun.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	1df6e171c3	[Attributor] Simplify (integer range) state handling We used to be very conservative when integer states were merged. Instead of adding the known range (which is large due to uncertainty) into the assumed range (which is hopefully small), we can also only allow to merge in both at the same time into their respective counterpart. This will ensure we keep the invariant that assumed is part of known.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	481b8f31df	[Attributor][NFC] Introduce helper struct We often use a context associated with a value. For now only one use case has been changed.	2022-06-09 12:00:26 +02:00
Johannes Doerfert	4277c1be88	[Attributor][FIX] Avoid metadata and duplicate replication assertion When we recreate instructions as part of simplification we need to take care of debug metadata and replacing the value multiple times. For now, we handle both conservatively.	2022-06-09 12:00:26 +02:00
Biplob Mishra	d87bfa9ad0	[InstCombine] Combine instructions of type or/and where AND masks can be combined. The patch simplifies some of the patterns as below (A \| (B & C0)) \| (B & C1) -> A \| (B & C0\|C1) ((B & C0) \| A) \| (B & C1) -> (B & C0\|C1) \| A In some scenarios like byte reverse on half word, we can see this pattern multiple times and this conversion can optimize these patterns. Additionally this commit fixes the issue reported with the test case. int f(int a, int b) { int c = ((unsigned char)(a >> 23) & 925); if (a) c = (a >> 23 & b) \| ((unsigned char)(a >> 23) & 925) \| (b >> 23 & 157); return c; } The previous revision/commit did not check one-use of an intermediate value that this transform re-uses. When that value has another use, an existing transform will try to invert the transform here. By adding one-use checks, we avoid the infinite loops seen with the earlier commit. Differential Revision: https://reviews.llvm.org/D124119	2022-06-09 10:58:30 +01:00
Andrzej Warzynski	1953bcdaac	[flang] Add RUN lines using `fir-opt` In tests that define a pass pipeline to use, add a RUN line using fir-opt. Differential Revision: https://reviews.llvm.org/D126955	2022-06-09 09:48:54 +00:00
Kiran Chandramohan	8b951e64d9	[Flang][OpenMP] Lower schedule modifiers for worksharing loop Add support for lowering the schedule modifiers (simd, monotonic, non-monotonic) in worksharing loops. Note: This is part of upstreaming from the fir-dev branch of https://github.com/flang-compiler/f18-llvm-project. Reviewed By: peixin Differential Revision: https://reviews.llvm.org/D127311 Co-authored-by: Mats Petersson <mats.petersson@arm.com> Co-authored-by: Jean Perier <jperier@nvidia.com> Co-authored-by: Eric Schweitz <eschweitz@nvidia.com> Co-authored-by: V Donaldson <vdonaldson@nvidia.com>	2022-06-09 09:45:14 +00:00
Alex Zinenko	5f0d4f208e	[mlir] Introduce Transform ops for loops Introduce transform ops for "for" loops, in particular for peeling, software pipelining and unrolling, along with a couple of "IR navigation" ops. These ops are intended to be generalized to different kinds of loops when possible and therefore use the "loop" prefix. They currently live in the SCF dialect as there is no clear place to put transform ops that may span across several dialects, this decision is postponed until the ops actually need to handle non-SCF loops. Additionally refactor some common utilities for transform ops into trait or interface methods, and change the loop pipelining to be a returning pattern. Reviewed By: springerm Differential Revision: https://reviews.llvm.org/D127300	2022-06-09 11:41:55 +02:00
Andrzej Warzynski	ef4318e1a4	[flang][driver] Generate run-time type info This is a small follow-up for https://reviews.llvm.org/D120051. It makes sure that tables with "run-time type information for derived types" are generated for code-gen actions. Originally, only non-code-gen actions were updated (i.e. actions that were fully supported at that time). Differential Revision: https://reviews.llvm.org/D127307	2022-06-09 09:38:12 +00:00
Haojian Wu	74e4d5f256	[pseudo] Simplify the glrReduce implementation. glrReduce maintains two priority queues (one for bases, and the other for Sequence), these queues are in parallel with each other, corresponding to a single family. They can be folded into one. This patch removes the bases priority queue, which improves the glrParse by 10%. ASTReader.cpp: 2.03M/s (before) vs 2.26M/s (after) Differential Revision: https://reviews.llvm.org/D127283	2022-06-09 11:28:52 +02:00
owenca	40a5d79a5c	[clang-format][NFC] Format lib/Format and unittests/Format in clang Reformat these directories with InsertBraces and RemoveBracesLLVM. Differential Revision: https://reviews.llvm.org/D127366	2022-06-09 02:25:06 -07:00
Haojian Wu	7a05942dd0	[pseudo] Remove the explicit Accept actions. As pointed out in the previous review section, having a dedicated accept action doesn't seem to be necessary. This patch implements the the same behavior without accept acction, which will save some code complexity. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D125677	2022-06-09 11:19:07 +02:00
Haojian Wu	075449da80	[pseudo] Fix a sign-compare warning in debug build, NFC.	2022-06-09 11:18:03 +02:00
lewuathe	fff27d181c	[mlir][complex] Correctness check for complex.conj Add correctness check for complex.conj operation Reviewed By: pifon2a Differential Revision: https://reviews.llvm.org/D127377	2022-06-09 11:11:56 +02:00
Sam Parker	447c411fef	[ARM][ParallelDSP] Fix self reference bug Ensure we don't generate a smlad intrinsic that takes itself as an argument. Differential Revision: https://reviews.llvm.org/D127213	2022-06-09 09:10:57 +00:00
Guillaume Chatelet	dc3367970e	[SelectionDAG] Handle bzero/memset libcalls globally instead of per target Differential Revision: https://reviews.llvm.org/D127279	2022-06-09 08:34:55 +00:00
Florian Hahn	553d5161ee	[cmake] Add missing dependencies to objlib in add_llvm_executable. After `f06abbb393` I have been seeing build failures due to the obj.clang target missing a dependency on tools/clang/clang-tablegen-targets. This appears to be due to the fact that LLVM_COMMON_DEPENDS are not added as dependencies to the object library. This patch uses the same logic as llvm_add_library to register dependencies for object libraries. Reviewed By: beanz, abrachet, steven_wu Differential Revision: https://reviews.llvm.org/D127318	2022-06-09 09:24:13 +01:00
Chenbing Zheng	38992d2c5e	[InstCombine] improve fold for icmp-ugt-ashr Existing condition for fold icmp ugt (ashr X, ShAmtC), C --> icmp ugt X, ((C + 1) << ShAmtC) - 1 missed some boundary. It cause this fold don't work for some cases, and the reason is due to signed number overflow. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D127188	2022-06-09 16:22:12 +08:00
Lian Wang	91e31fd205	[RISCV][VP] Add fp test of widen and split for vp.setcc Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D127079	2022-06-09 08:14:12 +00:00
Nikita Popov	56c9976d46	[IndVarSimplify] Don't assert that terminator is not SCEVable (PR55925) The IV widening code currently asserts that terminators aren't SCEVable -- however, this is not the case for invokes with a returned attribute. As far as I can tell, this assertions is not necessary -- even if we have a critical edge (the second test case), the trunc gets inserted in a legal position. Fixes https://github.com/llvm/llvm-project/issues/55925. Differential Revision: https://reviews.llvm.org/D127288	2022-06-09 10:12:13 +02:00
Nicolai Hähnle	f971e77fb4	ADT/ArrayRef: Add makeMutableArrayRef overloads Equivalent overloads already exist for makeArrayRef. Differential Revision: https://reviews.llvm.org/D126421	2022-06-09 09:59:50 +02:00
Lian Wang	362a02dabe	[RISCV][test] Add widen STEP_VECTOR tests. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D127371	2022-06-09 07:47:04 +00:00
Christian Kandeler	d352017184	[include-cleaner] Fix build error in unit test Reviewed By: nridge Differential Revision: https://reviews.llvm.org/D127217	2022-06-09 03:38:13 -04:00
lorenzo chelini	2a3c07f897	[MLIR][Math] Re-order conversions alphabetically (NFC) Minor follow-up after: D127286 (https://reviews.llvm.org/D127286/new/) Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D127382	2022-06-09 09:12:13 +02:00
Matthias Gehre	7e17e15c9f	clang: Introduce -fexperimental-max-bitint-width This splits of the introduction of -fexperimental-max-bitint-width from https://reviews.llvm.org/D122234 because that PR is still blocked on discussions on the backend side. I was asked [0] to upstream at least the flag. [0] `09854f2af3 (commitcomment-75116619)` Differential Revision: https://reviews.llvm.org/D127287	2022-06-09 07:15:03 +01:00
Bruno Cardoso Lopes	e6a76a4935	[Clang][CoverageMapping] Fix compile time explosions by adjusting only appropriated skipped ranges D83592 added comments to be part of skipped regions, and as part of that, it also shrinks a skipped range if it spans a line that contains a non-comment token. This is done by `adjustSkippedRange`. The `adjustSkippedRange` currently runs on skipped regions that are not comments, causing a 5min regression while building a big C++ files without any comments. Fix the compile time introduced in D83592 by tagging SkippedRange with kind information and use that to decide what needs additional processing. Differential Revision: https://reviews.llvm.org/D127338	2022-06-08 23:13:39 -07:00
David Carlier	a4c97e1937	[Sanitizers] prctl interception update for the PR_SET_VMA option case. Supports on Android but also from Linux 5.17 Reviewers: vitalybuka, eugenis Reviewed-By: vitalybuka Differential Revision: https://reviews.llvm.org/D127326	2022-06-09 06:07:26 +01:00
Fangrui Song	11136a6032	[DeadArgElim] Remove dead code after r128810	2022-06-08 21:11:54 -07:00
Jez Ng	977d62c33e	[lld-macho] Support EH frames under arm64 For arm64, llvm-mc emits relocations for the target function address like so: ltmp: <CIE start> ... <CIE end> ... multiple FDEs ... <FDE start> <target function address - (ltmp + pcrel offset)> ... If any of the FDEs in `multiple FDEs` get dead-stripped, then `FDE start` will move to an earlier address, and `ltmp + pcrel offset` will no longer reflect an accurate pcrel value. To avoid this problem, we "canonicalize" our relocation by adding an `EH_Frame` symbol at `FDE start`, and updating the reloc to be `target function address - (EH_Frame + new pcrel offset)`. Reviewed By: #lld-macho, Roger Differential Revision: https://reviews.llvm.org/D124561	2022-06-08 23:41:29 -04:00
Jez Ng	826be330af	[lld-macho] Initial support for EH Frames == Background == `llvm-mc` generates unwind info in both compact unwind and DWARF formats. LLD already handles the compact unwind format; this diff gets us close to handling the DWARF format properly. == Caveats == It's not quite done yet, but I figure it's worth getting this reviewed and landed first as it's shaping up to be a fairly large code change. Known limitations of the current code: * Only works for x86_64, for which `llvm-mc` emits "abs-ified" relocations as described in `618def651b`. `llvm-mc` emits regular relocations for ARM EH frames, which we do not yet handle correctly. Since the feature is not ready for real use yet, I've gated it behind a flag that only gets toggled on during test suite runs. With most of the new code disabled, we see just a hint of perf regression, so I don't think it'd be remiss to land this as-is: base diff difference (95% CI) sys_time 1.926 ± 0.168 1.979 ± 0.117 [ -1.2% .. +6.6%] user_time 3.590 ± 0.033 3.606 ± 0.028 [ +0.0% .. +0.9%] wall_time 7.104 ± 0.184 7.179 ± 0.151 [ -0.2% .. +2.3%] samples 30 31 == Design == Like compact unwind entries, EH frames are also represented as regular ConcatInputSections that get pointed to via `Defined::unwindEntry`. This allows them to be handled generically by e.g. the MarkLive and ICF code. (But note that unlike compact unwind subsections, EH frame subsections do end up in the final binary.) In order to make EH frames "look like" a regular ConcatInputSection, some processing is required. First, we need to split the `__eh_frame` section along EH frame boundaries rather than along symbol boundaries. We do this by decoding the length field of each EH frame. Second, the abs-ified relocations need to be turned into regular Relocs. == Next Steps == In order to support EH frames on ARM targets, we will either have to teach LLD how to handle EH frames with explicit relocs, or we can try to make `llvm-mc` emit abs-ified relocs for ARM as well. I'm hoping to do the latter as I think it will make the LLD implementation both simpler and faster to execute. == Misc == The `obj-file-with-stabs.s` test had to be updated as the previous version would trip assertion errors in the code. It appears that in our attempt to produce a minimal YAML test input, we created a file with invalid EH frame data. I've fixed this by re-generating the YAML and not doing any hand-pruning of it. Reviewed By: #lld-macho, Roger Differential Revision: https://reviews.llvm.org/D123435	2022-06-08 23:40:52 -04:00
Mogball	971e13d69e	[mlir][ods] Mark StructAttr as deprecated	2022-06-09 03:23:31 +00:00
chenglin.bi	226c564329	[InstCombine] Add vector tests for shl+lshr+and transforms; NFC D126617	2022-06-09 11:14:26 +08:00
Fangrui Song	62309ed955	[msan][test] Fix cpusetsize for another pthread_getaffinity_np.cpp test Similar to D127368	2022-06-08 20:08:05 -07:00

... 3 4 5 6 7 ...

426336 Commits All Branches Search

426336 Commits

All Branches