llvm-project

Commit Graph

Author	SHA1	Message	Date
David Green	02f8519502	[DAG] Prevent infinite loop combining bitcast shuffle This prevents an infinite loop from D123801, where code trying to reduce the total number of bitcasts, but also handling constants, could create the opposite transform. Prevent the transform in these case to let the bitcast of a constant transform naturally. Fixes #55345	2022-05-09 09:36:22 +01:00
Ben Shi	d2c4ac979b	[AVR] Add PrintMethod for operand memspi Reviewed By: Patryk27 Differential Revision: https://reviews.llvm.org/D124913	2022-05-09 08:31:49 +00:00
Abinav Puthan Purayil	7f6489d0e3	[AMDGPU] Regenerate checks in a mir test	2022-05-09 13:28:09 +05:30
Jean Perier	ed0341788a	[flang] retain binding label of entry subprograms When processing an entry-stmt in name resolution, attrs_ was reset before SetBindNameOn was called, causing the symbol to lose the binding label information. Differential Revision: https://reviews.llvm.org/D125097	2022-05-09 09:50:17 +02:00
Hongtao Yu	a4190037fa	[CSSPGO][Preinliner] Use linear threshold to drive inline decision. The per-callsite size threshold used today to drive preinline decision is based on hotness/coldness cutoff. The default setup is for callsites with a sample count above the hotness cutoff (99%), a 1500 size threshold is used. Any callsite below 99.99% coldness cutoff uses a zero threshold. This has a couple issues: 1. While both cutoffs and size thoresholds are configurable, different applications may need different setups, making a universal setup impractical. 2. The callsites between hotness cutoff and coldness cutoff are not considered as inline candidates, which could be a missing opportunity. 3. Hot callsites always use the same threshold. In reality we may want a bigger threshold for hotter callsites. In this change we are introducing a linear threshold regardless of hot/cold cutoffs. Given a sample space, a threshold is computed for a callsite based on the position of that callsite sample in the whole space. With that we no longer need to define what's hot or cold. Callsites with different hotness will get a different threshold. This should overcome the above three issues. I have seen good results with a universal default setup for two of our internal services. For one service, 0.2% to 0.5% perf improvement over a baseline with a previous default setup, on-par code size. For the second service, 0.5% to 0.8% perf improvement over a baseline with a previous default setup, 0.2% code size increase; on-par performance and code size with a baseline that is with a carefully tuned cutoff to cover enough hot functions. Reviewed By: wenlei Differential Revision: https://reviews.llvm.org/D125023	2022-05-08 22:07:58 -07:00
Christopher Bate	9879807393	[mlir][NvGpu] Fix nvgpu.mma.sync lowering to NVVM for f32, tf32 types Adds missing logic in the lowering from NvGPU to NVVM to support fp32 (in an accumulator operand) and tf32 (in multiplicand operand) types. Fixes logic in one of the helper functions for converting the result of a mma.sync operation with multiple 8x256bit output tiles, which is the case for f32 outputs. Differential Revision: https://reviews.llvm.org/D124533	2022-05-08 21:49:42 -06:00
Peixin-Qiao	c207e36025	[flang] Enforce a program not including more than one main program As Fortran 2018 5.2.2 states, a program shall consist of exactly one main program. Add this semantic check. Reviewed By: klausler Differential Revision: https://reviews.llvm.org/D125186	2022-05-09 10:48:06 +08:00
Xiaodong Liu	36d4f42c36	[lld] Fix typo for processAux; NFC Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D125163	2022-05-09 10:21:47 +08:00
Alexander Yermolovich	3abb68a626	[BOLT][DWARF] Fix assert for split dwarf. Fixing a small bug where it would assert if CU does not modify .debug_addr section. Differential Revision: https://reviews.llvm.org/D125181	2022-05-08 19:18:17 -07:00
Simon Pilgrim	9a12138b5f	[SLP][X86] Add test coverage for PR50392 / Issue #49736	2022-05-08 19:40:04 +01:00
Tue Ly	6d92f4022d	[libc][Obvious] Fix cmake usage of list PREPEND (unavailable pre-3.15).	2022-05-08 13:58:05 -04:00
Tue Ly	13f358376a	[libc] Add LINK_LIBRARIES option to add_fp_unittest and add_libc_unittest. This is needed to prepare for adding FLAGS option. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D125055	2022-05-08 17:33:45 +00:00
Simon Pilgrim	7e3aa70668	[X86] Add test coverage for PR26515 / Issue #26889	2022-05-08 18:19:04 +01:00
Groverkss	4d1fd705f0	[docs] Add Office Hours for Tobias Grosser	2022-05-08 21:14:31 +05:30
Simon Pilgrim	6824cf1ab7	[X86] Set some more plausible latencies for horizontal add/subs on znver1 These are all microcoded/multi-pipe nightmares on Ryzen, but we shouldn't just be using the WriteMicrocoded class which is for REALLY bad microcoded nightmares - instead use the same approximate latencies as znver2 (Agner and uops.info both suggest similar values) - and make sure we use the FPU defs for both Fixes #53242	2022-05-08 15:48:42 +01:00
Simon Pilgrim	800d36cf32	[DAG] Only perform the fold (A-B)+(C-D) --> (A+C)-(B+D) when both inner subs have one use Fixes #51381	2022-05-08 13:51:58 +01:00
Simon Pilgrim	5a6792a146	[X86] combine-add.ll - add test case for PR52039 / Issue #51381 Also split AVX1/AVX2 test coverage	2022-05-08 13:45:23 +01:00
Luo, Yuanke	d5d498f9ba	[X86][AMX] Simplify AMX test case. Extract test for zero tile configure into a small test case.	2022-05-08 19:12:54 +08:00
Simon Pilgrim	751005a2ca	[SLP][X86] Add test coverage for PR42652 / Issue #41997	2022-05-08 12:09:14 +01:00
Simon Pilgrim	7d94597048	[SLP][X86] Add test coverage for PR41892 / Issue #41237	2022-05-08 11:40:53 +01:00
Simon Pilgrim	2233a61500	[SLP][X86] Add test coverage for PR49934 / Issue #49278 D124284 should help us vectorize the sub-128-bit vector cases	2022-05-08 11:33:01 +01:00
Simon Pilgrim	96d2d2508e	[SLP][X86] Add test coverage for PR47491 / Issue #46835 D124284 should help us vectorize the sub-128-bit vector cases	2022-05-08 11:24:46 +01:00
Simon Pilgrim	993d9462e1	[InstCombine] Add test coverage for PR43261 / Issue #42606	2022-05-08 11:10:49 +01:00
Simon Pilgrim	72eb630207	[Headers][X86] Enable basic Wdocumentation testing on X86 headers First part of Issue #35297 - we want to enable Wdocumentation-pedantic as well, but need '\n' support first which Issue #55319 is addressing	2022-05-08 10:53:28 +01:00
Simon Pilgrim	6b3a111a28	[Headers][X86] Replace \operation with \code{.operation} \operation ... \endoperation are not valid doxygen commands and cause issues when -Wdocumentation is enabled (Issue #35297) This patch proposes to replace them with \code{.operation} ... \endcode blocks so that the pseudo-code is correctly retained in any documentation and downstream can use the ".operation" type for its own formatting. Differential Revision: https://reviews.llvm.org/D125170	2022-05-08 10:46:26 +01:00
David Green	6f9e1ea0ef	[VectorCombine] Attempt to fold select shuffles from reductions Given a commutative reduction leading from a shuffle, the order of the lanes on the shuffle are not important for the result. This means we can reorder the shuffle to something simpler, which we try shuffling the first vector lanes first. This was D123494. The new shuffle may not be profitable though, and if it is not we can try the folding of select shuffles from D123911. This, with some adjustment as the output lane ordering is now unimportant, can allow the final shuffle to simplify given the inputs to the patterns from D123911. Where as each transformation on their own are not profitable, the combination is. We can only support a single shuffle when called from reductions, but we are able to sort the ReconstructMask, potentially allowing it to simplify to an identity or concat mask. Differential Revision: https://reviews.llvm.org/D125086	2022-05-08 10:32:41 +01:00
Simon Pilgrim	f2b1648812	[X86] Fix some signedness errors in x86 headers Another step toward enabling full -Wsystem-headers testing across all x86 headers Fix a number of cases where the arg / return value signedness doesn't match the C/C++ intrinsic. So far I've just added explicit casts as necessary, but we might want to address some of the mismatches directly Differential Revision: https://reviews.llvm.org/D125164	2022-05-08 09:42:58 +01:00
Vitaly Buka	08ac661248	[test][msan] Relax order of param shadow Looks like different bots have them in a different order.	2022-05-07 21:17:44 -07:00
Vitaly Buka	009d56da5c	[test][msa] Add more sse,avx intrinsics tests	2022-05-07 20:16:22 -07:00
Stella Laurenzo	6dedbcd5e9	Make BinaryStreamWriter::padToAlignment write blocks vs bytes. While I think this is a performance improvement over the original, this actually fixes a correctness issue: For an appendable underlying stream, padToAlignment would fail if the additional padding would have caused the stream to grow since it was doing its own check on bounds. By deferring to the regular writeArray method this takes the same path as everything else, which does the correct bounds check in WritableBinaryStreamRef::checkOffsetForWrite (i.e. skips the extension check if BSF_Append is set). I had started to fix the existing bounds check in BinaryStreamWriter but deferred to this because it layered better and is more efficient/consistent. It didn't look like this method was tested at all, so I added a unit test. Differential Revision: https://reviews.llvm.org/D124746	2022-05-07 17:37:18 -07:00
Sam McCall	6bbf51f3ed	[Frontend] Move, don't copy the predefines buffer into PP. NFC. It's not trivially small, >10kb.	2022-05-08 01:04:46 +02:00
Amaury Séchet	5cd690ad9c	Generate sse-intel-ocl.ll automatically. NFC	2022-05-07 22:46:39 +00:00
Amaury Séchet	bead7a2ed5	Regenerate avx512-regcall-NoMask.ll . NFC	2022-05-07 22:27:28 +00:00
Andrew Litteken	e38f014c40	[IROutliner] Accomodate blocks containing PHINodes with one entry outside the region and others inside the region. When a PHINode has an incoming block from outside the region, it must be handled specially when assigning a global value number to each incoming value. A PHINode has multiple predecessors, and we must handle this case rather than only the single predecessor case. Reviewer: paquette Differential Revision: https://reviews.llvm.org/D124777	2022-05-07 17:11:21 -05:00
Simon Pilgrim	a385645b47	[RISCV] Regenerate rv32zbp-zbkb.ll Noticed in D124839	2022-05-07 21:29:36 +01:00
David Green	830c18047b	[AArch64] Add missing NVCAST patterns. There were apparently some missing NVCAST patterns. This fills them in using foreach, as opposed to having the specify them individually. Fixes #55321	2022-05-07 21:08:14 +01:00
Simon Pilgrim	7e3ef7dcd2	[AMDGPU] lowerEXTRACT_VECTOR_ELT - fold from a SCALAR_TO_VECTOR source As suggested by @foad on D124839 If we're extracting a vector element that originally came from a scalar_to_vector, then avoid the bitcasting of a vector type and perform the shift masking on the (any-extended) scalar source directly, making use of the fact that the upper elements of a scalar_to_vector are all undef. Differential Revision: https://reviews.llvm.org/D125173	2022-05-07 20:23:31 +01:00
Craig Topper	b81bf7bb2f	[LegalizeTypes] Make use of SelectionDAG::getShiftAmountConstant. NFC Instead of calling getShiftAmountTy and getConstant separately.	2022-05-07 12:16:53 -07:00
Craig Topper	00bfaba997	[LegalizeTypes] Don't assume fshl/fshr shift amount type matches the other operands. Like other shifts, the type isn't required to match. We shouldn't assume we can call ZExtPromotedInteger. I tested the PromoteIntOp_FunnelShift locally by removing the promotion of the shift amount from PromoteIntRes_FunnelShift. But with the final version of this patch it is never executed on any tests. Differential Revision: https://reviews.llvm.org/D125106	2022-05-07 11:44:07 -07:00
Amaury Séchet	06fad8bc05	[DAGCombine] Add node in the worklist in topological order in CombineTo This is part of an ongoing effort toward making DAGCombine process the nodes in topological order. This is able to discover a couple of new optimizations, but also causes a couple of regression. I nevertheless chose to submit this patch for review as to start the discussion with people working on the backend so we can find a good way forward. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D124743	2022-05-07 16:24:31 +00:00
Simon Pilgrim	b432f80e48	[ARM] Update ror.ll test to canonicalized IR As discussed on D124839, we're almost certainly only ever going to see this from IR directly - which now will create funnel shift intrinsics directly I've also added a couple of rotl(rotr()) tests to check left/right rotation merging.	2022-05-07 17:23:42 +01:00
Simon Pilgrim	e7806c08dc	[Headers][X86] amxintrin.h - fixed unknown parameter Wdocumentation warning. NFC Noticed while triaging Issue #35297	2022-05-07 16:20:39 +01:00
Sam McCall	d44ffd631c	[Bitstream] Only consider flushing to file on block boundaries The goal of flushing to disk is to keep a reasonable bound on peak memory usage. With a a default threshold of 512MB (and most BitstreamWriters having no backing file at all), checking after every byte whether to flush seems excessive. This change makes clangd's unittests run 5% faster (in opt), so it's not actually free even in the case with no backing file. Likely there are more important workloads where it makes some difference. Differential Revision: https://reviews.llvm.org/D125145	2022-05-07 16:57:03 +02:00
Amaury Séchet	c2c259224b	const char* for LLVMTargetMachineEmitToFile's argument The `LLVMTargetMachineEmitToFile` takes a `char* Filename` right now, but it doesn't modify it. This is annoying to use in the case where you want to pass a const string, because you either have to remove the const, or copy it somewhere else and pass that. Either way, it's not very nice. I added a const and clang formatted it. This shouldn't break any ABI in my opinion. I'm sorry but I didn't know whom to put as reviewer for this, so I chose someone with a lot of commits from the .cpp file. Reviewed By: deadalnix Differential Revision: https://reviews.llvm.org/D124453	2022-05-07 14:40:55 +00:00
Simon Pilgrim	4750be4907	[X86] Add 32-bit target test coverage to clean header tests	2022-05-07 15:23:46 +01:00
David Green	802e15c576	[SLP] Cluster ordering for loads Given a load without a better order, this patch partially sorts the elements to form clusters of adjacent elements in memory. These clusters can potentially be loaded in fewer loads, meaning less overall shuffling (for example loading v4i8 clusters of a v16i8 as a single f32 loads, as opposed to multiple independent bytes loads and inserts). Differential Revision: https://reviews.llvm.org/D122145	2022-05-07 14:38:11 +01:00
Simon Pilgrim	2cd080c884	[X86] rdrand-builtins.c - add 32-bit target coverage and enable -Wall/-Werror	2022-05-07 14:35:42 +01:00
Amaury Séchet	f4183441d4	Automatically generate aix32-cc-abi-vaarg.ll . NFC	2022-05-07 13:22:40 +00:00
Sanjay Patel	8650f05c97	[InstCombine] fix miscompile when casting int->FP->int As shown in https://github.com/llvm/llvm-project/issues/55150 - the existing fold may be wrong when converting to a signed value. This is a quick fix to avoid the miscompile. I added tests/comments for all of the signed/unsigned combinations at either side of the boundary width, and tried to confirm with Alive2: https://alive2.llvm.org/ce/z/3p9DSu There are already some TODO items in the test file that suggest possible refinements, so the regression with ui->FP->si is probably ok. It seems unlikely that we'd see these kind of edge cases with non-byte-width integer types in real code. The potential miscompile went undetected for several years. This and `747c6a0c73` fixes #55150. Differential Revision: https://reviews.llvm.org/D124692	2022-05-07 08:46:25 -04:00
Simon Pilgrim	6e345426de	[X86] Remove unused 'hint' argument from prefetch tests hint is a compile time constant and can't be passed in as a variable - we already hardcode	2022-05-07 13:38:40 +01:00

... 2 3 4 5 6 ...

423376 Commits All Branches Search

423376 Commits

All Branches