llvm-project

Commit Graph

Author	SHA1	Message	Date
Philip Reames	26223af256	[SCEV] Split isSCEVExprNeverPoison reasoning explicitly into scope and mustexecute parts [NFC] Inspired by the needs of D111001 and D109845. The separation of concerns also makes it easier to reason about correctness and completeness.	2021-10-02 13:10:38 -07:00
Kazu Hirata	c1e32b3fc0	[Target] Migrate from getNumArgOperands to arg_size (NFC) Note that getNumArgOperands is considered a legacy name. See llvm/include/llvm/IR/InstrTypes.h for details.	2021-10-02 12:06:29 -07:00
Lang Hames	d9152a8571	[llvm-jitlink] Sink getPageSize call in Session::Create. The page size for the host process is only needed in the in-process use case.	2021-10-02 11:28:14 -07:00
Simon Pilgrim	7cae0daee6	[X86][Atom] Fix BSR/BSF uops + port usage Both ports are required for BitScan ops. Update the uops counts + port usage based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner reports as well.	2021-10-02 19:09:44 +01:00
Craig Topper	33d20977b7	Revert "[RISCV] Add an GPR def to the Zvlseg SPILL/RELOAD pseudos" This reverts commit `1f16191906`. We're seeing some issues with this internally. It seems that when the spill is created by register allocation, the GPR doesn't get allocated and an assertion fires during virtual register rewriting. The .mir test case contains the spill before register allocation so register allocation sees it as any other instruction.	2021-10-02 10:44:11 -07:00
mydeveloperday	ac21e3922b	[clang-format] NFC 1% improvement in the overall clang-formatted status	2021-10-02 18:05:45 +01:00
Mehdi Amini	db79f4a2e9	Free memory leak on duplicate interface registration I guess this is why we should use unique_ptr as much as possible. Also fix the InterfaceAttachmentTest.cpp test. Differential Revision: https://reviews.llvm.org/D110984	2021-10-02 16:41:28 +00:00
Simon Pilgrim	9452ec722c	[X86][SSE] Fix typo + infinite-loop in HOP(HOP'(X,X),HOP'(Y,Y)) fold (PR52040) PR52040 identified several issues with the HOP(HOP'(X,X),HOP'(Y,Y)) -> HOP(PERMUTE(HOP'(X,Y)),PERMUTE(HOP'(X,Y)) slow-HOP fold. Not only was there a copy+paste typo when accessing the inner HOP operands, but the (unnecessary) ReplaceAllUsesOfValueWith call was missing one use checks. Now that we have better shuffle combines of HOPs we can just return a new HOP() sequence and not use ReplaceAllUsesOfValueWith at all - this actually improved pair_sum_v8i32_v4i32 codegen as it kicks off further shuffle combines.	2021-10-02 15:31:12 +01:00
Josh Learn	3d209c76dd	[clang-format] Constructor initializer lists format with pp directives Currently constructor initializer lists sometimes format incorrectly when there is a preprocessor directive in the middle of the list. This patch fixes the issue when parsing the initializer list by ignoring the preprocessor directive when checking if a block is part of an initializer list. rdar://82554274 Reviewed By: MyDeveloperDay, HazardyKnusperkeks Differential Revision: https://reviews.llvm.org/D109951	2021-10-02 13:23:43 +01:00
mydeveloperday	dd3caa99bd	[clang-format] [docs] [NFC] improve clarity in the QualifierAlignment warning Improve the clarity and guidance of the warning when using code modifying option in clang-format see {D69764} Reviewed By: HazardyKnusperkeks, curdeius Differential Revision: https://reviews.llvm.org/D110801	2021-10-02 13:18:42 +01:00
Mark de Wever	09b51451da	[NFC][libc++] Use TEST_HAS_NO_EXCEPTIONS in tests.	2021-10-02 13:47:27 +02:00
Mark de Wever	02c601f442	[libc++][doc] Update format status. Updated based on recent commits, new reviews and work continuing for P2216.	2021-10-02 13:47:02 +02:00
Simon Pilgrim	bb42cc2090	[X86] decomposeMulByConstant - decompose legal vXi32 multiplies on SlowPMULLD targets and all vXi64 multiplies X86's decomposeMulByConstant never permits mul decomposition to shift+add/sub if the vector multiply is legal. Unfortunately this isn't great for SSE41+ targets which have PMULLD for vXi32 multiplies, but is often quite slow. This patch proposes to allow decomposition if the target has the SlowPMULLD flag (i.e. Silvermont). We also always decompose legal vXi64 multiplies - even latest IceLake has really poor latencies for PMULLQ. Differential Revision: https://reviews.llvm.org/D110588	2021-10-02 12:35:25 +01:00
Simon Pilgrim	8e7f6039fa	[X86] Atom SSE shift-by-variable take 2uops/3uops not 1uop Based off the most recent llvm-exegesis captures (PR36895) and what Intel AoM / Agner / InstLatX64 reports as well.	2021-10-02 12:28:41 +01:00
Roman Lebedev	acb459574a	[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs While we already model this tuple, the load cost is divergent from reality, so fix it. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0` So pick cost of `56`. For store we have: https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `12`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110971	2021-10-02 13:40:21 +03:00
Roman Lebedev	0e71ae6da8	[X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0` So pick cost of `24`. For store we have: https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0` So pick cost of `8`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110970	2021-10-02 13:40:21 +03:00
Roman Lebedev	74e4a0e327	[X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0` So pick cost of `12`. For store we have: https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110969	2021-10-02 13:40:20 +03:00
Roman Lebedev	ae08362cb8	[X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs While we already model this tuple, the store cost is divergent from reality, so fix it. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110968	2021-10-02 13:40:20 +03:00
Roman Lebedev	935b9693ae	[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. For store we have: https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110966	2021-10-02 13:40:20 +03:00
Roman Lebedev	448c939839	[X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs For VF=16, costs are correct. For VF=32, load cost is divergent. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5` So pick cost of `14`. For store we have: https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5` So pick cost of `13`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110961	2021-10-02 13:39:15 +03:00
Roman Lebedev	d1460c88a6	[X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0` So pick cost of `6`. For store we have: https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `6`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110960	2021-10-02 13:39:15 +03:00
Roman Lebedev	f1df2d8eaf	[X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5` So pick cost of `3`. For store we have: https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110958	2021-10-02 13:39:10 +03:00
Roman Lebedev	8a3c64c3a2	[X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs While we already model this tuple, the values are divergent from reality, so fix them. The only sched models that for cpu's that support avx2 but not avx512 are: haswell, broadwell, skylake, zen1-3 For load we have: https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5` So pick cost of `3`. For store we have: https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0` So pick cost of `4`. I'm directly using the shuffling asm the llc produced, without any manual fixups that may be needed to ensure sequential execution. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D110956	2021-10-02 13:39:05 +03:00
Mark de Wever	ac7031b2b2	[libc++][format] Implement Unicode support. This adds the width estimation functions to the std-format-spec. Implements parts of: - P0645 Text Formatting - P1868 width: clarifying units of width and precision in std::format Reviewed By: #libc, ldionne, vitaut Differential Revision: https://reviews.llvm.org/D103413	2021-10-02 11:57:40 +02:00
Tomasz Miąsko	f33274c7bf	[llvm-cxxfilt] Replace isalnum with isAlnum from StringExtras D104366 introduced a new llvm-cxxfilt test with non-ASCII characters, which caused a failure on llvm-clang-x86_64-expensive-checks-win builder, with a stack trace suggesting issue in a call to isalnum. The argument to isalnum should be either EOF or a value that is representable in the type unsigned char. The llvm-cxxfilt does not perform a cast from char to unsigned char before the call, so the value might be out of valid range. Replace the call to isalnum with isAlnum from StringExtras, which takes a char as the argument. This also makes the check independent of the current locale. Differential Revision: https://reviews.llvm.org/D110986	2021-10-02 08:54:04 +02:00
Amara Emerson	f41a9cf859	[AArch64][GlobalISel] Lower G_SMULH/G_UMULH unless its one of the supported types. s32 was also incorrectly marked as a supported type, and was causing fallbacks because we don't support it.	2021-10-01 22:15:23 -07:00
Alexey Lapshin	0b8c50812b	[DWARF][NFC] add ParentIdx and SiblingIdx to DWARFDebugInfoEntry for faster navigation. This patch implements suggestion done while reviewing D102634. It adds two fields: ParentIdx and SiblingIdx. These fields allow fast navigation to die parent and die sibling. These fields are set at the moment when dies are loaded. dsymutil works 2% faster with this patch(run on clang binary). Differential Revision: https://reviews.llvm.org/D110363	2021-10-02 08:11:06 +03:00
Mehdi Amini	237d18a61a	Fix memory leaks in mlir/test/CAPI/ir.c	2021-10-02 04:45:40 +00:00
Mehdi Amini	a1d1c31746	Add a `check-mlir-build-only` build target that only builds the dependencies of the `check-mlir` test target (NFC)	2021-10-02 04:06:17 +00:00
Nimish Mishra	063c5bc31b	[flang][OpenMP] Added OpenMP 5.0 specification based semantic checks for sections construct and test case for simd construct According to OpenMP 5.0 spec document, the following semantic restrictions have been dealt with in this patch. 1. [sections construct] Orphaned section directives are prohibited. That is, the section directives must appear within the sections construct and must not be encountered elsewhere in the sections region. Semantic checks for the following are not necessary, since use of orphaned section construct (i.e. without an enclosing sections directive) throws parser errors and control flow never reaches the semantic checking phase. Added a test case for the same. 2. [sections construct] Must be a structured block Added test case and made changes to branching logic 3. [simd construct] Must be a structured block / A program that branches in or out of a function with declare simd is non conforming 4. Fixed !$omp do's handling of unlabeled CYCLEs Reviewed By: kiranchandramohan Differential Revision: https://reviews.llvm.org/D108904	2021-10-02 08:40:53 +05:30
Shivam Gupta	237e9059f7	[libc++][Docs] Update benchmark doc wrt monorepo Seems this section is not updated since we have transited to llvm-project monorepo. At the start, we build libcxx under monorepo configuration but later try to make the separate configuration for libcxx build and running benchmark. Reviewed By: ldionne, #libc Differential Revision: https://reviews.llvm.org/D110722	2021-10-02 07:35:32 +05:30
LLVM GN Syncbot	e420164f40	[gn build] Port `657f02d458`	2021-10-02 00:21:42 +00:00
Daniel Rodríguez Troitiño	657f02d458	Revert "Extract LC_CODE_SIGNATURE related implementation out of LLD" This reverts commit `cc8229603b`. As discussed in the review of https://reviews.llvm.org/D109972, this was not right approach, so we are reverting to start with a different approach. Differential Revision: https://reviews.llvm.org/D110974	2021-10-01 17:19:50 -07:00
Philip Reames	91dfc0840d	[test] add coverage for a SCEVUnknown scoped value in isSCEVExprNeverPoison Note that a couple of the "negative" tests also end up showing miscompiles due to D109845 which is not yet fixed.	2021-10-01 16:39:23 -07:00
Philip Reames	2ca8a3f213	[SCEV] Stop blindly propagating flags from inbound geps to SCEV nodes This fixes a violation of the wrap flag rules introduced in `c4048d8f`. This was also noted in the (very old) PR23527. The issue being fixed is that we assume the inbound flag on any GEP assumes that all users of any gep (or add) which happens to map to that SCEV would also be UB if the (other) gep overflowed. That's simply not true. In terms of the test diffs, I don't see anything seriously problematic. The lost flags are expected (given the semantic restriction on when its legal to tag the SCEV), and there are several cases where the previously inferred flags are unsound per the new semantics. The only common trend I noticed when looking at the deltas is that by not considering branch on poison as immediate UB in ValueTracking, we do miss a few cases we could reclaim. We may be able to claw some of these back with the follow ideas mentioned in PR51817. It's worth noting that most of the changes are analysis result only changes. The two transform changes are pretty minimal. In one case, we miss the opportunity to infer a nuw (correctly). In the other, we fail to fold an exit and produce a loop invariant form instead. This one is probably over-reduced as the program appears to be undefined in practice, and neither before or after exploits that. Differential Revision: https://reviews.llvm.org/D109789	2021-10-01 16:30:44 -07:00
Philip Reames	24cde2f602	[SCEV] Remove invariant requirement from isSCEVExprNeverPoison This code is attempting to prove that I must execute if we enter the defining scope of the SCEV which will be created from I. In the case where it found a defining addrec scope, it had a rather odd restriction that all of the other operands must be loop invariant in that addrec's loop. As near as I can tell here, we really only need a upper bound on the defining scope. If we can prove the stronger property, then we must also have proven the property on the exact defining scope as well. In practice, the actual effect of this change is narrow. The compile time restriction at the top of the routine basically limits us to I being an arithmetic in some loop L with both an addrec operand in L, and a unknown operands in L. Possible to demonstrate, but the main value of the change is removing unneeded code. Differential Revision: https://reviews.llvm.org/D110892	2021-10-01 15:57:37 -07:00
Philip Reames	d0bca006bb	[test] split flags-from-poison.ll to allow ease of autogen update	2021-10-01 15:35:09 -07:00
Jessica Paquette	96843d220d	[AArch64][GlobalISel] Change G_ANYEXT fed by scalar G_ICMP to G_ZEXT This is a common pattern: ``` %icmp:_(s32) = G_ICMP intpred(eq), ... %ext:_(s64) = G_ANYEXT %icmp(s32) %and:_(s64) = G_AND %ext, 1 ``` Here's an example: https://godbolt.org/z/T13f6o8zE This pattern appears because of the following combine in the LegalizationArtifactCombiner: ``` // zext(trunc x) - > and (aext/copy/trunc x), mask ``` Which kicks in when we widen the result of G_ICMP from 1 bit to 32 bits. We know that, on AArch64, a scalar G_ICMP will produce 0 or 1. So the result of `%ext` will always be 0 or 1 as well. We have some KnownBits combines which eliminate redundant G_ANDs with masks. These combines don't kick in with G_ANYEXT. So, if we replace the G_ANYEXT with G_ZEXT in this situation, the KnownBits based combines can remove the redundant G_AND. I wasn't sure if it woud be more appropriate to * Take this route * Put this in the LegalizationArtifactCombiner. * Allow 64 bit G_ICMP destinations I decided on this route because 1) It's simple 2) I'm not sure if philosophically-speaking, we should be handling non-artifact instructions + target-specific details like TargetBooleanContents in the LegalizationArtifactCombiner 3) There is a lot of existing code which assumes we only have 32 bit G_ICMP destinations. So, adding support for 64-bit destinations seems rather invasive right now. I think that adding support for 64-bit destinations, or modelling G_ICMP as ADDS/SUBS/etc is probably cleaner long term though. This gives minor code size savings on all CTMark benchmarks. Differential Revision: https://reviews.llvm.org/D110959	2021-10-01 15:01:20 -07:00
Stefan Pintilie	40f382ad10	[NFC][PowerPC] Add test case for byval store. Added a test case for situations where a struct of size 1-7 bytes is passed by value.	2021-10-01 16:54:29 -05:00
Daniil Suchkov	a67c7deae7	Revert "[DomTree] Assert that blocks in queries aren't from another function" This reverts commit `86046516e4`. This assertion fails on https://lab.llvm.org/buildbot/#/builders/98/builds/6690 Reverting it for now.	2021-10-01 21:51:00 +00:00
Amy Kwan	103c1bd118	Revert "tsan: fix and test detection of TLS races" This reverts commit `b4c1e5cb73`. Reverting this as it contains a test that is currently failing on the PPC BE bots.	2021-10-01 16:42:31 -05:00
Amy Kwan	8b1984bb8c	Revert "tsan: fix tls_race3 test on darwin" This reverts commit `ade5023c54`. Reverting this commit as it is dependent on a test breaking the PPC BE bots.	2021-10-01 16:42:31 -05:00
Amy Kwan	2df1019576	Revert "tsan: print a meaningful frame for stack races" This reverts commit `ccc83ac7c5`. Reverting this commit as it is dependent on additional commits breaking the PPC BE bots.	2021-10-01 16:42:30 -05:00
Zequan Wu	ab694cd845	[Profile] Add a warning when lock file failed in __llvm_profile_set_file_object with continuous mode	2021-10-01 14:37:09 -07:00
Daniil Suchkov	86046516e4	[DomTree] Assert that blocks in queries aren't from another function This assertion should help us catch cases when DT is used in a way that doesn't make much sense and usually indicates usage errors. In D110752 you can see a test on which this assertion catches a miscompile. The assertion is added to getNode since all queries seem to be routed through that function for all non-trivial cases. Reviewed By: aeubanks, MaskRay Differential Revision: https://reviews.llvm.org/D110751	2021-10-01 21:30:54 +00:00
Daniil Suchkov	45bd8d9477	[SimpleLoopUnswitch] Don't unswitch constant conditions Added an additional check for constants after simplification of "select _, true, false" pattern. We need to prevent attempts to unswitch constant conditions for two reasons: a) Doing that doesn't make any sense, in the best case it will just burn some compile time. b) SimpleLoopUnswitch isn't designed to unswitch constant conditions (due to (a)), so attempting that can cause miscompiles. The attached testcase is an example of such miscompile. Also added an assertion that'll make sure we aren't trying to replace constants, so it will help us prevent such bugs in future. The assertion from D110751 is another layer of protection against such cases. Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D110752	2021-10-01 21:30:54 +00:00
Daniil Suchkov	bdd52e8bae	[Test] Add a test exposing a miscompile in SimpleLoopUnswitch. The miscompile was introduced by `6b4b1dc6ec`.	2021-10-01 21:30:54 +00:00
wren romano	af7ac1d95b	[mlir][sparse] Sharing calls to adaptor.getOperands()[0] This is preliminary work towards D110790. Depends On D110883. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D110884	2021-10-01 14:20:31 -07:00
wren romano	14fffda979	[mlir][sparse] Factoring out allocaIndices() This is preliminary work towards D110790. Depends On D110882. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D110883	2021-10-01 14:18:56 -07:00
wren romano	ca01034714	[mlir][sparse] Factoring out getZero() and avoiding unnecessary Type params This is preliminary work towards D110790 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D110882	2021-10-01 14:17:53 -07:00

400618 Commits

All Branches