Commit Graph

377131 Commits

Author SHA1 Message Date
Jonas Devlieghere e5553b9a6a [dsymutil] Warn on timestmap mismatch between object file and debug map
Add a warning when the timestmap doesn't match between the object file
and the debug map entry. We were already emitting such warnings for
archive members and swift interface files. This patch also unifies the
warning across all three.

rdar://65614640

Differential revision: https://reviews.llvm.org/D94536
2021-01-12 18:58:10 -08:00
Hsiangkai Wang 914e2f5a02 [NFC] Use generic name for scalable vector stack ID.
Differential Revision: https://reviews.llvm.org/D94471
2021-01-13 10:57:43 +08:00
Hansang Bae bba3a82b56 [OpenMP] Use persistent memory for omp_large_cap_mem
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.

https://pmem.io/2020/01/20/memkind-dax-kmem.html

Differential Revision: https://reviews.llvm.org/D94353
2021-01-12 20:35:27 -06:00
Shoaib Meenai 0066a09579 [libc++] Give extern templates default visibility on gcc
Contrary to the current visibility macro documentation, it appears that
gcc does handle visibility attribute on extern templates correctly, e.g.
https://godbolt.org/g/EejuV7. We need this so that extern template
instantiations of classes not marked _LIBCPP_TEMPLATE_VIS (e.g.
__vector_base_common) are correctly exported with gcc when building with
hidden visibility.

Reviewed By: ldionne

Differential Revision: https://reviews.llvm.org/D35388
2021-01-12 18:30:56 -08:00
Nico Weber acea470c16 [gn build] Reorganize libcxx/include/BUILD.gn a bit
- Merge 6706342f48 -- no more libcxx_needs_site_config, we now
  always need it
- Since it was always off in practice, write_config bitrot. Unbitrot
  it so that it works
- Remove copy step and let concat step write to final location
  immediately -- and fix copy destination directory

As a side effect, libcxx/include/BUILD.gn now has only a single
sources list, which means the cmake sync script should be able to
automatically sync additions and removals of .h files. On the flipside,
this means this file now must be updated after most changes to
libcxx/include/__config_site.in, and looking through the last few months
of changes this looks like it's going to be a wash.
2021-01-12 21:30:06 -05:00
Hansang Bae 6f0f022038 [OpenMP] Update allocator trait key/value definitions
Use new definitions introduced in 5.1 specification.

Differential Revision: https://reviews.llvm.org/D94277
2021-01-12 20:09:45 -06:00
Reid Kleckner 6529d7c5a4 [PDB] Defer relocating .debug$S until commit time and parallelize it
This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.

This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.

Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1:  0m30.844s -> 0m32.094s (slightly slower)
wall-j3:  0m20.968s -> 0m20.312s (slightly faster)
wall-j8:  0m19.062s -> 0m17.672s (meaningfully faster)

I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.

Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.

Differential Revision: https://reviews.llvm.org/D94267
2021-01-12 17:46:29 -08:00
Yuanfang Chen 5c7dcd7aea [Coroutine] Update promise object's final layout index
promise is a header field but it is not guaranteed that it would be the third
field of the frame due to `performOptimizedStructLayout`.

Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D94137
2021-01-12 17:44:02 -08:00
Luo, Yuanke 055644cc45 [X86][AMX] Prohibit pointer cast on load.
The load/store instruction will be transformed to amx intrinsics in the
pass of AMX type lowering. Prohibiting the pointer cast make that pass
happy.

Differential Revision: https://reviews.llvm.org/D94372
2021-01-13 09:39:19 +08:00
zhanghb97 c0f3ea8a08 [mlir][Python] Add checking process before create an AffineMap from a permutation.
An invalid permutation will trigger a C++ assertion when attempting to create an AffineMap from the permutation.
This patch adds an `isPermutation` function to check the given permutation before creating the AffineMap.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D94492
2021-01-13 09:32:32 +08:00
Nico Weber 25b3921f2f [gn build] (manually) port 79f99ba65d 2021-01-12 20:30:56 -05:00
Jianzhou Zhao 82655c1514 [MSan] Tweak CopyOrigin
There could be some mis-alignments when copying origins not aligned.

I believe inaligned memcpy is rare so the cases do not matter too much
in practice.

1) About the change at line 50

Let dst be (void*)5,
then d=5, beg=4
so we need to write 3 (4+4-5) bytes from 5 to 7.

2) About the change around line 77.

Let dst be (void*)5,
because of lines 50-55, the bytes from 5-7 were already writen.
So the aligned copy is from 8.

Reviewed-by: eugenis
Differential Revision: https://reviews.llvm.org/D94552
2021-01-13 01:22:05 +00:00
Juneyoung Lee 25eb7b08ba [DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND)
This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 .
When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted.
It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag.
The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however.
This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`.

To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB.
It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however.
I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction).

Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB.
Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D92015
2021-01-13 09:36:52 +09:00
Juneyoung Lee 76643c48cd [LangRef] State that a nocapture pointer cannot be returned
This is a small patch stating that a nocapture pointer cannot be returned.

Discussed in D93189.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94386
2021-01-13 09:30:54 +09:00
Siva Chandra Reddy 0c8466c001 [libc][NFC] Use more specific comparison macros in LdExpTest.h. 2021-01-12 16:13:10 -08:00
Michael Jones 04edcc0263 [libc] add isascii and toascii implementations
adding both at once since these are trivial functions.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D94558
2021-01-12 23:41:20 +00:00
Joe Nash 314e29ed2b [AMDGPU] Add _e64 suffix to VOP3 Insts
Previously, instructions which could be
expressed as VOP3 in addition to another
encoding had a _e64 suffix on the tablegen
record name, while those
only available as VOP3 did not. With this
patch, all VOP3s will have the _e64 suffix.
The assembly does not change, only  the mir.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94341

Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423
2021-01-12 18:33:18 -05:00
David Blaikie 0d88d7d82b Delete unused function (was breaking the -Werror build) 2021-01-12 15:29:44 -08:00
Mircea Trofin 585612355c [NFC] Disallow unused prefixes under MC/AMDGPU
This patches remaining tests, and patches lit.local.cfg to block future
such cases (until we flip FileCheck's flag)

Differential Revision: https://reviews.llvm.org/D94556
2021-01-12 15:24:44 -08:00
Julian Lettner 8f5ec45937 [Sanitizer][Darwin] Fix test for macOS 11+ point releases
This test wrongly asserted that the minor version is always 0 when
running on macOS 11 and above.
2021-01-12 15:23:43 -08:00
Jessica Paquette ddcb0aae8b [MIPatternMatch] Add matcher for G_PTR_ADD
Add a matcher which recognizes G_PTR_ADD and add a test.

Differential Revision: https://reviews.llvm.org/D94348
2021-01-12 15:21:19 -08:00
Hongtao Yu 175288a1af Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names.
Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D94455
2021-01-12 15:15:53 -08:00
Bob Haarman 6166b91e83 [ELF][NFCI] small cleanup to OutputSections.h
OutputSections.h used to close the lld::elf namespace only to
immediately open it again. This change merges both parts into
one.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D94538
2021-01-12 23:09:16 +00:00
Craig Topper 1730b0f66a [RISCV] Remove '.mask' from vcompress intrinsic name. NFC
It has a mask argument, but isn't a masked instruction. It doesn't
use the mask policy of or the v0.t syntax.
2021-01-12 14:46:16 -08:00
Nathan James a7130d85e4
[ADT][NFC] Use empty base optimisation in BumpPtrAllocatorImpl
Most uses of this class just use the default MallocAllocator.
As this contains no fields, we can use the empty base optimisation for BumpPtrAllocatorImpl and save 8 bytes of padding for most use cases.

This prevents using a class that is marked as `final` as the `AllocatorT` template argument.
In one must use an allocator that has been marked as `final`, the simplest way around this is a proxy class.
The class should have all the methods that `AllocaterBase` expects and should forward the calls to your own allocator instance.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D94439
2021-01-12 22:43:48 +00:00
Mircea Trofin 55f2eeebc9 [NFC] Disallow unused prefixes in MC/AMDGPU
1 out of 2 patches.

Differential Revision: https://reviews.llvm.org/D94553
2021-01-12 14:31:22 -08:00
Fangrui Song cf45731f0e [Driver] Fix assertion failure when -fprofile-generate -fcs-profile-generate are used together
If conflicting `-fprofile-generate -fcs-profile-generate` are used together,
there is currently an assertion failure. Fix the failure.

Also add some driver tests.

Reviewed By: xur

Differential Revision: https://reviews.llvm.org/D94463
2021-01-12 14:19:55 -08:00
Matt Arsenault 3d39709159 AMDGPU: Remove wrapper only call limitation
This seems to only have overridden cold handling, which we probably
shouldn't do. As far as I can tell the wrapper library functions are
still inlined as appropriate.
2021-01-12 17:12:49 -05:00
Shilei Tian 01f1273fe2 [OpenMP] Fixed a typo in openmp/CMakeLists.txt 2021-01-12 17:00:49 -05:00
Martin Storsjö 02f1d28ed6 [libcxx] Avoid overflows in the windows __libcpp_steady_clock_now()
As freq.QuadValue can be in the range of 10000000 to 19200000,
the multiplication before division makes the calculation overflow
and wrap to negative values every 16-30 minutes.

Instead count the whole seconds separately before adding the
scaled fractional seconds.

Add a testcase for steady_clock to check that the values returned for
now() compare as bigger than the zero time origin; this
corresponds to a testcase in Qt [1] [2] (that failed spuriously
due to this).

[1] https://bugreports.qt.io/browse/QTBUG-89539
[2] https://code.qt.io/cgit/qt/qtbase.git/tree/tests/auto/corelib/kernel/qdeadlinetimer/tst_qdeadlinetimer.cpp?id=f8de5e54022b8b7471131b7ad55c83b69b2684c0#n569

Differential Revision: https://reviews.llvm.org/D93456
2021-01-12 23:56:03 +02:00
Martin Storsjö d1fa7afc7a [AArch64] [Windows] Properly add :lo12: reloc specifiers when generating assembly
This makes sure that assembly output actually can be assembled.

Set the correct MCExpr relocations specifier VK_PAGEOFF - and also
set VK_PAGE consistently even though it's not visible in the assembly
output.

Differential Revision: https://reviews.llvm.org/D94365
2021-01-12 23:56:03 +02:00
Shilei Tian 68ff52ffea [OpenMP] Fixed the link error that cannot find static data member
Constant static data member can be defined in the class without another
define after the class in C++17. Although it is C++17, Clang can still handle it
even w/o the flag for C++17. Unluckily, GCC cannot handle that.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D94541
2021-01-12 16:48:28 -05:00
modimo 2a49b7c64a [Inliner] Change inline remark format and update ReplayInlineAdvisor to use it
This change modifies the source location formatting from:
LineNumber.Discriminator
to:
LineNumber:ColumnNumber.Discriminator

The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff.

The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy.

Testing:
ninja check-llvm

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D94333
2021-01-12 13:43:48 -08:00
Alex Zinenko 7fd1850813 [mlir] Update LLVM dialect type documentation
Recent commits reconfigured LLVM dialect types to use built-in types whenever
possible. Update the documentation accordingly.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D94485
2021-01-12 22:38:24 +01:00
Nikita Popov 23390e7a13 [InstCombine] Handle logical and/or in assume optimization
assume(a && b) can be converted to assume(a); assume(b) even if
the condition is logical. Same for assume(!(a || b)).
2021-01-12 22:36:40 +01:00
Michael Munday 71ed4b6ce5 [RISCV] Legalize select when Zbt extension available
The custom expansion of select operations in the RISC-V backend
interferes with the matching of cmov instructions. Legalizing
select when the Zbt extension is available solves that problem.

Reviewed By: lenary, craig.topper

Differential Revision: https://reviews.llvm.org/D93767
2021-01-12 21:24:38 +00:00
Nikita Popov e15f3ddcae [InstCombine] Add tests for logical and/or poison implication (NFC)
These tests cover some cases where we can fold select to and/or
based on poison implication logic.
2021-01-12 22:18:51 +01:00
Craig Topper 7583ae48a3 [RISCV] Add double test cases to vfmerge-rv32.ll. NFC 2021-01-12 13:09:48 -08:00
Sanjay Patel 9e7895a868 [SLP] reduce code duplication while processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel 92fb5c49e8 [SLP] rename variable to improve readability; NFC
The OperationData in the 2nd block (visiting the operands)
is completely independent of the 1st block.
2021-01-12 16:03:57 -05:00
Sanjay Patel 554be30a42 [SLP] reduce code duplication in processing reductions; NFC 2021-01-12 16:03:57 -05:00
Sanjay Patel 46507a96fc [SLP] reduce code duplication while matching reductions; NFC 2021-01-12 16:03:57 -05:00
Philip Reames caafdf07bb [LV] Weaken spuriously strong assert in LoopVersioning
LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning.  As a result, the precondition LoopVersioning expects is too strong for this user.  At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here.

Really, the whole class structure here is a mess.  We should separate the actual versioning from the metadata updates, but that's a bigger problem.
2021-01-12 12:57:13 -08:00
Nikita Popov fb063c933f [InstCombine] Duplicate tests for logical and/or (NFC)
This replicates existing and/or tests to also test variants using
select. This should help us get a more accurate view on which
optimizations we're missing if we disable the select -> and/or
fold.
2021-01-12 21:50:41 +01:00
Sunil Srivastava f706486eaf Fix for crash in __builtin_return_address in template context.
The check for argument value needs to be guarded by !isValueDependent().

Differential Revision: https://reviews.llvm.org/D94438
2021-01-12 12:37:18 -08:00
Philip Reames 9f61fbd75a [LV] Relax assumption that LCSSA implies single entry
This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317).

The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis.

A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates)

Differential Revision: https://reviews.llvm.org/D93725
2021-01-12 12:34:52 -08:00
Nikita Popov d49974f9c9 [InstCombine] Regenerate test checks (NFC) 2021-01-12 21:26:42 +01:00
Yitzhak Mandelbaum 922a5b8941 [clang-tidy] Add test for Transformer-based checks with diagnostics.
Adds a test that checks the diagnostic output of the tidy.

Differential Revision: https://reviews.llvm.org/D94453
2021-01-12 20:15:22 +00:00
Zequan Wu e53bbd9951 [IR] move nomerge attribute from function declaration/definition to callsites
Move nomerge attribute from function declaration/definition to callsites to
allow virtual function calls attach the attribute.

Differential Revision: https://reviews.llvm.org/D94537
2021-01-12 12:10:46 -08:00
Florian Hahn 6cd44b204c
[FunctionAttrs] Derive willreturn for fns with readonly` & `mustprogress`.
Similar to D94125, derive `willreturn` for functions that are `readonly` and
`mustprogress` in FunctionAttrs.

To quote the reasoning from D94125:

    Since D86233 we have `mustprogress` which, in combination with
    `readonly`, implies `willreturn`. The idea is that every side-effect
    has to be modeled as a "write". Consequently, `readonly` means there
    is no side-effect, and `mustprogress` guarantees that we cannot "loop"
    forever without side-effect.

Reviewed By: jdoerfert, nikic

Differential Revision: https://reviews.llvm.org/D94502
2021-01-12 20:02:34 +00:00