Summary: A couple of tests to showcase what will be done and what to expect with ICV tracking.
Reviewers: jdoerfert, JonChesterfield
Subscribers: yaxunl, guansong, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81114
Note that .eh_frame sections are generated in the 32-bit format even
when debug sections are 64-bit, for compatibility reasons. They use
relative references between entries, so they hardly benefit from the
64-bit format.
Differential Revision: https://reviews.llvm.org/D81149
DW_FORM_sec_offset was introduced in DWARFv4, so, for 64-bit DWARFv3,
DW_FORM_data8 should be used instead.
Differential Revision: https://reviews.llvm.org/D81148
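As a rough illustration of the form choice above (a minimal sketch; `chooseOffsetForm` is a made-up helper, not the actual LLVM emitter code):
```
#include "llvm/BinaryFormat/Dwarf.h"
using namespace llvm;

// Section offsets are 4 bytes in 32-bit DWARF and 8 bytes in 64-bit DWARF;
// DW_FORM_sec_offset only exists from DWARFv4 onward, so older versions fall
// back to fixed-size data forms.
static dwarf::Form chooseOffsetForm(bool IsDWARF64, unsigned DwarfVersion) {
  if (DwarfVersion >= 4)
    return dwarf::DW_FORM_sec_offset;      // size follows the DWARF format
  return IsDWARF64 ? dwarf::DW_FORM_data8  // 64-bit DWARFv3
                   : dwarf::DW_FORM_data4; // 32-bit DWARFv2/v3
}
```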
The patch enables producing DWARF64 compilation units and fixes
generating references to .debug_abbrev and .debug_line sections.
A similar change for .debug_ranges/.debug_rnglists will be added
in a forthcoming patch.
Differential Revision: https://reviews.llvm.org/D81145
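For context, a minimal sketch of how the 64-bit format changes a unit's initial length field, per the DWARF spec (the `Emitter` type below is a stand-in defined only for this sketch, not the actual streamer interface):
```
#include <cstdint>
#include <vector>

// Hypothetical little-endian byte emitter, defined only for this sketch.
struct Emitter {
  std::vector<uint8_t> Bytes;
  void emit(uint64_t V, unsigned N) {
    for (unsigned I = 0; I != N; ++I)
      Bytes.push_back(uint8_t(V >> (8 * I)));
  }
};

// DWARF64 unit headers start with the 0xffffffff escape followed by a 64-bit
// length; DWARF32 uses a plain 32-bit length. Offsets into .debug_abbrev and
// .debug_line widen from 4 to 8 bytes in the same way.
void emitInitialLength(Emitter &E, bool IsDWARF64, uint64_t Length) {
  if (IsDWARF64) {
    E.emit(0xffffffffu, 4); // escape value marking the 64-bit format
    E.emit(Length, 8);
  } else {
    E.emit(Length, 4);      // must fit in 32 bits
  }
}
```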
The patch adds an option `--dwarf64` to instruct a tool to generate
debug information in the 64-bit DWARF format. There is no real
implementation yet, only a few compatibility checks.
Differential Revision: https://reviews.llvm.org/D81143
This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero.
Fixes PR45378
Differential Revision: https://reviews.llvm.org/D81547
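A small C-level illustration of the kind of pattern involved (assumed shape; the combine itself matches the corresponding vector-reduction DAG nodes, not C source):
```
#include <stddef.h>
#include <stdint.h>

// Vectorizers typically turn this loop into a vector OR reduction whose
// result is then compared against zero, which is the shape
// MatchVectorAllZeroTest can now handle.
int all_zero(const uint32_t *p, size_t n) {
  uint32_t acc = 0;
  for (size_t i = 0; i < n; ++i)
    acc |= p[i];
  return acc == 0;
}
```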
Only functions with a floating-point return type accept fast-math flags.
When adding such flags to a function returning an integer, we'll see a crash,
because there's still an undeleted value referencing the argument. This
patch manually removes the temporary instruction when an error occurs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D78355
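A minimal sketch of the constraint being enforced (not the parser fix itself; `mayHaveFastMathFlags` is a made-up helper): LLVM only treats instructions with a floating-point result as FPMathOperators, so an integer-returning call must not carry fast-math flags.
```
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// An integer-returning call is not an FPMathOperator, so attaching fast-math
// flags to it is invalid and has to be rejected (and cleaned up) by the parser.
static bool mayHaveFastMathFlags(const Instruction &I) {
  return isa<FPMathOperator>(&I);
}
```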
This diff adds support for copying binaries
containing an LC_CODE_SIGNATURE load command.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D81768
Summary:
Normally, the Origin is passed over TLS, which seems like it introduces unnecessary overhead. It's in the (extremely) cold path though, so the only overhead is in code size.
But with eager-checks, calls to __msan_warning functions are extremely common, so this becomes a useful optimization.
This can save ~5% code size.
Reviewers: eugenis, vitalybuka
Reviewed By: eugenis, vitalybuka
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81700
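A rough sketch of the difference in the report path (the identifiers below are illustrative stand-ins, not the exact MSan runtime symbols):
```
extern __thread unsigned tls_origin_slot;     /* stand-in for the TLS channel    */
extern void warning_fn(void);                 /* stand-in for a warning callback */
extern void warning_fn_with_origin(unsigned); /* stand-in for the new variant    */

/* Before: each report site stores the origin to TLS, then calls the runtime. */
void report_via_tls(unsigned origin) {
  tls_origin_slot = origin;
  warning_fn();
}

/* After: the origin travels as a plain argument, saving the TLS store that
 * eager checks would otherwise emit at every (very common) call site. */
void report_via_arg(unsigned origin) {
  warning_fn_with_origin(origin);
}
```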
It's possible to end up with a zext or something in the way of a G_CONSTANT,
even pre-legalization. This can happen with memsets.
e.g.
https://godbolt.org/z/Bjc8cw
To make sure we can catch these cases, use `getConstantVRegValWithLookThrough`
instead of `mi_match`.
Differential Revision: https://reviews.llvm.org/D81875
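A minimal sketch of the lookup style (simplified; the wrapper name is made up and the surrounding combine code is omitted): `getConstantVRegValWithLookThrough` walks through copies and extensions such as G_ZEXT to find the underlying G_CONSTANT, which a plain `mi_match` on G_CONSTANT would miss.
```
#include "llvm/CodeGen/GlobalISel/Utils.h"
using namespace llvm;

static bool matchConstantThroughExt(Register Val,
                                    const MachineRegisterInfo &MRI,
                                    int64_t &Out) {
  // Looks through G_ZEXT/COPY and friends to find a G_CONSTANT feeding Val.
  if (auto ValAndVReg = getConstantVRegValWithLookThrough(Val, MRI)) {
    Out = ValAndVReg->Value;
    return true;
  }
  return false;
}
```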
Summary:
L is meant to support the second word used by 32b calling conventions for 64b arguments.
This is required to build 32b PowerPC Linux kernels after upstream
commit 334710b1496a ("powerpc/uaccess: Implement unsafe_put_user() using 'asm goto'")
Thanks for the report from @nathanchance, and reference to GCC's
implementation from @segher.
Fixes: pr/46186
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1044
Reviewers: echristo, hfinkel, MaskRay
Reviewed By: MaskRay
Subscribers: MaskRay, wuzish, nemanjai, hiraditya, kbarton, steven.zhang, llvm-commits, segher, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81767
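A small illustration of the modifier on a 32-bit PowerPC target, where a 64-bit operand occupies a register pair: `%1` refers to the first register of the pair and `%L1` to the second (illustrative only; the kernel's unsafe_put_user() uses this together with `asm goto`).
```
static inline void store_u64(unsigned long long v, unsigned long long *p) {
  __asm__("stw %1, 0(%0)\n\t"
          "stw %L1, 4(%0)"
          : : "b"(p), "r"(v) : "memory");
}
```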
The promotion machinery in CGP moves instructions retaining
debug locations. When the transformation is local, this is mostly
correct, but when instructions are moved cross-BBs, this is not
always true and causes jumpiness in line tables. This is the first
of a series of commits. sext(s) and zext(s) need to be treated
similarly.
Differential Revision: https://reviews.llvm.org/D81879
As suggested in D81472, the load/store intrinsics' pointer arguments can
be marked as nocapture and all matrix intrinsics as nosync.
This also re-flows the intrinsic definitions, to make them a little more
concise.
Add selection support for ext via a new opcode, G_EXT, and a post-legalizer
combine which matches it.
Add an `applyEXT` function, because the AArch64ext patterns require a register
for the immediate. So, we have to create a G_CONSTANT to get these without
writing new patterns or modifying the existing ones.
Tests are the same as arm64-ext.ll.
Also prevent ext from firing on the zip test. It has higher priority, so we
don't want it potentially getting in the way of mask tests.
Also fix up the shuffle-splat test, because ext is now selected there. The
test was incorrectly regbank selected before, which could cause a verifier
failure when you emit copies.
Differential Revision: https://reviews.llvm.org/D81436
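A hedged sketch of the applyEXT point above (simplified and not the actual combine code; `buildExt` is a made-up wrapper): since the AArch64ext selection patterns expect the immediate in a register, the combine materializes it with a G_CONSTANT instead of encoding it as an immediate operand.
```
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
using namespace llvm;

static void buildExt(MachineIRBuilder &MIB, unsigned ExtOpc, Register Dst,
                     Register Src1, Register Src2, uint64_t Imm) {
  // Materialize the lane offset as a G_CONSTANT so the existing patterns,
  // which take the immediate from a register, still apply.
  auto Cst = MIB.buildConstant(LLT::scalar(32), Imm);
  MIB.buildInstr(ExtOpc, {Dst}, {Src1, Src2, Cst});
}
```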
Port partial constant store merging logic to MemorySSA backed DSE. The
heavy lifting is done by the existing helper function. It is used in a
context where we have already ensured that the later instruction can
eliminate the earlier one if it is a complete overwrite.
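A small C illustration of the situation the helper handles (assumed shape; the transform itself works on IR-level constant stores):
```
#include <stdint.h>
#include <string.h>

struct Pair { uint16_t lo, hi; };

void init(struct Pair *p) {
  memset(p, 0, sizeof *p); /* earlier store: 4 constant bytes          */
  p->hi = 0x1234;          /* later store: overwrites 2 of those bytes */
  /* Instead of keeping both, the later constant can be merged into the
   * earlier store, leaving a single 4-byte constant store. */
}
```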
If a symbol name begins with the linker private global prefix (as
described by the DataLayout) then it should be treated as non-exported,
regardless of its LLVM IR visibility value.
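A minimal sketch of the rule, assuming the module's DataLayout is at hand (`isLinkerPrivate` is a made-up helper, not the actual patch code): names starting with the linker private global prefix, e.g. "l" on Mach-O, are treated as non-exported regardless of their IR visibility.
```
#include "llvm/IR/DataLayout.h"
using namespace llvm;

static bool isLinkerPrivate(StringRef Name, const DataLayout &DL) {
  StringRef Prefix = DL.getLinkerPrivateGlobalPrefix();
  return !Prefix.empty() && Name.startswith(Prefix);
}
```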
This adds 4 new reloc types.
A lot of code that previously assumed any memory or offset values could be contained in a uint32_t (and often truncated results from functions returning 64-bit values) has been upgraded to uint64_t. This is not comprehensive: it is only the values that come in contact with the new relocation values and their dependents.
A new tablegen mapping was added to automatically upgrade loads/stores in the assembler, which otherwise has no way to select for these instructions (since they are identical other than for the offset immediate). It follows a similar technique to https://reviews.llvm.org/D53307
Differential Revision: https://reviews.llvm.org/D81704
This implements the following combines:
((0-A) + B) -> B-A
(A + (0-B)) -> A-B
Porting over the basic algebraic combines from the DAGCombiner. There are
several combines which fold adds away into subtracts. This is just the simplest
one.
I noticed that add combines are some of the most commonly hit across CTMark
(via print statements when they fire), so I'm porting over some of the obvious
ones.
This gives some minor code size improvements on CTMark at -O3 on AArch64.
Differential Revision: https://reviews.llvm.org/D77453
These are legal since we can do a 96-bit load on some subtargets, but
this is only for vector loads. If we can't widen the load, it needs to
be broken down once it is known to be scalar. For 16-byte alignment, widen to a
128-bit load.
Context: https://github.com/WebAssembly/memory64/blob/master/proposals/memory64/Overview.md
This is just a first step, adding the new instruction variants while keeping the existing 32-bit functionality working.
Some of the basic load/store tests have new wasm64 versions that show that the basics of the target are working.
Further features need implementation, but these will be added in followups to keep things reviewable.
Differential Revision: https://reviews.llvm.org/D80769
We should not be adding the relocation addend to the instruction encoding.
This patch removes that and sets those bits to zero.
Differential Revision: https://reviews.llvm.org/D81082
Reduce by splitting the vector until we reach the target size for PTEST/MOVMSK_PCMPEQ. There might be some cases where AVX512 can perform this with 512-bit vectors but so far I haven't encountered any such pattern that reaches LowerVectorAllZeroTest.
Prep work for D81547
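An intrinsics-level illustration of the splitting idea (not the SelectionDAG lowering itself): OR the halves together until the value fits a width PTEST can handle, then test that against zero.
```
#include <immintrin.h>

/* Requires AVX2: reduce a 256-bit value to 128 bits by OR-ing the halves,
 * then use PTEST to check whether all bits are zero. */
int is_all_zero_256(__m256i v) {
  __m128i lo = _mm256_castsi256_si128(v);
  __m128i hi = _mm256_extracti128_si256(v, 1);
  __m128i m = _mm_or_si128(lo, hi);
  return _mm_testz_si128(m, m);
}
```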
> relocImm was a complexPattern that handled both ConstantSDNode
> and X86Wrapper. But it was only applied selectively because using
> it would cause patterns to be not importable into FastISel or
> GlobalISel. So it only got applied to flag setting instructions,
> stores, RMW arithmetic instructions, and rotates.
>
> Most of the test changes are a result of making patterns available
> to GlobalISel or FastISel. The absolute-cmp.ll change is due to
> this fixing a pattern ordering issue to make an absolute symbol
> match to an 8-bit immediate before trying a 32-bit immediate.
>
> I tried to use PatFrags to reduce the repetition, but I was getting
> errors from TableGen.
This caused "Invalid EmitNode" assertions, see the llvm-commits thread for
discussion.
In preparation for a patch that will enforce new rules for the usage of
the strictfp attribute, this patch introduces auto-upgrade behavior that
will replace the strictfp attribute on callsites with nobuiltin if the
enclosing function declaration doesn't also have the strictfp attribute.
This auto-upgrade isn't being performed on .ll files because that would
prevent us from writing a test for the forthcoming verifier behavior.
Differential Revision: https://reviews.llvm.org/D70096
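A hedged sketch of the upgrade rule (simplified; the real logic lives in the auto-upgrade machinery and its API details may differ): a call site marked strictfp inside a function that is not strictfp gets the attribute replaced with nobuiltin.
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static void upgradeStrictFPCallSites(Function &F) {
  if (F.hasFnAttribute(Attribute::StrictFP))
    return; // enclosing function is strictfp: call sites may keep the attribute
  for (Instruction &I : instructions(F))
    if (auto *CB = dyn_cast<CallBase>(&I))
      if (CB->hasFnAttr(Attribute::StrictFP)) {
        CB->removeAttribute(AttributeList::FunctionIndex, Attribute::StrictFP);
        CB->addAttribute(AttributeList::FunctionIndex, Attribute::NoBuiltin);
      }
}
```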
Outline chunks of code which need to save and restore the link register
when a spare register can be used to do it.
Differential Revision: https://reviews.llvm.org/D80127
Summary:
SCTLR_EL1.BT[01] controls the PACI[AB]SP compatibility with BTYPE 11
(see [1]).
This bit will be set to zero so PACI[AB]SP are equivalent to the BTI C
instruction only.
[1] https://developer.arm.com/docs/ddi0595/b/aarch64-system-registers/sctlr_el1
Reviewers: chill, tamas.petz, pbarrio, ostannard
Reviewed By: tamas.petz, ostannard
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81746
The logic is written for what loads/stores should be selectable. There
are a set of cases that should be selectable, but due to missing MVTs
and/or selection patterns, will fail to select. I think eventually
load/store select patterns should ignore the type and only look at the
value size, but until that happens, bitcast these to equivalent i32
vectors.
We have an issue currently. The following YAML piece just ignores the `Excluded` key.
```
SectionHeaderTable:
Sections: []
Excluded:
- Name: .foo
```
Currently the meaning is: exclude the whole table.
The code only checks that the `Sections` key is empty and doesn't catch/check
invalid/duplicated/missing `Excluded` entries.
Also there is no way to exclude all sections except the first null section,
because `Sections: []` currently just excludes the whole section header table.
To fix it, I suggest a change of the behavior.
1) A new `NoHeaders` key is added. It provides an explicit syntax to drop the whole table.
2) The meaning of the following is changed:
```
SectionHeaderTable:
Sections: []
Excluded:
- Name: .foo
```
Assuming there are 2 sections in the object (a null section and `.foo`), with this patch it
means: exclude the `.foo` section, keep the null section. The null section is an implicit
section, and I think it is reasonable to make `Sections: []` mean it is implicitly added.
It will be consistent with the global `Sections` tag that is used to describe sections.
3) `SectionHeaderTable->Sections` is now optional. No `Sections` is the same as
`Sections: []` (I think it avoids a confusion).
4) Using `NoHeaders` together with `Sections`/`Excluded` is not allowed.
5) It is possible to use the `Excluded` key without the `Sections` key now (in this case
`Excluded` must contain all sections).
6) `SectionHeaderTable:` or `SectionHeaderTable: []` is not allowed.
7) When the `SectionHeaderTable` key is present, we still require all sections to be
present in the `Sections` and `Excluded` lists. No changes here; we are still strict.
Differential revision: https://reviews.llvm.org/D81655