Add a new llvm.fptrunc.round intrinsic to precisely control
the rounding mode when converting from f32 to f16.
Differential Revision: https://reviews.llvm.org/D110579
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.
Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.
Respect dso_preemptable and suppress optimization to fix the abose issues. With
the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```
While here, refine the condition for the alias as well.
For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.
For instrumentations, it's recommended not to create aliases that refer to
globals that have a weak linkage or is preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```
There are other places where GlobalAlias isInterposable usage may need to be
fixed.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D107249
Currently, the clang.arc.attachedcall bundle takes an optional function
argument. Depending on whether the argument is present, calls with this
bundle have the following semantics:
- on x86, with the argument present, the call is lowered to:
call _target
mov rax, rdi
call _objc_retainAutoreleasedReturnValue
- on AArch64, without the argument, the call is lowered to:
bl _target
mov x29, x29
and the objc runtime call is expected to be emitted separately.
That's because, on x86, the objc runtime checks for both the mov and
the call on x86, and treats the combination as the ARC autorelease elision
marker.
But on AArch64, it only checks for the dedicated NOP marker, as that's
historically been sufficiently unique. Thanks to that, the runtime call
wasn't required to be adjacent to the NOP marker, so it wasn't emitted
as part of the bundle sequence.
This patch unifies both architectures: on AArch64, we now emit all
3 instructions for the bundle. This guarantees that the runtime call
is adjacent to the marker in the sequence, and that's information the
runtime can use to further optimize this.
This helps simplify some of the handling, in particular
BundledRetainClaimRVs, which no longer needs to know whether the bundle
is sufficient or not: it now always should be.
Note that this does not include an AutoUpgrade for the nullary bundles,
as they are only produced in ObjCContract as part of the obj/asm emission
pipeline, and are not expected to be in bitcode.
Differential Revision: https://reviews.llvm.org/D118214
Use the llvm flag `-pgo-function-entry-coverage` to create single byte "counters" to track functions coverage. This mode has significantly less size overhead in both code and data because
* We mark a function as "covered" with a store instead of an increment which generally requires fewer assembly instructions
* We use a single byte per function rather than 8 bytes per block
The trade off of course is that this mode only tells you if a function has been covered. This is useful, for example, to detect dead code.
When combined with debug info correlation [0] we are able to create an instrumented Clang binary that is only 150M (the vanilla Clang binary is 143M). That is an overhead of 7M (4.9%) compared to the default instrumentation (without value profiling) which has an overhead of 31M (21.7%).
[0] https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
Reviewed By: kyulee
Differential Revision: https://reviews.llvm.org/D116180
The behavior in Analysis (knownbits) implements poison semantics already,
and we expect the transforms (for example, in instcombine) derived from
those semantics, so this patch changes the LangRef and remaining code to
be consistent. This is one more step in removing "undef" from LLVM.
Without this, I think https://github.com/llvm/llvm-project/issues/53330
has a legitimate complaint because that report wants to allow subsequent
code to mask off bits, and that is allowed with undef values. The clang
builtins are not actually documented anywhere AFAICT, but we might want
to add that to remove more uncertainty.
Differential Revision: https://reviews.llvm.org/D117912
Confusion over this point came up in a couple of recent changes (D117180, e20b32ff3). Current tone of discussion seems to be that we think inaccessiblememonly was always legal on allocation functions, so this change takes the form of a clarification instead of a change.
Differential Revision: https://reviews.llvm.org/D117571
This patch fixes a case where the 'align' parameter attribute on the
pointer operands to llvm.vp.gather and llvm.vp.scatter was being dropped
during the conversion to the SelectionDAG. The default alignment equal
to the ABI type alignment of the vector type was kept. It also updates
the documentation to reflect the fact that the parameter attribute is
now properly supported.
The default alignment of these intrinsics was previously documented as
being equal to the ABI alignment of the *scalar* type, when in fact that
wasn't the case: the ABI alignment of the vector type was used instead.
This has also been fixed in this patch.
Reviewed By: simoll, craig.topper
Differential Revision: https://reviews.llvm.org/D114423
Since 26c6a3e736, LLVM's inliner will "upgrade" the caller's stack protector
attribute based on the callee. This lead to surprising results with Clang's
no_stack_protector attribute added in 4fbf84c173 (D46300). Consider the
following code compiled with clang -fstack-protector-strong -Os
(https://godbolt.org/z/7s3rW7a1q).
extern void h(int* p);
inline __attribute__((always_inline)) int g() {
return 0;
}
int __attribute__((__no_stack_protector__)) f() {
int a[1];
h(a);
return g();
}
LLVM will inline g() into f(), and f() would get a stack protector, against the
users explicit wishes, potentially breaking the program e.g. if h() changes the
value of the stack cookie. That's a miscompile.
More recently, bc044a88ee (D91816) addressed this problem by preventing
inlining when the stack protector is disabled in the caller and enabled in the
callee or vice versa. However, the problem remained if the callee is marked
always_inline as in the example above. This affected users, see e.g.
http://crbug.com/1274129 and http://llvm.org/pr52886.
One way to fix this would be to prevent inlining also in the always_inline
case. Despite the name, always_inline does not guarantee inlining, so this
would be legal but potentially surprising to users.
However, I think the better fix is to not enable the stack protector in a
caller based on the callee. The motivation for the old behaviour is unclear, it
seems counter-intuitive, and causes real problems as we've seen.
This commit implements that fix, which means in the example above, g() gets
inlined into f() (also without always_inline), and f() is emitted without stack
protector. I think that matches most developers' expectations, and that's also
what GCC does.
Another effect of this change is that a no_stack_protector function can now be
inlined into a stack protected function, e.g. (https://godbolt.org/z/hafP6W856):
extern void h(int* p);
inline int __attribute__((__no_stack_protector__)) __attribute__((always_inline)) g() {
return 0;
}
int f() {
int a[1];
h(a);
return g();
}
I think that's fine. Such code would be unusual since no_stack_protector is
normally applied to a program entry point which sets up the stack canary. And
even if such code exists, inlining doesn't change the semantics: there is still
no stack cookie setup/check around entry/exit of the g() code region, but there
may be in the surrounding context, as there was before inlining. This also
matches GCC.
See also the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722
Differential revision: https://reviews.llvm.org/D116589
IR:
- globals (and functions, ifuncs, aliases) can have a partition
- catchret has a `to` before the label
- the sint/int types do not exist
- signext comes after the type
- a variable was missing its type
TableGen:
- The second value after a `#` concatenation is optional
See e.g. llvm/lib/Target/X86/X86InstrAVX512.td:L3351
- IncludeDirective and PreprocessorDirective were never referenced in
the grammar
- Add some missing ;
- Parent classes of multiclasses can have generic arguments.
Reuse the `ParentClassList` that is already used in other places.
MIR:
- liveins only allows physical registers, which start with a $
Differential Revision: https://reviews.llvm.org/D116674
llvm.vp.merge interprets the %evl operand differently than the other vp
intrinsics: all lanes at positions greater or equal than the %evl
operand are passed through from the second vector input. Otherwise it
behaves like llvm.vp.select.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116725
I've changed the definition of the experimental.vector.splice
instrinsic to reject indices that are known to be or possibly
out-of-bounds. In practice, this means changing the definition so that
the index is now only valid in the range [-VL, VL-1] where VL is the
known minimum vector length. We use the vscale_range attribute to
take the minimum vscale value into account so that we can permit
more indices when the attribute is present.
The splice intrinsic is currently only ever generated by the vectoriser,
which will never attempt to splice vectors with out-of-bounds values.
Changing the definition also makes things simpler for codegen since we
can always assume that the index is valid.
This patch was created in response to review comments on D115863
Differential Revision: https://reviews.llvm.org/D115933
Indirect inline asm operands may require the materialization of a
memory access according to the pointer element type. As this will
no longer be available with opaque pointers, we require it to be
explicitly annotated using the elementtype attribute, for example:
define void @test(i32* %p, i32 %x) {
call void asm "addl $1, $0", "=*rm,r"(i32* elementtype(i32) %p, i32 %x)
ret void
}
This patch only includes the LangRef change and Verifier updates to
allow adding the elementtype attribute in this position. It does not
yet enforce this, as this will require changes on the clang side
(and test updates) first.
Something I'm a bit unsure about is whether we really need the
elementtype for all indirect constraints, rather than only indirect
register constraints. I think indirect memory constraints might not
strictly need it (though the backend code is written in a way that
does require it). I think it's okay to just make this a general
requirement though, as this means we don't need to carefully deal
with multiple or alternative constraints. In addition, I believe
that MemorySanitizer benefits from having the element type even in
cases where it may not be strictly necessary for normal lowering
(cd2b050fa4/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp (L4066)).
Differential Revision: https://reviews.llvm.org/D116531
This patch extends the available uses of the 'align' parameter attribute
to include vectors of pointers. The attribute specifies pointer
alignment element-wise.
This change was previously requested and discussed in D87304.
The vector predication (VP) intrinsics intend to use this for scatter
and gather operations, as they lack the explicit alignment parameter
that the masked versions use.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D115161
With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces
function references with CFI jump table references, which is a problem
for low-level code that needs the address of the actual function body.
For example, in the Linux kernel, the code that sets up interrupt
handlers needs to take the address of the interrupt handler function
instead of the CFI jump table, as the jump table may not even be mapped
into memory when an interrupt is triggered.
This change adds the no_cfi constant type, which wraps function
references in a value that LowerTypeTestsModule::replaceCfiUses does not
replace.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Reviewed By: nickdesaulniers, pcc
Differential Revision: https://reviews.llvm.org/D108478
The LangRef incorrectly says that if no exact match is found when
seeking alignment for a vector type, the largest vector type smaller
than the sought-after vector type. This is incorrect as vector types
require an exact match, else they fall back to reporting the natural
alignment.
The corrected rule was not added in its place, as rules for other types
(e.g., floating-point types) aren't documented.
A unit test was added to demonstrate this.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D112463
- `vm` constraint is used for masking operand, which always v0.
- Update testcase, only masking operand should use `vm`, vector mask operations
should just use `vr` for any vector register.
- Revise the description of `vm` constraint.
- This patch also fix issue on RISCVRegisterInfo.td and RISCVISelLowering.cpp.
RISCVRegisterInfo.td:
- The first VT in the list must be the largest total size since the
SelectionDAGBuilder uses the first register in the list as the canonical
type for the register.
RISCVISelLowering.cpp:
- Fix RISCVTargetLowering::splitValueIntoRegisterParts and
RISCVTargetLowering::joinRegisterPartsIntoValue for handling vectors
with different total size, that will happened on fractional LMUL since
fractional LMUL is always occupy one vector register.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D112599
This patch fixes a case where the 'align' parameter attribute on the
pointer operands to llvm.vp.load and llvm.vp.store was being dropped
during the conversion to the SelectionDAG. The default alignment
equal to the ABI type alignment of the vector type was kept. It also
updates the documentation to reflect the fact that the parameter
attribute is now properly supported.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D114422
The default for min is changed to 1. The behaviour of -mvscale-{min,max}
in Clang is also changed such that 16 is the max vscale when targeting
SVE and no max is specified.
Reviewed By: sdesmalen, paulwalker-arm
Differential Revision: https://reviews.llvm.org/D113294
This defines the new `@llvm.ptrauth.` pointer authentication intrinsics:
sign, auth, strip, blend, and sign_generic, documented in PointerAuth.md.
Pointer Authentication is a mechanism by which certain pointers are
signed. When a pointer gets signed, a cryptographic hash of its value
and other values (pepper and salt) is stored in unused bits of that
pointer.
Before the pointer is used, it needs to be authenticated, i.e., have its
signature checked. This prevents pointer values of unknown origin from
being used to replace the signed pointer value.
sign and auth provide the core operations. strip removes the ptrauth
bits from a signed pointer without checking them. sign_generic allows
signing non-pointer values. Finally, blend combines salt values
("discriminators") to derive more targeted and less reusable ones.
In later patches, we implement primary backend support for these
intrinsics using the AArch64 PAuth feature, and build on that to
implement the arm64e Darwin ABI and ELF PAuth ABI Extension in clang.
For more details, see the docs page, as well as our llvm-dev RFC:
http://lists.llvm.org/pipermail/llvm-dev/2019-October/136091.html
or our 2019 Developers' Meeting talk.
Differential Revision: https://reviews.llvm.org/D90868
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:
* DIExpression can currently be parsed from IR or read from bitcode
as `distinct`, but this property is silently dropped when printing
to IR. This causes accepted IR to fail to round-trip. As DIExpression
appears inline at each use in the canonical form of IR, it cannot
actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
restricted to only appearing in contexts where there is no syntax for
`distinct`, but for consistency it is treated equivalently to
DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
along with adding general support for the inverse restriction I went
ahead and described this in Metadata.def and updated the parser to be
general. Future nodes which have this restriction can share this
support.
The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.
The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.
A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:
!named = !{!0}
!0 = !DIExpression()
Instead we would only accept the equivalent inlined version:
!named = !{!DIExpression()}
This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:
!named = !{!0}
; error: 'distinct' not allowed for !DIExpression()
!0 = distinct !DIExpression()
Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.
Reviewed By: StephenTozer, t-tye
Differential Revision: https://reviews.llvm.org/D104827
This patch fleshes out the missing documentation for the final two VP
intrinsics introduced in D99355: `llvm.vp.gather` and `llvm.vp.scatter`.
It does so mostly by deferring to the `llvm.masked.gather` and
`llvm.masked.scatter` intrinsics, respectively.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D112997
This patch fleshes out the missing documentation for two of the VP
intrinsics introduced in D99355: `llvm.vp.load` and `llvm.vp.store`. It
does so mostly by deferring to the `llvm.masked.load` and
`llvm.masked.store` intrinsics, respectively.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D112930
The mask type for the llvm.experimental.vp.splice intrinsics must have
the same number of elements as the result type.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D112924
This patch fixes a couple of small oversights in the documentation for
the datalayout specification:
* The v and f specifications are subject to the same constraints on <size>
as i is.
* The p[n] specification didn't mark <idx> as optional, despite
being documented and parsed as such.
* Similarly, none of the alignment specifications require <pref>.
Add the additional flags from D36850 as well as noInline/alwaysInline from previous changes.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D111600
This patch introduces the vector-predicated version of the
experimental_vector_splice intrinsic [1] at the IR level. It considers
the active vector length for both vectors and and uses a vector mask to
disable certain lanes in the result.
[1] https://reviews.llvm.org/D94708
Change originally authored by Vineet Kumar <vineet.kumar@bsc.es>
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D103898
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Previous revisions didn't properly declare the new dependencies.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
To avoid using the AST when emitting diagnostics, split the "dontcall"
attribute into "dontcall-warn" and "dontcall-error", and also add the
frontend attribute value as the LLVM attribute value. This gives us all
the information to report diagnostics we need from within the IR (aside
from access to the original source).
One downside is we directly use LLVM's demangler rather than using the
existing Clang diagnostic pretty printing of symbols.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D110364
- This patch adds in the GOFF mangling support to the LLVM data layout string. A corresponding additional line has been added into the data layout section in the language reference documentation.
- Furthermore, this patch also sets the right data layout string for the z/OS target in the SystemZ backend.
Reviewed By: uweigand, Kai, abhina.sreeskantharajan, MaskRay
Differential Revision: https://reviews.llvm.org/D109362
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D109343
SelectionDAG will promote illegal types up to a power of 2 before
splitting down to a legal type. This will create an IntegerType
with a bit width that must be <= MAX_INT_BITS. This places an
effective upper limit on any type of 2^23 so that we don't try
create a 2^24 type.
I considered putting a fatal error somewhere in the path from
TargetLowering::getTypeConversion down to IntegerType::get, but
limiting the type in IR seemed better.
This breaks backwards compatibility with IR that is using a really
large type. I suspect such IR is going to be very rare due to the
the compile time costs such a type likely incurs.
Prevents the ICE in PR51829.
Reviewed By: efriedma, aaron.ballman
Differential Revision: https://reviews.llvm.org/D109721
integer 0/1 for the operand of bundle "clang.arc.attachedcall"
https://reviews.llvm.org/D102996 changes the operand of bundle
"clang.arc.attachedcall". This patch makes changes to llvm that are
needed to handle the new IR.
This should make it easier to understand what the IR is doing and also
simplify some of the passes as they no longer have to translate the
integer values to the runtime functions.
Differential Revision: https://reviews.llvm.org/D103000