Commit Graph

285 Commits

Author SHA1 Message Date
serge-sans-paille e188aae406 Cleanup header dependencies in LLVMCore
Based on the output of include-what-you-use.

This is a big chunk of changes. It is very likely to break downstream code
unless they took a lot of care in avoiding hidden ehader dependencies, something
the LLVM codebase doesn't do that well :-/

I've tried to summarize the biggest change below:

- llvm/include/llvm-c/Core.h: no longer includes llvm-c/ErrorHandling.h
- llvm/IR/DIBuilder.h no longer includes llvm/IR/DebugInfo.h
- llvm/IR/IRBuilder.h no longer includes llvm/IR/IntrinsicInst.h
- llvm/IR/LLVMRemarkStreamer.h no longer includes llvm/Support/ToolOutputFile.h
- llvm/IR/LegacyPassManager.h no longer include llvm/Pass.h
- llvm/IR/Type.h no longer includes llvm/ADT/SmallPtrSet.h
- llvm/IR/PassManager.h no longer includes llvm/Pass.h nor llvm/Support/Debug.h

And the usual count of preprocessed lines:
$ clang++ -E  -Iinclude -I../llvm/include ../llvm/lib/IR/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
before: 6400831
after:  6189948

200k lines less to process is no that bad ;-)

Discourse thread on the topic: https://llvm.discourse.group/t/include-what-you-use-include-cleanup

Differential Revision: https://reviews.llvm.org/D118652
2022-02-02 06:54:20 +01:00
Nikita Popov 7d176844d0 [CodeExtractor] Fix warning in assert (NFC) 2022-01-28 16:33:34 +01:00
Giorgis Georgakoudis 95b981ca2a [CodeExtractor] Enable partial aggregate arguments
Summary:
Enable CodeExtractor to construct output functions that partially
aggregate inputs/outputs in their argument list. A use case is the
OMPIRBuilder to create outlined functions for parallel regions that
aggregate in a struct the payload variables for the region while passing
as scalars thread and bound identifiers.

Differential Revision: https://reviews.llvm.org/D96854
2022-01-25 20:50:34 -05:00
Nikita Popov 730414b341 [CodeExtractor] Remove unnecessary explicit attribute handling (NFC)
The nounwind and uwtable attributes will get handled as part of
the loop below as well, there is no need to special-case them here.
2022-01-03 14:23:25 +01:00
Nikita Popov 587495ffa1 [CodeExtractor] Separate function from param/ret attributes (NFC)
This list is confusing because it conflates functions attributes
(which are either extractable or not) and other attribute kinds,
which are simply irrelevant for this code.
2022-01-03 14:07:12 +01:00
Kazu Hirata 0d182d9d1e [Transforms] Use make_early_inc_range (NFC) 2021-11-07 17:03:15 -08:00
Andrew Litteken 144cd22bae [CodeExtractor] Creating exit stubs based off original order branch instructions.
Previously the CodeExtractor created exit stubs, and the subsequent return value of the outlined function based on the order of out-of-region blocks after splitting any phi nodes, and collecting the blocks to be outlined. This could cause differences in order if there was a difference of exit block phi nodes between the two regions. This patch moves the collection of the output target blocks to be before this occurs, so that the assignment of target block to output value will be the same, regardless of the contents of the output block.

Reviewers: paquette, roelofs

Differential Revision: https://reviews.llvm.org/D108657
2021-09-08 15:15:15 -07:00
Andrew Litteken 9d2c859ebb [CodeExtractor] Making the arguments outlined easier to access from the outside
The Code Extractor does not provide an easy mechanism for determining the
inputs and outputs after extraction has occurred, this patch gives the
ability to pass in empty SetVectors to be filled with the inputs and
outputs if they need to be analyzed.

Added Tests:
- InputOutputMonitoring in unittests/Transforms/Utils/CodeExtractorTests.cpp

Reviewers: paquette

Differential Revision: https://reviews.llvm.org/D106991
2021-08-26 09:47:53 -07:00
Vyacheslav Zakharin 2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Alexander Potapenko b0391dfc73 [clang][Codegen] Introduce the disable_sanitizer_instrumentation attribute
The purpose of __attribute__((disable_sanitizer_instrumentation)) is to
prevent all kinds of sanitizer instrumentation applied to a certain
function, Objective-C method, or global variable.

The no_sanitize(...) attribute drops instrumentation checks, but may
still insert code preventing false positive reports. In some cases
though (e.g. when building Linux kernel with -fsanitize=kernel-memory
or -fsanitize=thread) the users may want to avoid any kind of
instrumentation.

Differential Revision: https://reviews.llvm.org/D108029
2021-08-20 14:01:06 +02:00
Arthur Eubanks 80ea2bb574 [NFC] Rename AttributeList::getParam/Ret/FnAttributes() -> get*Attributes()
This is more consistent with similar methods.
2021-08-13 11:16:52 -07:00
Nikita Popov c191035f42 [IR] Add elementtype attribute
This implements the elementtype attribute specified in D105407. It
just adds the attribute and the specified verifier rules, but
doesn't yet make use of it anywhere.

Differential Revision: https://reviews.llvm.org/D106008
2021-07-15 18:04:26 +02:00
Adrian Prantl f3869a5c32 Support stripping indirectly referenced DILocations from !llvm.loop metadata
in stripDebugInfo().  This patch fixes an oversight in
https://reviews.llvm.org/D96181 and also takes into account loop
metadata pointing to other MDNodes that point into the debug info.

rdar://78487175

Differential Revision: https://reviews.llvm.org/D103220
2021-05-27 13:23:33 -07:00
Marco Elver 280333021e [SanitizeCoverage] Add support for NoSanitizeCoverage function attribute
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.

Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.

Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D102772
2021-05-25 12:57:14 +02:00
Tim Northover ea0eec69f1 IR+AArch64: add a "swiftasync" argument attribute.
This extends any frame record created in the function to include that
parameter, passed in X22.

The new record looks like [X22, FP, LR] in memory, and FP is stored with 0b0001
in bits 63:60 (CodeGen assumes they are 0b0000 in normal operation). The effect
of this is that tools walking the stack should expect to see one of three
values there:

  * 0b0000 => a normal, non-extended record with just [FP, LR]
  * 0b0001 => the extended record [X22, FP, LR]
  * 0b1111 => kernel space, and a non-extended record.

All other values are currently reserved.

If compiling for arm64e this context pointer is address-discriminated with the
discriminator 0xc31a and the DB (process-specific) key.

There is also an "i8** @llvm.swift.async.context.addr()" intrinsic providing
front-ends access to this slot (and forcing its creation initialized to nullptr
if necessary).
2021-05-14 11:43:58 +01:00
Philip Reames a6d2a8d6f5 Add a subclass of IntrinsicInst for llvm.assume [nfc]
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class.  A follow up change will do the same for the newer assumption query/bundle mechanisms.
2021-04-06 11:16:22 -07:00
Bradley Smith 48f5a392cb [IR] Add vscale_range IR function attribute
This attribute represents the minimum and maximum values vscale can
take. For now this attribute is not hooked up to anything during
codegen, this will be added in the future when such codegen is
considered stable.

Additionally hook up the -msve-vector-bits=<x> clang option to emit this
attribute.

Differential Revision: https://reviews.llvm.org/D98030
2021-03-22 12:05:06 +00:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
gbtozers df69c69427 [DebugInfo] Handle multiple variable location operands in IR
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.

Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.

Differential Revision: https://reviews.llvm.org/D88232
2021-03-09 16:44:38 +00:00
gbtozers e5d958c456 [DebugInfo] Support DIArgList in DbgVariableIntrinsic
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.

Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
2021-03-08 14:36:13 +00:00
Kazu Hirata 1d4a2f3778 [Transforms/Utils] Use range-based for loops (NFC) 2021-02-26 22:36:40 -08:00
Michael Kruse 606aa622b2 Revert "[AssumptionCache] Avoid dangling llvm.assume calls in the cache"
This reverts commit b7d870eae7 and the
subsequent fix "[Polly] Fix build after AssumptionCache change (D96168)"
(commit e6810cab09).

It caused indeterminism in the output, such that e.g. the
polly-x86_64-linux buildbot failed accasionally.
2021-02-11 12:17:38 -06:00
Kazu Hirata be23012d5a [Transforms/Utils] Use range-based for loops (NFC) 2021-02-07 09:49:36 -08:00
Johannes Doerfert b7d870eae7 [AssumptionCache] Avoid dangling llvm.assume calls in the cache
PR49043 exposed a problem when it comes to RAUW llvm.assumes. While
D96106 would fix it for GVNSink, it seems a more general concern. To
avoid future problems this patch moves away from the vector of weak
reference model used in the assumption cache. Instead, we track the
llvm.assume calls with a callback handle which will remove itself from
the cache if the call is deleted.

Fixes PR49043.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D96168
2021-02-06 12:18:39 -06:00
Kazu Hirata dc3d5453bc [Transforms/Utils] Use range-based for loops (NFC) 2021-02-02 22:52:47 -08:00
Kazu Hirata 8ed1636184 [llvm] Use isa instead of dyn_cast (NFC) 2021-01-29 23:23:37 -08:00
Petr Hosek bb9eb19829 Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 17:13:34 -08:00
Petr Hosek 1e634f3952 Revert "Support for instrumenting only selected files or functions"
This reverts commit 4edf35f11a because
the test fails on Windows bots.
2021-01-26 12:25:28 -08:00
Petr Hosek 4edf35f11a Support for instrumenting only selected files or functions
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820
2021-01-26 11:11:39 -08:00
Kazu Hirata 8f5da41c4d [llvm] Construct SmallVector with iterator ranges (NFC) 2021-01-20 21:35:52 -08:00
Rong Xu 31c0b8700b Fix clang-ppc64le-rhel buildbot build error
ix buildbot build error due to
commit 3733463d: [IR][PGO] Add hot func attribute and use hot/cold
attribute in func section
2020-12-17 19:14:43 -08:00
Gulfem Savrun Yeniceri 7c0e3a77bc [clang][IR] Add support for leaf attribute
This patch adds support for leaf attribute as an optimization hint
in Clang/LLVM.

Differential Revision: https://reviews.llvm.org/D90275
2020-12-14 14:48:17 -08:00
Fangrui Song b5ad32ef5c Migrate deprecated DebugLoc::get to DILocation::get
This migrates all LLVM (except Kaleidoscope and
CodeGen/StackProtector.cpp) DebugLoc::get to DILocation::get.

The CodeGen/StackProtector.cpp usage may have a nullptr Scope
and can trigger an assertion failure, so I don't migrate it.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D93087
2020-12-11 12:45:22 -08:00
Nick Desaulniers f4c6080ab8 Revert "[IR] add fn attr for no_stack_protector; prevent inlining on mismatch"
This reverts commit b7926ce6d7.

Going with a simpler approach.
2020-11-17 17:27:14 -08:00
Giorgis Georgakoudis 700d2417d8 [CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
CodeExtractor handles bitcasts in the extracted region that have
lifetime markers users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases, and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

**Example**
```
define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}
```

**Current extraction**
```
define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

**Extraction with patch**
```
define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}
```

Reviewed By: vsk

Differential Revision: https://reviews.llvm.org/D90689
2020-11-05 17:01:08 -08:00
Arthur Eubanks 5c31b8b94f Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit 10f2a0d662.

More uint64_t overflows.
2020-10-31 00:25:32 -07:00
Arthur Eubanks 10f2a0d662 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-30 10:03:46 -07:00
Nico Weber 2a4e704c92 Revert "Use uint64_t for branch weights instead of uint32_t"
This reverts commit e5766f25c6.
Makes clang assert when building Chromium, see https://crbug.com/1142813
for a repro.
2020-10-27 09:26:21 -04:00
Arthur Eubanks e5766f25c6 Use uint64_t for branch weights instead of uint32_t
CallInst::updateProfWeight() creates branch_weights with i64 instead of i32.
To be more consistent everywhere and remove lots of casts from uint64_t
to uint32_t, use i64 for branch_weights.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D88609
2020-10-26 20:24:04 -07:00
Nick Desaulniers b7926ce6d7 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Vedant Kumar 099bffe7f7 Revert "[CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)"
This reverts commit 26ee8aff2b.

It's necessary to insert bitcast the pointer operand of a lifetime
marker if it has an opaque pointer type.

rdar://70560161
2020-10-22 12:25:50 -07:00
Atmn Patel 595c615606 [IR] Adds mustprogress as a LLVM IR attribute
This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85393
2020-10-20 03:09:57 -04:00
Simon Pilgrim 8f0658ae67 [Transforms] CodeExtractor::verifyAssumptionCache - don't dereference a dyn_cast<>. NFCI.
Use cast<> as we immediately dereference the pointer afterwards - cast<> will assert if we fail.

Prevents clang static analyzer warning that we could deference a null pointer.
2020-10-08 19:04:30 +01:00
Vedant Kumar 26ee8aff2b [CodeExtractor] Don't create bitcasts when inserting lifetime markers (NFCI)
Lifetime marker intrinsics support any pointer type, so CodeExtractor
does not need to bitcast to `i8*` in order to use these markers.
2020-09-29 16:34:36 -07:00
Fangrui Song 6913812abc Fix some clang-tidy bugprone-argument-comment issues 2020-09-19 20:41:25 -07:00
Matt Arsenault 5e999cbe8d IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended avoid some optimization issues with the current
handling of aggregate arguments, as well as fixes inflexibilty in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These argumets are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's real no advantage to an aligment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
betewen arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate argumets for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.
2020-07-20 10:23:09 -04:00
Gui Andrade ff7900d5de [LLVM] Accept `noundef` attribute in function definitions/calls
The `noundef` attribute indicates an argument or return value which
may never have an undef value representation.

This patch allows LLVM to parse the attribute.

Differential Revision: https://reviews.llvm.org/D83412
2020-07-08 19:02:04 +00:00
Yevgeny Rouban 8138487468 [BrachProbablityInfo] Set edge probabilities at once and fix calcMetadataWeights()
Hide the method that allows setting probability for particular edge
and introduce a public method that sets probabilities for all
outgoing edges at once.
Setting individual edge probability is error prone. More over it is
difficult to check that the total probability is 1.0 because there is
no easy way to know when the user finished setting all
the probabilities.

Related bug is fixed in BranchProbabilityInfo::calcMetadataWeights().
Changing unreachable branch probabilities to raw(1) and distributing
the rest (oldProbability - raw(1)) over the reachable branches could
introduce total probability inaccuracy bigger than 1/numOfBranches.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79396
2020-05-21 12:52:37 +07:00
Eli Friedman 11aa3707e3 StoreInst should store Align, not MaybeAlign
This is D77454, except for stores.  All the infrastructure work was done
for loads, so the remaining changes necessary are relatively small.

Differential Revision: https://reviews.llvm.org/D79968
2020-05-15 12:26:58 -07:00