Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.
This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV
This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic, lebedev.ri
Differential Revision: https://reviews.llvm.org/D74228
On some targets, like SPARC, forming overflow ops is only profitable if
the math result is used: https://godbolt.org/z/DxSmdB
This patch adds a new MathUsed parameter to allow the targets
to make the decision and defaults to only allowing it
if the math result is used. That is the conservative choice.
This patch also updates AArch64ISelLowering, X86ISelLowering,
ARMISelLowering.h, SystemZISelLowering.h to allow forming overflow
ops if the math result is not used. On those targets using the
overflow intrinsic for the overflow check only generates better code.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D74722
Summary:
Depends on https://reviews.llvm.org/D71901.
The fifth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure.
The first 4 patches allow users to run coroutine passes by invoking, for
example `opt -passes=coro-early`. However, most of LLVM's tests for
coroutines use an option, `opt -enable-coroutines`, which adds all 4
coroutine passes to the appropriate legacy pass manager extension points.
This patch does the same, but using the new pass manager: when
coroutine features are enabled and the new pass manager is being used,
this adds the new-pass-manager-compliant coroutine passes to the pass
builder's pipeline.
This allows us to run all coroutine tests using the new pass manager
(besides those that use the coroutine retcon ABI used by the Swift
compiler, which is not yet supported in the new pass manager).
Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, wenlei
Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71902
Summary:
Depends on https://reviews.llvm.org/D71900.
The fourth in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-cleanup'.
No existing regression tests check the behavior of coro-cleanup on its
own, so this patch adds one. (A test named 'coro-cleanup.ll' exists, but
it relies on the entire coroutines pipeline being run. It's updated to
test the new pass manager in the 5th patch of this series.)
Reviewers: GorNishanov, lewissbaker, chandlerc, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71901
https://reviews.llvm.org/D67133
While investigating some non determinism (CSE doesn't produce wrong
code, it just doesn't CSE some times) in GISel CSE on an out of tree
target, I realized that the core issue was that there were lots of code
that mutates (setReg, setRegClass etc), but doesn't notify observers
(CSE in this case but this could be any other observer). In order to
make the Observer be available in various parts of code and to avoid
having to thread it through various API, the MachineFunction now has the
observer as field. This allows it to be easily used in helper functions
such as constrainOperandRegClass.
Also added some invariant verification method in CSEInfo which can
catch these issues (when CSE is enabled).
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
This reverts commit 649aba93a2, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
Summary:
This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit
reverse and shift right. The intrinsics are compiled into the VBRSR
instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr
and arm.mve.vbrsr.predicated.
Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74721
Summary:
Making `Scale` a `TypeSize` in AArch64InstrInfo::getMemOpInfo,
has the effect that all places where this information is used
(notably, TargetInstrInfo::getMemOperandWithOffset) will need
to consider Scale - and derived, Offset - possibly being scalable.
This patch adds a new operand `bool &OffsetIsScalable` to
TargetInstrInfo::getMemOperandWithOffset and fixes up all
the places where this function is used, to consider the
offset possibly being scalable.
In most cases, this means bailing out because the algorithm does not
(or cannot) support scalable offsets in places where it does some
form of alias checking for example.
Reviewers: rovka, efriedma, kristof.beyls
Reviewed By: efriedma
Subscribers: wuzish, kerbowa, MatzeB, arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, javed.absar, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72758
This patch upstreams support for the AArch64 Armv8-A cpu Cortex-A34.
In detail adding support for:
- mcpu option in clang
- AArch64 Target Features in clang
- llvm AArch64 TargetParser definitions
details of the cpu can be found here:
https://developer.arm.com/ip-products/processors/cortex-a/cortex-a34
Reviewers: SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74483
Change-Id: Ida101fc544ca183a0a0e61a1277c8957855fde0b
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
Summary:
I noticed a small regression in a toy project of mine after applying
D73835, in which instruction names weren't being set properly. In the
example test case included with this patch,
`llvm::IRBuilderBase::CreateAdd` returns an `llvm::Value *` that is then
passed as an argument to `llvm::IRBuilderBase::Insert`. The overloaded
function that is selected for that call then ignores the `Name`
parameter that is given. This patch addresses that issue.
Reviewers: nikic, Meinersbur, nhaehnle, fhahn, thakis, teemperor
Reviewed By: nikic, fhahn
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74754
Summary:
vclzq maps nicely to the existing target-independent @llvm.ctlz IR
intrinsic. But vclsq ('count leading sign bits') has no corresponding
target-independent intrinsic, so I've made up @llvm.arm.mve.vcls.
This commit adds the unpredicated forms only.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74335
Summary:
This adds the unpredicated forms of six different MVE intrinsics which
all round a vector of floating-point numbers to integer values,
leaving them still in FP format, differing only in rounding mode and
exception settings.
Five of them map to existing target-independent intrinsics in LLVM IR,
such as @llvm.trunc and @llvm.rint. The sixth, mapping to the `vrintn`
instruction, is done by inventing a target-specific intrinsic.
(`vrintn` behaves the same as `vrintx` in terms of the output value:
the side effects on the FPSCR flags are the only difference between
the two. But ACLE specifies separate user-callable intrinsics for the
two, so the side effects matter enough to make sure we generate the
right one of the two instructions in each case.)
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74333
Summary:
This patch is follow-up for D74481. It adds comments
to WithColor::defaultErrorHandler() and
WithColor::defaultWarningHandler().
Reviewers: jhenderson, dblaikie, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74742
Summary:
Depends on https://reviews.llvm.org/D71899.
The third in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements 'coro-elide'.
The new pass manager infrastructure does not implicitly repeat CGSCC
pass pipelines when a function is devirtualized, and so the tests
for the new pass manager that rely on that behavior now explicitly
specify `repeat<2>`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, EricWF, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71900
Summary:
This patch has four dependencies:
1. The first in this series of patches that implement coroutine passes in the
new pass manager: https://reviews.llvm.org/D71898.
2. A patch that introduces an API for CGSCC passes to add new reference
edges to a `LazyCallGraph`, `updateCGAndAnalysisManagerForCGSCCPass`:
https://reviews.llvm.org/D72025.
3. A patch that introduces a `CallGraphUpdater` helper class that is
capable of mutating internal `LazyCallGraph` state in order to insert
new function nodes into a specific SCC: https://reviews.llvm.org/D70927.
4. And finally, a small edge case fix for updating `LazyCallGraph` that
patch 3 above happens to run into: https://reviews.llvm.org/D72226.
This is the second in a series of patches that ports the LLVM coroutines
passes to the new pass manager infrastructure. This patch implements
'coro-split'.
Some notes:
* Using the new CGSCC pass manager resulted in IR being printed in the
reverse order in some tests. To prevent FileCheck checks from failing due
to these reversed orders, this patch splits up test files that test
multiple different coroutine functions: specifically
coro-alloc-with-param.ll, coro-split-eh.ll, and coro-eh-aware-edge-split.ll.
* CoroSplit.cpp contained 2 overloads of `splitCoroutine`, one of which
dispatched to the other based on the coroutine ABI being used (C++20
switch-based versus Swift returned-continuation-based). I found this
confusing, especially with the additional branching based on `CallGraph`
vs. `LazyCallGraph`, so I removed the ABI-checking overload of
`splitCoroutine`.
Reviewers: GorNishanov, lewissbaker, chandlerc, jdoerfert, junparser, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: wenlei, qcolombet, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71899
I readily admit that I don't know why this fixes the modules build, but
it seems to get things building again. Previously I saw the error
message:
http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9404/consoleFull#-361314398a1ca8a51-895e-46c6-af87-ce24fa4cd561
```
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/IRBuilderFolder.h:18:10: fatal error: cyclic dependency in module 'LLVM_intrinsic_gen': LLVM_intrinsic_gen -> LLVM_IR -> LLVM_intrinsic_gen
^
While building module 'LLVM_intrinsic_gen' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/IR/IRBuilder.cpp:14:
In file included from <module-includes>:1:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/IR/Argument.h:19:10: fatal error: could not build module 'LLVM_IR'
~~~~~~~~^~~~~~~~~~~~~~~~~
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/IR/IRBuilder.cpp:14:10: fatal error: could not build module 'LLVM_intrinsic_gen'
```
And reproduced with:
cmake -G Ninja /Users/vsk/src/llvm-backup-master/llvm -DCLANG_ENABLE_ARCMT=Off -DCLANG_ENABLE_STATIC_ANALYZER=Off -DLLVM_ENABLE_PROJECTS='clang;clang-tools-extra;lld;libcxx;libcxxabi;compiler-rt;libunwind;lldb' -DLLDB_USE_SYSTEM_DEBUGSERVER=On -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_ASSERTIONS=On -DLLVM_ENABLE_MODULES=On
Summary:
The first in a series of patches that ports the LLVM coroutines passes
to the new pass manager infrastructure. This patch implements
'coro-early'.
NB: All coroutines passes begin by checking that coroutine intrinsics are
declared within the LLVM IR module they're operating on. To do so, they call
`coro::declaresIntrinsics`. The next 3 patches in this series, which add new
pass manager implementations of the 'coro-split', 'coro-elide', and
'coro-cleanup' passes, use a similar pattern as the one used here: a static
function is shared across both old and new passes to detect if relevant
coroutine intrinsics are delcared. To make this pattern easier to read, this
patch adds `const` keywords to the parameters of `coro::declaresIntrinsics`.
Reviewers: GorNishanov, lewissbaker, junparser, chandlerc, deadalnix, wenlei
Reviewed By: wenlei
Subscribers: ychen, wenlei, EricWF, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71898
Relative to the original commit, this fixes some warnings,
and is based on the deletion of the IRBuilder copy constructor
in D74693. The automatic copy constructor would no longer be
safe.
-----
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
Summary:
Depends on https://reviews.llvm.org/D70927.
`LazyCallGraph::addNewFunctionIntoSCC` allows users to insert a new
function node into a call graph, into a specific, existing SCC.
Extend this interface such that functions can be added even when they do
not belong in any existing SCC, but instead in a new SCC within an
existing RefSCC.
The ability to insert new functions as part of a RefSCC is necessary for
outlined functions that do not form a strongly connected cycle with the
function they are outlined from. An example of such a function would be the
coroutine funclets 'f.resume', etc., which are outlined from a coroutine 'f'.
Coroutine 'f' only references the funclets' addresses, it does not call
them directly.
Reviewers: jdoerfert, chandlerc, wenlei, hfinkel
Reviewed By: jdoerfert
Subscribers: hfinkel, JonChesterfield, mehdi_amini, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72226
D73835 will make IRBuilder no longer trivially copyable. This patch
deletes the copy constructor in advance, to separate out the breakage.
Currently, the IRBuilder copy constructor is usually used by accident,
not by intention. In rG7c362b25d7a9 I've fixed a number of cases where
functions accepted IRBuilder rather than IRBuilder &, thus performing
an unnecessary copy. In rG5f7b92b1b4d6 I've fixed cases where an
IRBuilder was copied, while an InsertPointGuard should have been used
instead.
The only non-trivial use of the copy constructor is the
getIRBForDbgInsertion() helper, for which I separated construction and
setting of the insertion point in this patch.
Differential Revision: https://reviews.llvm.org/D74693
These are going to be useful in TargetLowering::SimplifyDemandedBits, so expose these helpers outside of SelectionDAG.cpp
Also add an getValidShiftAmountConstant early-out to getValidMinimumShiftAmountConstant/getValidMaximumShiftAmountConstant so we can use them for scalar cases as well.
Produce an unmerge to a narrower type and introduce a narrower shift
if needed. I wasn't sure if there was a better way to parameterize the
target's preferred shift type for the GICombineRule, so manually call
the combine helper.
Summary:
This patch adds assembly-level support for a new Arm M-profile
architecture extension, Custom Datapath Extension (CDE).
A brief description of the extension is available at
https://developer.arm.com/architectures/instruction-sets/custom-instructions
The latest specification for CDE is currently a beta release and is
available at
https://static.docs.arm.com/ddi0607/aa/DDI0607A_a_armv8m_arm_supplement_cde.pdf
CDE allows chip vendors to add custom CPU instructions. The CDE
instructions re-use the same encoding space as existing coprocessor
instructions (such as MRC, MCR, CDP etc.). Each coprocessor in range
cp0-cp7 can be configured as either general purpose (GCP) or custom
datapath (CDEv1). This configuration is defined by the CPU vendor and
is provided to LLVM using 8 subtarget features: cdecp0 ... cdecp7.
The semantics of CDE instructions are implementation-defined, but the
instructions are guaranteed to be pure (that is, they are stateless,
they do not access memory or any registers except their explicit
inputs/outputs).
CDE requires the CPU to support at least Armv8.0-M mainline
architecture. CDE includes 3 sets of instructions:
* Instructions that operate on general purpose registers and NZCV
flags
* Instructions that operate on the S or D register file (require
either FP or MVE extension)
* Instructions that operate on the Q register file, require MVE
The user-facing names that can be specified on the command line are
the same as the 8 subtarget feature names. For example:
$ clang -target arm-none-none-eabi -march=armv8m.main+cdecp0+cdecp3
tells the compiler that the coprocessors 0 and 3 are configured as
CDEv1 and the remaining coprocessors are configured as GCP (which is
the default).
Reviewers: simon_tatham, ostannard, dmgreen, eli.friedman
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74044
Summary:
Implements the @llvm.aarch64.sve.index intrinsic, which
takes a scalar base and step value.
This patch also adds the printSImm function to AArch64InstPrinter
to ensure that immediates of type i8 & i16 are printed correctly.
Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin
Reviewed By: cameron.mcinally
Subscribers: tatyana-krasnukha, tschuett, kristof.beyls, hiraditya, rkruppe, arphaman, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74550
Summary:
Many directives are unavailable, and support for others may be limited.
This first draft has preliminary support for:
- conditional directives (including errors),
- data allocation (unsigned types up to 8 bytes, and ALIGN),
- equates/variables (numeric and text),
- and procedure directives (without parameters),
as well as COMMENT, ECHO, INCLUDE, INCLUDELIB, PUBLIC, and EXTERN. Text variables (aka text macros) are expanded in-place wherever the identifier occurs.
We deliberately ignore all ml.exe processor directives.
Prominent features not yet supported:
- structs
- macros (both procedures and functions)
- procedures (with specified parameters)
- substitution & expansion operators
Conditional directives are complicated by the fact that "ifdef rax" is a valid way to check if a file is being assembled for a 64-bit x86 processor; we add support for "ifdef <register>" in general, which requires adding a tryParseRegister method to all MCTargetAsmParsers. (Some targets require backtracking in the non-register case.)
Reviewers: rnk, thakis
Reviewed By: thakis
Subscribers: kerbowa, merge_guards_bot, wuzish, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, mgorny, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72680
Related llvm-dev thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-February/138951.html
This patch moves the IRBuilder from templating over the constant
folder and inserter towards making both of these virtual.
There are a couple of motivations for this:
1. It's not possible to share code between use-sites that use
different IRBuilder folders/inserters (short of templating the code
and moving it into headers).
2. Methods currently defined on IRBuilderBase (which is not templated)
do not use the custom inserter, resulting in subtle bugs (e.g.
incorrect InstCombine worklist management). It would be possible to
move those into the templated IRBuilder, but...
3. The vast majority of the IRBuilder implementation has to live
in the header, because it depends on the template arguments.
4. We have many unnecessary dependencies on IRBuilder.h,
because it is not easy to forward-declare. (Significant parts of
the backend depend on it via TargetLowering.h, for example.)
This patch addresses the issue by making the following changes:
* IRBuilderDefaultInserter::InsertHelper becomes virtual.
IRBuilderBase accepts a reference to it.
* IRBuilderFolder is introduced as a virtual base class. It is
implemented by ConstantFolder (default), NoFolder and TargetFolder.
IRBuilderBase has a reference to this as well.
* All the logic is moved from IRBuilder to IRBuilderBase. This means
that methods can in the future replace their IRBuilder<> & uses
(or other specific IRBuilder types) with IRBuilderBase & and thus
be usable with different IRBuilders.
* The IRBuilder class is now a thin wrapper around IRBuilderBase.
Essentially it only stores the folder and inserter and takes care
of constructing the base builder.
What this patch doesn't do, but should be simple followups after this change:
* Fixing use of the inserter for creation methods originally defined
on IRBuilderBase.
* Replacing IRBuilder<> uses in arguments with IRBuilderBase, where useful.
* Moving code from the IRBuilder header to the source file.
From the user perspective, these changes should be mostly transparent:
The only thing that consumers using a custom inserted may need to do is
inherit from IRBuilderDefaultInserter publicly and mark their InsertHelper
as public.
Differential Revision: https://reviews.llvm.org/D73835
Currently we always return true, when markConstantRange is used on an
object already containing a constant range. If NewR is equal to the
existing constant range however, nothing changes and we should return
false.
I also went ahead and added a clarifying comment and improved the
assertion.
Reviewers: efriedma, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D73240
This patch prepares ValueLatticeElement to be used by SCCP, by:
* making the mark* functions public
* make the mark* functions return a bool indicating if the value has changed.
Reviewers: efriedma, davide, mssimpso, nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D60581
Summary:
This patch is extracted from D74308.
It patches all usages of WithColor::error() and WithColor::warning
in DebugInfoDWARF library.
Depends on D74481
Reviewers: jhenderson, dblaikie, probinson, aprantl, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74635
Summary:
this review is extracted from D74308.
It creates two error handlers which allow to redefine error
reporting routine and should be used for all places
where errors are reported:
std::function<void(Error)> RecoverableErrorHandler = defaultErrorHandler;
std::function<void(Error)> WarningHandler = defaultWarningHandler;
It also creates accessors to above handlers which should be used to
report errors.
function_ref<void(Error)> getRecoverableErrorHandler() {
return RecoverableErrorHandler;
}
function_ref<void(Error)> getWarningHandler() { return WarningHandler; }
It patches all error reporting places inside DWARFContext and DWARLinker.
Reviewers: jhenderson, dblaikie, probinson, aprantl, JDevlieghere
Reviewed By: jhenderson, JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D74481
In addition to a single bit per memory locations, e.g., globals and
arguments, we now collect more information about the actual accesses,
e.g., what instruction caused it, was it a read/write/read+write, and
what the underlying base pointer was. Follow up patches will make
explicit use of this.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73527
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
This swaps out the OpenMPDefaultClauseKind enum with a
llvm::omp::DefaultKind enum which is stored in OMPConstants.h.
This should not change any functionality.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D74513
Follow-up for D74006.
When the integrated assembler is used, we use SHF_LINK_ORDER. The
linked-to symbol is part of ELFSectionKey, thus we can omit the unique
ID.
https://bugs.llvm.org/show_bug.cgi?id=44775
This rule has been implemented by GNU as https://sourceware.org/ml/binutils/2020-02/msg00028.html (binutils >= 2.35)
It allows us to simplify
```
.section .foo,"o",foo,unique,0
.section .foo,"o",bar,unique,1 # different section
```
to
```
.section .foo,"o",foo
.section .foo,"o",bar # different section
```
We consider the two `.foo` different even if the linked-to symbols foo and bar
are defined in the same section. This is a deliberate choice so that we don't
need to know the section where foo and bar are defined beforehand.
Differential Revision: https://reviews.llvm.org/D74006
In addition to memory behavior attributes (readonly/writeonly) we now
derive memory location attributes (argmemonly/inaccessiblememonly/...).
The former is part of AAMemoryBehavior and the latter part of
AAMemoryLocation. While they are similar in nature it got messy when
they were put in a single AA. Location attributes for arguments and
floating values will follow later.
Note that both memory attributes kinds can derive readnone. If there are
no accesses AAMemoryBehavior will derive readnone. If there are accesses
but only to stack (=local) locations AAMemoryLocation will derive
readnone.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D73426
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in `llvm/lib/Frontend/OpenMP/OMPKinds.def`. From
these definitions we generate the enum classes `TraitSet`,
`TraitSelector`, and `TraitProperty` as well as conversion and helper
functions in `llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}`.
The above enum classes are used in the parser, sema, and the AST
attribute. The latter is not a collection of multiple primitive variant
arguments that contain encodings via numbers and strings but instead a
tree that mirrors the `match` clause (see `struct OpenMPTraitInfo`).
The changes to the parser make it more forgiving when wrong syntax is
read and they also resulted in more specialized diagnostics. The tests
are updated and the core issues are detected as before. Here and
elsewhere this patch tries to be generic, thus we do not distinguish
what selector set, selector, or property is parsed except if they do
behave exceptionally, as for example `user={condition(EXPR)}` does.
The sema logic changed in two ways: First, the OMPDeclareVariantAttr
representation changed, as mentioned above, and the sema was adjusted to
work with the new `OpenMPTraitInfo`. Second, the matching and scoring
logic moved into `OMPContext.{h,cpp}`. It is implemented on a flat
representation of the `match` clause that is not tied to clang.
`OpenMPTraitInfo` provides a method to generate this flat structure (see
`struct VariantMatchInfo`) by computing integer score values and boolean
user conditions from the `clang::Expr` we keep for them.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the `VariantMatchInfo`, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
---
Test changes:
The messages checked in `OpenMP/declare_variant_messages.{c,cpp}` have
been auto generated to match the new warnings and notes of the parser.
The "subset" checks were reversed causing the wrong version to be
picked. The tests have been adjusted to correct this.
We do not print scores if the user did not provide one.
We print spaces to make lists in the `match` clause more legible.
Reviewers: kiranchandramohan, ABataev, RaviNarayanaswamy, gtbercea, grokos, sdmitriev, JonChesterfield, hfinkel, fghanim
Subscribers: merge_guards_bot, rampitec, mgorny, hiraditya, aheejin, fedor.sergeev, simoncook, bollu, guansong, dexonsmith, jfb, s.egerton, llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71830
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).
This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.
https://reviews.llvm.org/D70564
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.
== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.
By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.
This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.
== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".
== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).
When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.
When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.
Differential Revision: https://reviews.llvm.org/D71775
This patch adds DenseMapInfo<> support for BitVector and SmallBitVector.
This is part of https://reviews.llvm.org/D71775, where a BitVector is used as a thread affinity mask.
The code generation is exactly the same as it was.
But not that the special handling of untied tasks is still handled by
emitUntiedSwitch in clang.
Differential Revision: https://reviews.llvm.org/D69828
When the DW_AT_call_return_pc matches a relocation, the call return pc
would get relocated twice, once because of the relocation in the object
file and once because of dsymutil. The same problem exists for the low
and high PC and the fix is the same. We remember the low, high and
return pc of the original DIE and relocate that, rather than the
potentially already relocated value.
Reviewed offline by Fred Riss.
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.
Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.
However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.
Testing: stage2 RelWithDebInfo sanitized build, check-llvm
rdar://59397340
Differential Revision: https://reviews.llvm.org/D74517
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.
The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.
Reviewers: aprantl, dblaikie, probinson
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74450
This reverts commit 80a34ae311 with fixes.
Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on
MachineVerifierPass by default on X86 and the fact that
inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll
are not expected to generate functioning machine code, this would go
down to `report_fatal_error` in MachineVerifierPass. Here passing
`-verify-machineinstrs=0` to make the intent explicit.
This reverts commit 80a34ae311 with fixes.
On bots llvm-clang-x86_64-expensive-checks-ubuntu and
llvm-clang-x86_64-expensive-checks-debian only,
llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little
bit in the hope that LIT is the culprit since this change is not in the
codepath these tests are testing.
llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll
llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll
This reverts commit 61b35e4111.
This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
Reapply 8a56d64d76 with minor fixes.
The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.
This reverts commit 3aac953afa.
This patch is replacements missed in my last change doing this across LLVM.
No functional change, although I think there was a missing typename
in struct conjunction that is now fixed.
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D74372
We used coarse-grained liveness before, thus we looked if the
instruction was executed, but we did not use fine-grained liveness,
hence if the instruction was needed or could be deleted even if the
surrounding ones are live. This patches introduces this level of
liveness checks together with other liveness queries, e.g., for uses.
For more control we enforce that all liveness queries go through the
Attributor.
Test have been adjusted to reflect the changes or augmented to prevent
deletion of the parts we want to check.
Reviewed By: sstefan1
Differential Revision: https://reviews.llvm.org/D73313
Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors.
This patch improves tail duplication in 3 aspects:
A successor can be duplicated into part of its predecessors.
A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only.
If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors.
Differential Revision: https://reviews.llvm.org/D73387
Summary:
This was a very odd API, where you had to pass a flag into a zext
function to say whether the extended bits really were zero or not. All
callers passed in a literal true or false.
I think it's much clearer to make the function name reflect the
operation being performed on the value we're tracking (rather than on
the KnownBits Zero and One fields), so zext means the value is being
zero extended and new function anyext means the value is being extended
with unknown bits.
NFC.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74482
Summary:
As far as I can tell, the SFINAE was broken; there is no such thing as
std::is_trivially_constructible<T>::type.
Subscribers: dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74380
In a previous patch I changed `std::decay<T>::type` to `std::decay<T>`
rather than `std::decay_t<T>`. This seems to have broken the build
*only for clang-cl*. I don't know why.
Submitting with post-commit review because this is an obvious fix for a
build breakage and we've verified that it fixes the breakage.
Added support for the intrinsic llvm.ppc.dcbfl and llvm.ppc.dcbflp.
These will be used for emitting cache control instructions dcbfl and dcbflp
which are actually mnemonics for using dcbf instruction with different
immediate arguments.
dcbfl ra, rb -> dcbf ra, rb, 1
dcbflp, ra, rb -> dcbf ra, rb, 3
Differential Revision: https://reviews.llvm.org/D68411
This reverts commit 636c93ed11.
The original patch caused build failures on TSan buildbots. Commit 6ded69f294
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not.
This patch replaces the char return value with an NegatibleCost enum to more clearly demonstrate what is implied.
It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect whats going on.
Differential Revision: https://reviews.llvm.org/D74221
With the fixed implementation of the "remainder" operation in
rG9d0956ebd471, we can now add support to folding calls to it.
Differential Revision: https://reviews.llvm.org/D69777
This patch enables the debug entry values feature.
- Remove the (CC1) experimental -femit-debug-entry-values option
- Enable it for x86, arm and aarch64 targets
- Resolve the test failures
- Leave the llc experimental option for targets that do not
support the CallSiteInfo yet
Differential Revision: https://reviews.llvm.org/D73534
The patch removes unnecessary members of DWARFDebugAddr and further
simplifies the implementation by separating parsing methods of tables
in the DWARFv5 and pre-standard formats.
Differential Revision: https://reviews.llvm.org/D74197
This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses
them to implement soft promotion for the half type. This is
enough to provide basic support for __fp16 with strictfp.
Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C
is enabled.
This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c.
There are issues to be investigated for polly bots and bots turning on
EXPENSIVE_CHECKS.
Summary:
Dwarf stores source-file names the three parts:
<compilation_directory><include_directory><filename>
Prior to this change, the code only allowed retrieving either all
three as the absolute path, or just the filename. But many
compile-command lines--especially those in hermetic build systems
don't specify an absolute path, nor just the filename, but rather the
path relative to the compilation directory. This features allows
retrieving them in that style.
Add tests for path printing styles.
Modify createBasicPrologue to handle include directories.
Subscribers: aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73383
This restores commit 748bb5a0f1, along
with a fix for a Chromium test suite build issue (and a new test for
that case).
Differential Revision: https://reviews.llvm.org/D73242
The changeXXXAfterManifest functions are better suited to deal with
changes so we should prefer them. These functions also recursively
delete dead instructions which is why we see test changes.
Split out from D73835. I removed some of these before, but missed
these ones. They are not part of the ConstantFolder interface
and are not going to be used by the IRBuilder.
Summary:
Add a new method (tryParseRegister) that attempts to parse a register specification.
MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't.
Reviewers: thakis
Reviewed By: thakis
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73486
Summary:
Simplifies the C++11-style "-> decltype(...)" return-type deduction.
Note that you have to be careful about whether the function return type
is `auto` or `decltype(auto)`. The difference is that bare `auto`
strips const and reference, just like lambda return type deduction. In
some cases that's what we want (or more likely, we know that the return
type is a value type), but whenever we're wrapping a templated function
which might return a reference, we need to be sure that the return type
is decltype(auto).
No functional change.
Subscribers: dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74383
Summary:
This used std::enable_if without referencing ::type. Changed to use
std::enable_if_t.
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74381
Added a test for #pragma clang __debug llvm_fatal_error to test for the original issue.
Added llvm::sys::Process::Exit() and replaced ::exit() in places where it was appropriate. This new function would call the current CrashRecoveryContext if one is running on the same thread; or call ::exit() otherwise.
Fixes PR44705.
Differential Revision: https://reviews.llvm.org/D73742
Summary:
That patch is extracted from https://reviews.llvm.org/D74308.
Currently there are two patterns to name error handling functions:
using "Callback" and "Handler". This patch uses "Handler" for all
usage places.
Reviewers: jhenderson, dblaikie, probinson, aprantl
Reviewed By: jhenderson, dblaikie
Subscribers: hiraditya, llvm-commits
Tags: #llvm, #debug-info
Differential Revision: https://reviews.llvm.org/D74354
New intrinisics are implemented for when we need to port SIMD code from other
arhitectures and only load or store portions of MSA registers.
Following intriniscs are added which only load/store element 0 of a vector:
v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044);
v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088);
void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044);
void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088);
Differential Revision: https://reviews.llvm.org/D73644
Summary:
There are a few field init values that are concrete but not complete/foldable (e.g. `?`). This allows for using those values as initializers without erroring out.
Example:
```
class A {
string value = ?;
}
class B<A impl> : A {
let value = impl.value; // This currently emits an error.
let value = ?; // This doesn't emit an error.
}
```
Differential Revision: https://reviews.llvm.org/D74360
I'm /guessing/ this isn't terribly testable without a very large input
file. Even generated from a more compact assembly file, it's probably
best not to generate a giant temporary test file - if I'm wrong about
that/anyone has good suggestions for testing, I'm all ears!
Based on post-commit review feedback from Igor Kudrin on
eed0242330
Summary: It attempts to devirtualize a call on alloca through vtable loads.
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
ConstantInt values are always represented as constant ranges with a
single element. getConstantInt is obsolete, as pointed out by @nikic
during D60581.
Reviewers: nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D74329
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69984
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.
The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.
This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69930
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.
The uses are contained in different commits, e.g. D70767.
More functionality is added as we need it.
Reviewed By: modocache, hfinkel
Differential Revision: https://reviews.llvm.org/D70927
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.
Differential Revision: https://reviews.llvm.org/D74079
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with better option
handling and more portable testing
Differential Revision: https://reviews.llvm.org/D68720
This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with correct option
flags set.
Differential Revision: https://reviews.llvm.org/D68720
Add the isCandidateForCallSiteEntry predicate to MachineInstr to
determine whether a DWARF call site entry should be created for an
instruction.
For now, it's enough to have any call instruction that doesn't belong to
a blacklisted set of opcodes. For these opcodes, a call site entry isn't
meaningful.
Differential Revision: https://reviews.llvm.org/D74159
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.
Differential Revision: https://reviews.llvm.org/D74223
Printing floating point number in decimal is inconvenient for humans.
Verbose asm output will print out floating point values in comments, it
helps.
But in lots of cases, users still need additional work to covert the
decimal back to hex or binary to check the bit patterns,
especially when there are small precision difference.
Hexadecimal form is one of the supported form in LLVM IR, and easier for
debugging.
This patch try to print all FP constant in hex form instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D73566
This reverts commit 39f50da2a3.
The -fstack-clash-protection is being passed to the linker too, which
is not intended.
Reverting and fixing that in a later commit.
Summary: This patch introduces an API for MemOp in order to simplify and tighten the client code.
Reviewers: courbet
Subscribers: arsenm, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jsji, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73964
This patch adds versions of isImpliedCondition and
isImpliedByDomCondition that take a predicate, LHS and RHS operands as
instead of a Value representing the condition.
This allows using those functions to check conditions without having a
concrete ICmp instruction.
Reviewers: nikic, RKSimon, lebedev.ri, spatel
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D74065
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
Differential Revision: https://reviews.llvm.org/D68720
As discussed in D70568, remove this because it isn't used anywhere, and I think it's better to go through real crashes for testing (#pragma clang __debug crash).
Also remove the support function llvm::CrashRecoveryContext::HandleCrash() which was added at the same time by @ddunbar.
Differential Revision: https://reviews.llvm.org/D74063
"linked-to section" is used by the ELF spec. By analogy, "linked-to
symbol" is a good name for the signature symbol. The word "linked-to"
implies a directed edge and makes it clear its relation with "sh_link",
while one can argue that "associated" means an undirected edge.
Also, combine tests and add precise SMLoc to improve diagnostics.
Reviewed By: eugenis, grimar, jhenderson
Differential Revision: https://reviews.llvm.org/D74082
To find the instruction in the block for a given ID, first a count and then a
lookup was performed in the map, which is almost the same thing, thus doing
double the work.
Differential Revision: https://reviews.llvm.org/D73866
This adds some of LLD specific scopes and picks up optimisation scopes
via LTO/ThinLTO. Makes use of TimeProfiler multi-thread support added in
77e6bb3c.
Differential Revision: https://reviews.llvm.org/D71060
As detailed on PR43943, we're seeing static analyzer use after move warnings in the iplist_impl move constructor/operator as they call std::move to both the TraitsT and IntrusiveListT base classes.
As suggested by @dexonsmith this patch casts the moved value to the base classes to silence the warnings.
Differential Revision: https://reviews.llvm.org/D74062
This is a compile-time optimization for PHIElimination (splitting of critical
edges), which was reported at https://bugs.llvm.org/show_bug.cgi?id=44249. As
discussed there, the way to remedy the slowdowns with huge functions is to
pre-compute the live-in registers for each MBB in an efficient way in
PHIElimination.cpp and then pass that information along to
LiveVariabless::addNewBlock().
In all the huge test programs where this slowdown has been noticable, it has
dissapeared entirely with this patch.
Review: Björn Pettersson, Quentin Colombet.
Differential Revision: https://reviews.llvm.org/D73152
I was debug stepping through an x86 shuffle lowering and
noticed we were doing an N^2 search for splat index. I
didn't find the equivalent functionality anywhere else in
LLVM, so here's a helper that takes an array of int and
returns a splatted index while ignoring undefs (any
negative value).
This might also be used inside existing
ShuffleVectorInst/ShuffleVectorSDNode functions and/or
help with D72467.
Differential Revision: https://reviews.llvm.org/D74064
Removed some #ifdefs specific to Windows handling of VFS paths. This
eliminates most of the differences between the Windows and non-Windows
code paths.
Making this work required some changes to account for the fact that VFS
file paths can be Posix style or Windows style, so you cannot just assume
that they use the host's native path style. In one case, this means
implementing our own version of make_absolute, since the filesystem code
in Support doesn't have styles in the sense that the path code does.
Differential Review: https://reviews.llvm.org/D71092
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existsing pattern.
I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.
I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.
This change adds another function to directly return the register that
is the source of the register and use that for folding.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44783
Differential Revision: https://reviews.llvm.org/D74005
This is safer in case anyone tries to run MI optimization passes on
pre-selected MIR. If there turns out to be a real reason to do this,
we might need to add separate convergent intrinsic opcodes.
Summary:
Tune the profile threshold flag value for instrumentation PGO based on internal
benchmarks.
Also, add flags to allow profile guided size optimizations for non-cold code
to be enabled separately for instrumentation and sample PGSO.
Neither changes the default behavior (yet) as it's disabled for non-cold code.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72937
This reverts commit 41784bed01.
Since the original revision ead815924e,
this revision fixes three issues:
- This revision fixes the Windows build. My original patch improperly
copied EH pads on Windows. This patch disregards jump threading
opportunities having to do with EH pads.
- This revision fixes jump threading to a wrong destination.
Specifically, my original patch treated any Constant other than 0 as 1
while evaluating the branch condition. This bug led to treating
constant expressions like:
icmp ugt i8* null, inttoptr (i64 4 to i8*)
to "true". This patch fixes the bug by calling isOneValue.
- This revision fixes the cost calculation of two basic blocks being
threaded through. Note that getJumpThreadDuplicationCost returns
"(unsigned)~0" for those basic blocks that cannot be duplicated. If
we sum of two return values from getJumpThreadDuplicationCost, we
could have an unsigned overflow like:
(unsigned)~0 + 5 = 4
and mistakenly determine that it's safe and profitable to proceed
with the jump threading opportunity. The patch fixes the bug by
checking each return value before summing them up.
[JumpThreading] Thread jumps through two basic blocks
Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:
bb3:
%var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
by duplicating basic blocks like bb3 above. Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:
bb3:
%var = phi i32* [ @a, %bb2 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb3.dup:
%var = phi i32* [ null, %bb1 ]
%tobool = icmp eq i32 %cond, 0
br i1 %tobool, label %bb4, label ...
bb4:
%cmp = icmp eq i32* %var, null
br i1 %cmp, label bb5, label bb6
Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.
Reviewers: wmi
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70247
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.
This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).
This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.
Depends on D71913.
Reviewers: pcc, evgeny777
Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73242
Previously the description allowed to describe symbols with use of
`Name` and `Index` keys. This patch removes them and now it is still
possible to use either names or symbol indexes, but the code is simpler
and the format is slightly different.
Such a change will be useful for another patches, e.g:
https://reviews.llvm.org/D73788#inline-671077
Differential revision: https://reviews.llvm.org/D73888
Summary:
This reverts commit 3ef169e586. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.
Depends on D73927.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73928
This extends the RemarkStreamer to allow for other emitters (e.g.
frontends, SIL, etc.) to emit remarks through a common interface.
See changes in llvm/docs/Remarks.rst for motivation and design choices.
Differential Revision: https://reviews.llvm.org/D73676
The CATCHPAD node mostly existed to be selected into the EH_RESTORE
instruction, which sets the frame back up when 32-bit Windows exceptions
return to the parent function. However, creating this MachineInstr early
increases the risk that other passes will come along and insert
instructions that use the stack before ESP and EBP are restored. That
happened in PR44697.
Instead of representing these in the instruction stream early, delay it
until PEI. Mark the blocks where this needs to happen as EHPads, but not
funclet entry blocks. Passes after PEI have to be careful not to hoist
instructions that can use stack across frame setup instructions, so this
should be relatively reliable.
Fixes PR44697
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D73752
Add support for Flush in the OMPIRBuilder. This patch also adds changes
to clang to use the OMPIRBuilder when '-fopenmp-enable-irbuilder'
commandline option is used.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D70712
Summary:
Add a debug check for frequency queries for unknown blocks (typically blocks
that are created after BFI is computed but their frequencies are not
communicated to BFI.)
This is useful for detecting and debugging missed BFI updates.
This is debug build only and disabled behind a flag.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73920
AMDGPU and x86 at least both have separate controls for whether
denormal results are flushed on output, and for whether denormals are
implicitly treated as 0 as an input. The current DAGCombiner use only
really cares about the input treatment of denormals.
Summary:
After following Simon's suggestion about additional testing posted at
https://reviews.llvm.org/D73906, I found several more places that
need to be updated.
Reviewers: simon_tatham, dmgreen, ostannard, eli.friedman
Reviewed By: simon_tatham
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73963
Summary:
This patch changes the underlying type of the ARM::ArchExtKind
enumeration to uint64_t and adjusts the related code.
The goal of the patch is to prepare the code base for a new
architecture extension.
Reviewers: simon_tatham, eli.friedman, ostannard, dmgreen
Reviewed By: dmgreen
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits, pbarrio
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73906
Adds the global (cl::opt) GVNOption enable-load-in-loop-pre in order
to control whether the optimization will be performed if the load
is part of a loop.
Patch by Hendrik Greving!
Differential Revision: https://reviews.llvm.org/D73804
Summary:
The AIX assembler .space directive can't take a second non-zero argument to fill
with. But LLVM emitFill currently assumes it can. We add a flag to the AsmInfo
to check if non-zero fill is supported, and if we can't zerofill non-zero values
we just splat the .byte directives.
Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L
Reviewed By: jasonliu
Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73554
Start using a new strategy with a combination of merge and unmerges.
This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
RegAllocGreedy uses a fairly compile time intensive splitting heuristic
called region splitting. This heuristic was disabled via another heuristic
when it is likely that it won't be worth the compile time. The only way
to control this other heuristic was via a command line option (huge-size-for-split).
This commit gives more control on this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.
The default implementation of this hook keeps the heuristic as it was
before this patch.
ClangBuildAnalyzer results show that a lot of time is spent
instantiating AnalysisManager::getResultImpl across the code base:
**** Templates that took longest to instantiate:
50445 ms: llvm::AnalysisManager<llvm::Function>::getResultImpl (412 times, avg 122 ms)
47797 ms: llvm::AnalysisManager<llvm::Function>::getResult<llvm::TargetLibraryAnalysis> (389 times, avg 122 ms)
46894 ms: std::tie<const unsigned long long, const bool> (2452 times, avg 19 ms)
43851 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096>::Allocate (3228 times, avg 13 ms)
33911 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (897 times, avg 37 ms)
33854 ms: std::tie<const unsigned long long, const unsigned long long> (1897 times, avg 17 ms)
27886 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11156 times, avg 2 ms)
I mentioned this result to @chandlerc, and he suggested this direction.
AnalysisManager is already explicitly instantiated, and getResultImpl
doesn't need to be inlined. Move the definition to an Impl header, and
include that header in files that explicitly instantiate
AnalysisManager. There are only four (real) IR units:
- function
- module
- loop
- cgscc
Looking at a specific transform (ArgumentPromotion.cpp), here are three
compilations before & after this change:
BEFORE:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 258.15MB
real: 0m6.297s
peak memory: 257.54MB
real: 0m5.906s
peak memory: 257.47MB
real: 0m6.219s
AFTER:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 235.35MB
real: 0m5.454s
peak memory: 234.72MB
real: 0m5.235s
peak memory: 234.39MB
real: 0m5.469s
The 20MB of memory saved seems real, and the time improvement seems like
it is there.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73817
Summary:
Method appendLoopsToWorklist is duplicate in LoopUnroll and in the
LoopPassManager as an internal method. Make it an utility.
Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi
Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73569
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of, or the length of the mask vector. For scalable
vectors, this cannot be known at compile time.
** for these tests, detect if the vector is scalable before attempting
the transformation
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
function do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable
Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73555
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411 I'm also changing the remaining
method names to lowercase first character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
The fix in b3d7d1061d compiled nicely,
but didn't link because at least the VS 2017 version I use doesn't
have the builtin yet. Instead, make use of the builtin with MSVC
conditional on VS 2019 or later.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73885
Otherwise Visual Studio 2017 will complain about
llvm::StringRef::strlen not being constexpr:
StringRef.h(80): error C3615: constexpr function 'llvm::StringRef::strLen' cannot result in a constant expression
StringRef.h(84): note: failure was caused by call of undefined function or one not declared 'constexpr'
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304