Summary:
This patch adds intrinsics and ISelDAG nodes for signed
and unsigned fixed-point division:
```
llvm.sdiv.fix.sat.*
llvm.udiv.fix.sat.*
```
These intrinsics perform scaled, saturating division
on two integers or vectors of integers. They are
required for the implementation of the Embedded-C
fixed-point arithmetic in Clang.
Reviewers: bjope, leonardchan, craig.topper
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71550
This is based on this llvm-dev thread http://lists.llvm.org/pipermail/llvm-dev/2019-December/137521.html
The current strategy for f16 is to promote type to float every except where the specific width is required like loads, stores, and bitcasts. This results in rounding occurring in odd places instead of immediately after arithmetic operations. This interacts in weird ways with the __fp16 type in clang which is a storage only type where arithmetic is always promoted to float. InstCombine can remove some fpext/fptruncs around such arithmetic and turn it into arithmetic on half. This wouldn't be so bad if SelectionDAG was able to put those fpext/fpround back in when it promotes.
It is also not obvious how to handle to make the existing strategy work with STRICT fp. We need to use STRICT versions of the conversions which require chain operands. But if the conversions are created for a bitcast, there is no place to get an appropriate chain from.
This patch implements a different strategy where conversions are emitted directly around arithmetic operations. And otherwise its passed around as an i16 including in arguments and return values. This can result in more conversions between arithmetic operations, but is closer to matching the IR the frontend generates for __fp16. And it will allow us to use the chain from constrained arithmetic nodes to link the STRICT_FP_TO_FP16/STRICT_FP16_TO_FP that will need to be added. I've set it up so that each target can opt into the new behavior. Converting all the targets myself was more than I was able to handle.
Differential Revision: https://reviews.llvm.org/D73749
and macro FUNCTION likewise. NFCI.
Some functions like fmuladd don't really have a node, we should divide
the declaration form those have node to avoid introducing fake nodes.
Differential Revision: https://reviews.llvm.org/D72871
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
Summary:
This always just used the same libcall as unordered, but the comparison predicate was different. This change appears to have been made when targets were given the ability to override the predicates. Before that they were hardcoded into the type legalizer. At that time we never inverted predicates and we handled ugt/ult/uge/ule compares by emitting an unordered check ORed with a ogt/olt/oge/ole checks. So only ordered needed an inverted predicate. Later ugt/ult/uge/ule were optimized to only call a single libcall and invert the compare.
This patch removes the ordered entries and just uses the inverting logic that is now present. This removes some odd things in both the Mips and WebAssembly code.
Reviewers: efriedma, ABataev, uweigand, cameron.mcinally, kpn
Reviewed By: efriedma
Subscribers: dschuff, sdardis, sbc100, arichardson, jgravelle-google, kristof.beyls, hiraditya, aheejin, sunfish, atanasyan, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72536
Summary:
This patch adds intrinsics and ISelDAG nodes for
signed and unsigned fixed-point division:
llvm.sdiv.fix.*
llvm.udiv.fix.*
These intrinsics perform scaled division on two
integers or vectors of integers. They are required
for the implementation of the Embedded-C fixed-point
arithmetic in Clang.
Patch by: ebevhan
Reviewers: bjope, leonardchan, efriedma, craig.topper
Reviewed By: craig.topper
Subscribers: Ka-Ka, ilya, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70007
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.
Differential Revision: https://reviews.llvm.org/D70000
MVE has a basic symmetry between it's normal loads/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted to hopefully be no functional changes.
Differential Revision: https://reviews.llvm.org/D70176
float node
This patch add an option 'disable-strictnode-mutation' to prevent strict
node mutating to an normal node.
So we can make sure that the patch which sets strict-node as legal works
correctly.
Patch by Chen Liu(LiuChen3)
Differential Revision: https://reviews.llvm.org/D70226
Summary
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.
Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69887
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.
This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.
Differential Revision: https://reviews.llvm.org/D70138
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.
Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
Adds a new ISD node to replicate a scalar value across all elements of
a vector. This is needed for scalable vectors, since BUILD_VECTOR cannot
be used.
Fixes up default type legalization for scalable vectors after the
new MVT type ranges were introduced.
At present I only use this node for scalable vectors. A DAGCombine has
been added to transform a BUILD_VECTOR into a SPLAT_VECTOR if all
elements are the same, but only if the default operation action of
Expand has been overridden by the target.
I've only added result promotion legalization for scalable vector
i8/i16/i32/i64 types in AArch64 for now.
Reviewers: t.p.northover, javed.absar, greened, cameron.mcinally, jmolloy
Reviewed By: jmolloy
Differential Revision: https://reviews.llvm.org/D47775
llvm-svn: 375222
Earlier in the year intrinsics for lrint, llrint, lround and llround were
added to llvm. The constrained versions are now implemented here.
Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally
Approved by: craig.topper
Differential Revision: https://reviews.llvm.org/D64746
llvm-svn: 373900
This caused severe compile-time regressions, see PR43455.
> Modern processors predict the targets of an indirect branch regardless of
> the size of any jump table used to glean its target address. Moreover,
> branch predictors typically use resources limited by the number of actual
> targets that occur at run time.
>
> This patch changes the semantics of the option `-max-jump-table-size` to limit
> the number of different targets instead of the number of entries in a jump
> table. Thus, it is now renamed to `-max-jump-table-targets`.
>
> Before, when `-max-jump-table-size` was specified, it could happen that
> cluster jump tables could have targets used repeatedly, but each one was
> counted and typically resulted in tables with the same number of entries.
> With this patch, when specifying `-max-jump-table-targets`, tables may have
> different lengths, since the number of unique targets is counted towards the
> limit, but the number of unique targets in tables is the same, but for the
> last one containing the balance of targets.
>
> Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 373060
Rename old function to explicitly show that it cares only about alignment.
The new allowsMemoryAccess call the function related to alignment by default
and can be overridden by target to inform whether the memory access is legal or
not.
Differential Revision: https://reviews.llvm.org/D67121
llvm-svn: 372935
Modern processors predict the targets of an indirect branch regardless of
the size of any jump table used to glean its target address. Moreover,
branch predictors typically use resources limited by the number of actual
targets that occur at run time.
This patch changes the semantics of the option `-max-jump-table-size` to limit
the number of different targets instead of the number of entries in a jump
table. Thus, it is now renamed to `-max-jump-table-targets`.
Before, when `-max-jump-table-size` was specified, it could happen that
cluster jump tables could have targets used repeatedly, but each one was
counted and typically resulted in tables with the same number of entries.
With this patch, when specifying `-max-jump-table-targets`, tables may have
different lengths, since the number of unique targets is counted towards the
limit, but the number of unique targets in tables is the same, but for the
last one containing the balance of targets.
Differential revision: https://reviews.llvm.org/D60295
llvm-svn: 372893
* Reordered MVT simple types to group scalable vector types
together.
* New range functions in MachineValueType.h to only iterate over
the fixed-length int/fp vector types.
* Stopped backends which don't support scalable vector types from
iterating over scalable types.
Reviewers: sdesmalen, greened
Reviewed By: greened
Differential Revision: https://reviews.llvm.org/D66339
llvm-svn: 372099
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
llvm-svn: 371722
Summary:
Add an intrinsic that takes 2 unsigned integers with
the scale of them provided as the third argument and
performs fixed point multiplication on them. The
result is saturated and clamped between the largest and
smallest representable values of the first 2 operands.
This is a part of implementing fixed point arithmetic
in clang where some of the more complex operations
will be implemented as intrinsics.
Patch by: leonardchan, bjope
Reviewers: RKSimon, craig.topper, bevinh, leonardchan, lebedev.ri, spatel
Reviewed By: leonardchan
Subscribers: ychen, wuzish, nemanjai, MaskRay, jsji, jdoerfert, Ka-Ka, hiraditya, rjmccall, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57836
llvm-svn: 371308
Summary:
This patch renames functions that takes or returns alignment as log2, this patch will help with the transition to llvm::Align.
The renaming makes it explicit that we deal with log(alignment) instead of a power of two alignment.
A few renames uncovered dubious assignments:
- `MirParser`/`MirPrinter` was expecting powers of two but `MachineFunction` and `MachineBasicBlock` were using deal with log2(align). This patch fixes it and updates the documentation.
- `MachineBlockPlacement` exposes two flags (`align-all-blocks` and `align-all-nofallthru-blocks`) supposedly interpreted as power of two alignments, internally these values are interpreted as log2(align). This patch updates the documentation,
- `MachineFunctionexposes` exposes `align-all-functions` also interpreted as power of two alignment, internally this value is interpreted as log2(align). This patch updates the documentation,
Reviewers: lattner, thegameg, courbet
Subscribers: dschuff, arsenm, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, Jim, s.egerton, llvm-commits, courbet
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65945
llvm-svn: 371045
This implements constrained floating point intrinsics for FP to signed and
unsigned integers.
Quoting from D32319:
The purpose of the constrained intrinsics is to force the optimizer to
respect the restrictions that will be necessary to support things like the
STDC FENV_ACCESS ON pragma without interfering with optimizations when
these restrictions are not needed.
Reviewed by: Andrew Kaylor, Craig Topper, Hal Finkel, Cameron McInally, Roman Lebedev, Kit Barton
Approved by: Craig Topper
Differential Revision: http://reviews.llvm.org/D63782
llvm-svn: 370228
Summary:
Here is the commit introducing the fields
https://github.com/llvm/llvm-project/commit/cf6749e4c091
It dates back from 2006 and was used by AArch64 backend.
There is no more reference to these fields in the whole codebase so I think it's fine.
Reviewers: courbet
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66683
llvm-svn: 369810
These were recently made simple types. This restores their
behavior back to something like their EVT legalization.
We might be able to fix the code in type legalization where the
assert was failing, but I didn't investigate too much as I had
already looked at the computeRegisterProperties code during the
review for v3i16/v3f16.
Most of the test changes restore the X86 codegen back to what
it looked like before the recent change. The test case in
vec_setcc.ll and is a reduced version of the reproducer from
the fuzzer.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16490
llvm-svn: 369205
AMDGPU has some buffer intrinsics which theoretically could use
this. Some of the generated tables include the 3 and 4 element vector
versions of these rounded to 64-bits, which is ambiguous. Add these to
help the table disambiguate these.
Assertion change is for the path odd sized vectors now take for R600.
v3i16 is widened to v4i16, which then needs to be promoted to v4i32.
llvm-svn: 369038
As discussed on D62910, we need to check whether particular types of memory access are allowed, not just their alignment/address-space.
This NFC patch adds a MachineMemOperand::Flags argument to allowsMemoryAccess and allowsMisalignedMemoryAccesses, and wires up calls to pass the relevant flags to them.
If people are happy with this approach I can then update X86TargetLowering::allowsMisalignedMemoryAccesses to handle misaligned NT load/stores.
Differential Revision: https://reviews.llvm.org/D63075
llvm-svn: 363179
As suggested by @arsenm on D63075 - this adds a TargetLowering::allowsMemoryAccess wrapper that takes a Load/Store node's MachineMemOperand to handle the AddressSpace/Alignment arguments and will also implicitly handle the MachineMemOperand::Flags change in D63075.
llvm-svn: 363048
The ISD::STRICT_ nodes used to implement the constrained floating-point
intrinsics are currently never passed to the target back-end, which makes
it impossible to handle them correctly (e.g. mark instructions are depending
on a floating-point status and control register, or mark instructions as
possibly trapping).
This patch allows the target to use setOperationAction to switch the action
on ISD::STRICT_ nodes to Legal. If this is done, the SelectionDAG common code
will stop converting the STRICT nodes to regular floating-point nodes, but
instead pass the STRICT nodes to the target using normal SelectionDAG
matching rules.
To avoid having the back-end duplicate all the floating-point instruction
patterns to handle both strict and non-strict variants, we make the MI
codegen explicitly aware of the floating-point exceptions by introducing
two new concepts:
- A new MCID flag "mayRaiseFPException" that the target should set on any
instruction that possibly can raise FP exception according to the
architecture definition.
- A new MI flag FPExcept that CodeGen/SelectionDAG will set on any MI
instruction resulting from expansion of any constrained FP intrinsic.
Any MI instruction that is *both* marked as mayRaiseFPException *and*
FPExcept then needs to be considered as raising exceptions by MI-level
codegen (e.g. scheduling).
Setting those two new flags is straightforward. The mayRaiseFPException
flag is simply set via TableGen by marking all relevant instruction
patterns in the .td files.
The FPExcept flag is set in SDNodeFlags when creating the STRICT_ nodes
in the SelectionDAG, and gets inherited in the MachineSDNode nodes created
from it during instruction selection. The flag is then transfered to an
MIFlag when creating the MI from the MachineSDNode. This is handled just
like fast-math flags like no-nans are handled today.
This patch includes both common code changes required to implement the
new features, and the SystemZ implementation.
Reviewed By: andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D55506
llvm-svn: 362663
This patch add the ISD::LRINT and ISD::LLRINT along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lrint/llrint generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62017
llvm-svn: 361875
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them. The
result is saturated and clamped between the largest and smallest representable
values of the first 2 operands.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D55720
llvm-svn: 361289
This patch add the ISD::LROUND and ISD::LLROUND along with new
intrinsics. The changes are straightforward as for other
floating-point rounding functions, with just some adjustments
required to handle the return value being an interger.
The idea is to optimize lround/llround generation for AArch64
in a subsequent patch. Current semantic is just route it to libm
symbol.
llvm-svn: 360889
We don't have FP exception limits in the IR constant folder for the binops (apart from strict ops),
so it does not make sense to have them here in the DAG either. Nothing else in the backend tries
to preserve exceptions (again outside of strict ops), so I don't see how this could have ever
worked for real code that cares about FP exceptions.
There are still cases (examples: unary opcodes in SDAG, FMA in IR) where we are trying (at least
partially) to preserve exceptions without even asking if the target supports FP exceptions. Those
should be corrected in subsequent patches.
Real support for FP exceptions requires several changes to handle the constrained/strict FP ops.
Differential Revision: https://reviews.llvm.org/D61331
llvm-svn: 359791
The existing statepoint lowering code does something odd; it adds machine memory operands post instruction selection. This was copied from the stackmap/patchpoint implementation, but appears to be non-idiomatic.
This change is largely NFC. It moves the MMO creation logic into SelectionDAG building. It ends up not quite being NFC because the size of the stack slot is reflected in the MMO. The old code blindly used pointer size for the MMO size, which appears to have always been incorrect for larger values. It just happened nothing actually relied on the MMOs, so it worked out okay.
For context, I'm planning on removing the MOVolatile flag from these in a future commit, and then removing the MOStore flag from deopt spill slots in a separate one. Doing so is motivated by a small test case where we should be able to better schedule spill slots, but don't do so due to a memory use/def implied by the statepoint.
Differential Revision: https://reviews.llvm.org/D59106
llvm-svn: 355953
Fixes https://bugs.llvm.org/show_bug.cgi?id=36796.
Implement basic legalizations (PromoteIntRes, PromoteIntOp,
ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes.
There are more legalizations missing (esp float legalizations),
but there's no way to test them right now, so I'm not adding them.
This also includes a few more changes to make this work somewhat
reasonably:
* Add support for expanding VECREDUCE in SDAG. Usually
experimental.vector.reduce is expanded prior to codegen, but if the
target does have native vector reduce, it may of course still be
necessary to expand due to legalization issues. This uses a shuffle
reduction if possible, followed by a naive scalar reduction.
* Allow the result type of integer VECREDUCE to be larger than the
vector element type. For example we need to be able to reduce a v8i8
into an (nominally) i32 result type on AArch64.
* Use the vector operand type rather than the scalar result type to
determine the action, so we can control exactly which vector types are
supported. Also change the legalize vector op code to handle
operations that only have vector operands, but no vector results, as
is the case for VECREDUCE.
* Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE),
explicitly specify for which vector types the reductions are supported.
This does not handle anything related to VECREDUCE_STRICT_*.
Differential Revision: https://reviews.llvm.org/D58015
llvm-svn: 355860
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D55625
llvm-svn: 353059
This cleans up all CallInst creation in LLVM to explicitly pass a
function type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57170
llvm-svn: 352909
Recommit r352791 after tweaking DerivedTypes.h slightly, so that gcc
doesn't choke on it, hopefully.
Original Message:
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352827
This reverts commit f47d6b38c7 (r352791).
Seems to run into compilation failures with GCC (but not clang, where
I tested it). Reverting while I investigate.
llvm-svn: 352800
The FunctionCallee type is effectively a {FunctionType*,Value*} pair,
and is a useful convenience to enable code to continue passing the
result of getOrInsertFunction() through to EmitCall, even once pointer
types lose their pointee-type.
Then:
- update the CallInst/InvokeInst instruction creation functions to
take a Callee,
- modify getOrInsertFunction to return FunctionCallee, and
- update all callers appropriately.
One area of particular note is the change to the sanitizer
code. Previously, they had been casting the result of
`getOrInsertFunction` to a `Function*` via
`checkSanitizerInterfaceFunction`, and storing that. That would report
an error if someone had already inserted a function declaraction with
a mismatching signature.
However, in general, LLVM allows for such mismatches, as
`getOrInsertFunction` will automatically insert a bitcast if
needed. As part of this cleanup, cause the sanitizer code to do the
same. (It will call its functions using the expected signature,
however they may have been declared.)
Finally, in a small number of locations, callers of
`getOrInsertFunction` actually were expecting/requiring that a brand
new function was being created. In such cases, I've switched them to
Function::Create instead.
Differential Revision: https://reviews.llvm.org/D57315
llvm-svn: 352791
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D54719
llvm-svn: 348912
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.
Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.
Differential Revision: https://reviews.llvm.org/D54698
llvm-svn: 348353
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
Differential Revision: https://reviews.llvm.org/D53877
llvm-svn: 346774
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53783
llvm-svn: 345512
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
Add an intrinsic that takes 2 integers and perform unsigned saturation
addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53340
llvm-svn: 344971
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.
There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.
llvm-svn: 344914
Add an intrinsic that takes 2 integers and perform saturation addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53053
llvm-svn: 344629
Since Android API version 9 the Android libm has had the sincos functions, so
they should be recognised as libcalls and sincos optimisation should be applied.
Differential Revision: https://reviews.llvm.org/D52025
llvm-svn: 342471
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.
Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp,
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks.
Otherwise, no regressions on x86-64 or A64."
I'm proposing to add only the minimum support for a DAG node here. Since we don't have an
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We
should be able to expand as needed when adding more functionality/transforms. For reference,
these are transform suggestions currently listed in SimplifyLibCalls.cpp:
// * cbrt(expN(X)) -> expN(x/3)
// * cbrt(sqrt(x)) -> pow(x,1/6)
// * cbrt(cbrt(x)) -> pow(x,1/9)
Also, given that we bail out on long double for now, there should not be any logical
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).
Differential Revision: https://reviews.llvm.org/D51753
llvm-svn: 342348
a generically extensible collection of extra info attached to
a `MachineInstr`.
The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.
Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.
I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).
Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.
This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.
The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.
Differential Revision: https://reviews.llvm.org/D50701
llvm-svn: 339940
When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location.
Fixes PR34169
Patch by Caroline Tice!
Differential Revision: https://reviews.llvm.org/D49434
llvm-svn: 337847
Summary:
They've been deprecated in favor of UADDO/ADDCARRY or USUBO/SUBCARRY for a while.
Target that uses these opcodes are changed in order to ensure their behavior doesn't change.
Reviewers: efriedma, craig.topper, dblaikie, bkramer
Subscribers: jholewinski, arsenm, jyknight, sdardis, nemanjai, nhaehnle, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D47422
llvm-svn: 333748
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482
In certain cases, the compiler might try to merge __stack_chk_guard with
another global variable. (Or someone could theoretically define
__stack_chk_guard as an alias.) In that case, make sure we don't crash.
Differential Revision: https://reviews.llvm.org/D45746
llvm-svn: 330495
Summary:
Add an LLVM intrinsic for type discriminated event logging with XRay.
Similar to the existing intrinsic for custom events, but also accepts
a type tag argument to allow plugins to be aware of different types
and semantically interpret logged events they know about without
choking on those they don't.
Relies on a symbol defined in compiler-rt patch D43668. I may wait
to submit before I can see demo everything working together including
a still to come clang patch.
Reviewers: dberris, pelikan, eizan, rSerge, timshen
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45633
llvm-svn: 330219
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)
llvm-svn: 328395
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.
This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.
Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplfiySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.
Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.
Fixes PR36250.
Differential Revision: https://reviews.llvm.org/D43449
llvm-svn: 325602
Summary:
This change expands the amount of registers stashed by the entry and
`__xray_CustomEvent` trampolines.
We've found that since the `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.
Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc
Reviewed By: echristo
Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D40894
llvm-svn: 323940
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.
llvm-svn: 322223
Another followup to my refactoring in r321036: Turns out we can end up
with an x86 darwin target that is not macos (simulator triples can look
like i386-apple-ios) so we need the x86/32bit check in all cases.
llvm-svn: 321104
I missed some prefixes and the fact that on AArch64 we use "bzero"
instead of "__bzero" as on X86 when doing my refactoring in r321035.
Improve tests for bzero.
llvm-svn: 321046
I missed the fact that the later called InitLibcallCallingConvs()
overrides some things set in InitLibcalls() when I did the refactoring
in r321036.
Fix by merging InitLibcallCallingConvs() into InitLibcalls() and doing
the initialization earlier.
llvm-svn: 321045
Note:
- X86ISelLowering: setLibcallName(SINCOS) was superfluous as
InitLibcalls() already does it.
- ARMISelLowering: Setting libcallnames for sincos/sincosf seemed
superfluous as in the darwin case it wouldn't be used while for all
other cases InitLibcalls already does it.
llvm-svn: 321036
Summary:
This relaxes an assertion inside SelectionDAGBuilder which is overly
restrictive on targets which have no concept of alignment (such as AVR).
In these architectures, all types are aligned to 8-bits.
After this, LLVM will only assert that accesses are aligned on targets
which actually require alignment.
This patch follows from a discussion on llvm-dev a few months ago
http://llvm.1065342.n5.nabble.com/llvm-dev-Unaligned-atomic-load-store-td112815.html
Reviewers: bogner, nemanjai, joerg, efriedma
Reviewed By: efriedma
Subscribers: efriedma, cactus, llvm-commits
Differential Revision: https://reviews.llvm.org/D39946
llvm-svn: 320243
All these headers already depend on CodeGen headers so moving them into
CodeGen fixes the layering (since CodeGen depends on Target, not the
other way around).
llvm-svn: 318490
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
This will allow eliminating the duplication of the names, and allow adding
extra information such as signatures in a future commit.
Differential Revision: https://reviews.llvm.org/D35522
llvm-svn: 308531
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34885
llvm-svn: 307854
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34884
llvm-svn: 307796
Summary:
Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html
This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.
Reviewers: reames, sanjoy, efriedma
Reviewed By: reames
Subscribers: mzolotukhin, anna, llvm-commits, skatkov
Differential Revision: https://reviews.llvm.org/D33240
llvm-svn: 305558
By target hookifying getRegisterType, getNumRegisters, getVectorBreakdown,
backends can request that LLVM to scalarize vector types for calls
and returns.
The MIPS vector ABI requires that vector arguments and returns are passed in
integer registers. With SelectionDAG's new hooks, the MIPS backend can now
handle LLVM-IR with vector types in calls and returns. E.g.
'call @foo(<4 x i32> %4)'.
Previously these cases would be scalarized for the MIPS O32/N32/N64 ABI for
calls and returns if vector types were not legal. If vector types were legal,
a single 128bit vector argument would be assigned to a single 32 bit / 64 bit
integer register.
By teaching the MIPS backend to inspect the original types, it can now
implement the MIPS vector ABI which requires a particular method of
scalarizing vectors.
Previously, the MIPS backend relied on clang to scalarize types such as "call
@foo(<4 x float> %a) into "call @foo(i32 inreg %1, i32 inreg %2, i32 inreg %3,
i32 inreg %4)".
This patch enables the MIPS backend to take either form for vector types.
The previous version of this patch had a "conditional move or jump depends on
uninitialized value".
Reviewers: zoran.jovanovic, jaydeep, vkalintiris, slthakur
Differential Revision: https://reviews.llvm.org/D27845
llvm-svn: 305083
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787