This is a recommit of the patch 16ff91ebcc,
reverted in 0c28a7c990 because it had
an error in the call to getFastMathFlags (the base type should be
FPMathOperator, not Instruction). The original commit message is duplicated below:
Clang has a builtin function '__builtin_isnan', which implements the C
library function 'isnan'. This function is currently implemented entirely
in clang codegen, which expands the function into a set of IR operations.
There are three mechanisms by which the expansion can be made.
* The most common mechanism is using an unordered comparison made by
instruction 'fcmp uno'. This simple solution is target-independent
and works well in most cases. It is, however, not suitable if floating
point exceptions are tracked. The corresponding IEEE 754 operation and C
function must never raise an FP exception, even if the argument is a
signaling NaN. Compare instructions usually do not have this
property; they raise the 'invalid' exception in such a case. So this
mechanism is unsuitable when exception behavior is strict. In
particular, it could result in unexpected trapping if the argument is an SNaN.
* Another solution was implemented in https://reviews.llvm.org/D95948.
It is used in the cases when raising FP exceptions by 'isnan' is not
allowed. This solution implements 'isnan' using integer operations.
It solves the problem of exceptions, but offers one solution for all
targets, although some targets can do the check in a more efficient way.
* The solution implemented by https://reviews.llvm.org/D96568 introduced a
hook 'clang::TargetCodeGenInfo::testFPKind', which injects target
specific code into IR. Currently only SystemZ implements this hook, and it
generates a call to a target-specific intrinsic function.
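The integer-based approach of the second mechanism can be sketched in
plain C++ as follows. This is only an illustration under the assumption of
a 32-bit IEEE-754 'float'; the helper names 'isNaNBits' and
'checkIsNaNBits' are hypothetical and are not the code that clang emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not part of clang or LLVM): tests a 32-bit IEEE-754
// float for NaN using only integer operations on its bit pattern.
inline bool isNaNBits(float Value) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t Bits;
  std::memcpy(&Bits, &Value, sizeof(Bits)); // well-defined type punning
  // Drop the sign bit. A NaN has an all-ones exponent and a non-zero
  // mantissa, so its remaining bits compare greater than those of infinity.
  return (Bits & 0x7fffffffu) > 0x7f800000u;
}

// Small self-check of the helper on a few representative values.
inline void checkIsNaNBits() {
  assert(isNaNBits(std::numeric_limits<float>::quiet_NaN()));
  assert(!isNaNBits(std::numeric_limits<float>::infinity()));
  assert(!isNaNBits(1.5f));
}
```

Because the test is an integer comparison, it involves no FP compare
instruction that could raise 'invalid'.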
Although these mechanisms allow 'isnan' to be implemented with enough
efficiency, expanding 'isnan' in clang has drawbacks:
* The operation 'isnan' is hidden behind generic integer operations or
target-specific intrinsics. This complicates analysis and can prevent
some optimizations.
* IR can be created by tools other than clang; in this case the treatment
of 'isnan' has to be duplicated in that tool.
Another issue with the current implementation of 'isnan' comes from the
use of options '-ffast-math' or '-fno-honor-nans'. If such an option is
specified, 'fcmp uno' may be optimized to 'false'. This is a valid
optimization in general, but it results in 'isnan' always returning
'false'. For example, in some libc++ implementations the following code
returns 'false':
std::isnan(std::numeric_limits<float>::quiet_NaN())
The options '-ffast-math' and '-fno-honor-nans' imply that FP operation
operands are never NaNs. This assumption, however, should not be applied
to the functions that check FP number properties, including 'isnan'. If
such a function returns the assumed result instead of actually making
the check, it becomes useless in many cases. The option '-ffast-math' is
often used for performance critical code, as it can speed up execution
at the expense of manual treatment of corner cases. If 'isnan' returns
the assumed result, a user cannot use it in the manual treatment of NaNs
and has to invent replacements, like making the check using integer
operations. There is a discussion in https://reviews.llvm.org/D18513#387418,
which also expresses the opinion that limitations imposed by
'-ffast-math' should be applied only to 'math' functions but not to
'tests'.
To overcome these drawbacks, this change introduces a new IR intrinsic
function 'llvm.isnan', which realizes the check as specified by the IEEE-754
and C standards in a target-agnostic way. During IR transformations it
does not undergo undesirable optimizations. It reaches instruction
selection, where it is lowered in a target-dependent way. The lowering can
vary depending on options like '-ffast-math' or '-ffp-model' so the
resulting code satisfies the requested semantics.
Differential Revision: https://reviews.llvm.org/D104854
All information to fix-up the reduction phi nodes in the vectorized loop
is available in VPlan now. This patch moves the code to do so, to make
this clearer. Fixing up the loop exit value still relies on other
information and remains outside of VPlan for now.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100113
Avoiding absolute imports allows the code to be relocatable (which is used for out of tree integrations).
Differential Revision: https://reviews.llvm.org/D107617
Recursive functions may now get specialized many times when
`func-specialization-max-iters` increases. See discussion in
https://reviews.llvm.org/D106426 for details.
CastOp::areCastCompatible does not check whether casts are definitely compatible.
When going from dynamic to static offset or stride, the canonicalization cannot
know whether it is really cast compatible. In that case, it can only canonicalize
to an alloc plus copy.
Differential Revision: https://reviews.llvm.org/D107545
I just hit a nasty bug when writing a unit test after calling MF->getFrameInfo()
without declaring the variable as a reference.
Deleting the copy-constructor also showed a place in the ARM backend which was
doing the same thing, although from the looks of it, it didn't impact correctness there.
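The class of bug described above can be illustrated with a small,
self-contained sketch. 'FakeFrameInfo' and 'FakeFunction' are hypothetical
stand-ins, not the real MachineFrameInfo and MachineFunction classes; the
point is only that a deleted copy constructor turns an accidental copy
into a compile error.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a frame-info object owned by a function.
class FakeFrameInfo {
public:
  FakeFrameInfo() = default;
  // Deleting the copy operations makes `auto FI = F.getFrameInfo();` fail
  // to compile instead of silently working on a detached copy.
  FakeFrameInfo(const FakeFrameInfo &) = delete;
  FakeFrameInfo &operator=(const FakeFrameInfo &) = delete;

  int createObject() { return NumObjects++; }
  int getNumObjects() const { return NumObjects; }

private:
  int NumObjects = 0;
};

// Hypothetical stand-in for the owner that hands out a reference.
class FakeFunction {
public:
  FakeFrameInfo &getFrameInfo() { return Info; }

private:
  FakeFrameInfo Info;
};

static_assert(!std::is_copy_constructible<FakeFrameInfo>::value,
              "accidental copies must not compile");

// Binds the result as a reference, so the update is visible through F.
inline int addObjectViaReference(FakeFunction &F) {
  FakeFrameInfo &FI = F.getFrameInfo();
  FI.createObject();
  return F.getFrameInfo().getNumObjects();
}

// Small self-check: each call adds one object to the owned frame info.
inline void checkAddObject() {
  FakeFunction F;
  assert(addObjectViaReference(F) == 1);
  assert(addObjectViaReference(F) == 2);
}
```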
Include windows.h with an all lowercase filename; Windows SDK headers
aren't self-consistent so they can't be used in an entirely
case-sensitive setting, and mingw headers use all lowercase names
for such headers.
This fixes building after 881faf4190.
For a very large module, __llvm_gcov_reset can become very large.
__llvm_gcov_reset previously emitted stores to a bunch of globals in one
huge basic block. MemCpyOpt would turn many of these stores into
memsets, and updating MemorySSA would be extremely slow.
Verified that this makes the compile time of certain files go down
drastically (20min -> 5min).
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107538
The intent of the negative #{{.*}} checks is to verify that the line
declaring/defining a function has no attribute, but they could restrict
later function declarations instead.
The 2008-09-02-FunctionNotes.ll check had allowed @fn3 to have an
attribute, because there is only a single "define void @fn3()" in the
output.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D107614
This implements LanaiTargetLowering::CanLowerReturn, thereby ensuring
all return values conform to the RetCC and get sret-demoted as
necessary.
A regression test is also added that exercises this functionality.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D107086
Temporarily revert this patch to unbreak the bots/builds
until we can understand what was intended; is_pad() call
isn't defined.
This reverts commit 2b89f40a41.
On AVR, '.ctors' is used, not '.init_array'. Make this the default
unless specifically overridden by driver argument.
This matches gcc, and it matches the behavior in (e.g.) the NetBSD
driver (for certain OS variants).
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D107610
Tested with gcc-10. Other compilers may generate additional warnings. This does not fix all warnings. There are a few extra ones in LLVMCore and MLIR.
* `OpEmitter::getAttrNameIndex`: -Wunused-function (function is private and not used anywhere)
* `PrintOpPass` copy constructor: -Wextra ("Base class should be explicitly initialized in the copy constructor")
* `LegalizeForLLVMExport.cpp`: -Woverflow (overflow is expected, silence warning by making the cast explicit)
Differential Revision: https://reviews.llvm.org/D107525
Similar cleanup to G_EXTRACT (51bd4e874f).
Also swap the order of clamp/widen to avoid unnecessary complex merges.
Add a bunch of missing testcases to legalize-inserts while we're at it.
Differential Revision: https://reviews.llvm.org/D107601
Similar to other cleanup commits which widen instructions before clamping
during legalization. Purpose of this is to avoid weird type breakdowns.
In terms of G_IMPLICIT_DEF, this simplifies legalization for other instructions.
The legalizer has to emit G_IMPLICIT_DEF to legalize certain instructions, so
this can help with emitting merges elsewhere.
Differential Revision: https://reviews.llvm.org/D107604
Fixes an issue where late materialized constants can be more strictly
aligned than their containing csect.
Differential Revision: https://reviews.llvm.org/D103103
In some cases, like with inserts, we may have a matching size register already,
but still decide to try to look further. This change adds a CurrentBest
register to the value finder state, and any time a method fails to make progress,
returns that register (which may just be an empty Register).
To facilitate this, add a new entry point to the findValueFromDef() function
which initializes this state.
Also fix the build vector finder to return the current build_vector if all
sources are being requested.
Differential Revision: https://reviews.llvm.org/D107017
A performance issue occurred in profile generation, and it turned out that the loop at line 525 is the bottleneck.
Moving the code outside of the loop scope fixes this issue. The run time improves from 30+ mins to ~30s.
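As a generic illustration of this kind of fix (not the actual code at
line 525, which is not shown here), the sketch below hoists a
loop-invariant computation out of a loop. All names in it are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical loop-invariant computation standing in for the hoisted code.
inline long sumOf(const std::vector<int> &Table) {
  long Sum = 0;
  for (int V : Table)
    Sum += V;
  return Sum;
}

// Before: the invariant value is recomputed on every iteration.
inline std::vector<long> scaleInLoop(const std::vector<int> &Samples,
                                     const std::vector<int> &Table) {
  std::vector<long> Out;
  for (int S : Samples)
    Out.push_back(S * sumOf(Table));
  return Out;
}

// After: the invariant value is computed once, outside the loop.
inline std::vector<long> scaleHoisted(const std::vector<int> &Samples,
                                      const std::vector<int> &Table) {
  const long Sum = sumOf(Table);
  std::vector<long> Out;
  Out.reserve(Samples.size());
  for (int S : Samples)
    Out.push_back(S * Sum);
  return Out;
}

// Small self-check: both variants produce the same result.
inline void checkHoisting() {
  const std::vector<int> Samples = {1, 2, 3};
  const std::vector<int> Table = {10, 20};
  assert(scaleInLoop(Samples, Table) == scaleHoisted(Samples, Table));
}
```

Both variants return the same values; only the amount of repeated work
differs.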
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D107529
This patch adds a new method SubSurface to the Surface class. The method
returns another surface that is a subset of this surface. This is
important to further abstract away drawing from the ncurses objects. For
instance, fields could previously be drawn on subpads only but can now
be drawn on any surface. This is needed to create the file search
dialogs and similar functionalities.
There is an opportunity to refactor window drawing in general using
surfaces, but we shall consider this separately later.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D107182
Before D45736, getc_unlocked was available by default, but turned off
for non-Cygwin/non-MinGW Windows. D45736 then added 9 more unlocked
functions, which were unavailable by default, but it also:
* left getc_unlocked enabled by default,
* removed the disabling line for Windows, and
* added code to enable getc_unlocked for GNU, Android, and OSX.
For consistency, make getc_unlocked unavailable by default. Maybe this
was the intent of D45736 anyway.
Reviewed By: MaskRay, efriedma
Differential Revision: https://reviews.llvm.org/D107527
Using REG_SEQUENCE produces better code than INSERT_SUBREG,
we can omit one move instruction in many cases.
Fixes: SWDEV-298028
Differential Revision: https://reviews.llvm.org/D107602
When there is a `setjmp` call in a function, we transform every callsite
of `setjmp` to record its information by calling the `saveSetjmp` function,
and we also transform every callsite of a function that can longjmp
to check if a longjmp occurred and if so jump to the corresponding
post-setjmp BB. Currently we are doing this for every function that
contains a call to `setjmp`, but if there is no other function call
within that function that can longjmp, this transformation of `setjmp`
callsite and all the preparation of `setjmpTable` in the entry of the
function are not necessary.
This checks if a setjmp-calling function has any other calls that can
longjmp, and if not, skips the function for the SjLj transformation.
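The two kinds of functions distinguished here can be sketched at the
source level as follows. This is a hypothetical example for illustration;
the function names are not taken from the patch or its tests.

```cpp
#include <cassert>
#include <csetjmp>

// Hypothetical example; none of these names come from the patch.
std::jmp_buf Env;

// A callee that can longjmp back to the setjmp point.
void mayLongjmp(bool Jump) {
  if (Jump)
    std::longjmp(Env, 1);
}

// Calls setjmp and then a function that can longjmp: after the call,
// control may resume at the post-setjmp code, so SjLj handling is needed.
int setjmpWithLongjmpingCall(bool Jump) {
  if (setjmp(Env) != 0)
    return 1; // resumed here after a longjmp
  mayLongjmp(Jump);
  return 0;
}

// Calls setjmp but makes no other call that can longjmp: nothing in this
// function can return to the setjmp point a second time.
int setjmpOnly() {
  if (setjmp(Env) != 0)
    return 1; // never reached from within this function
  return 0;
}

// Small self-check of both shapes.
void checkSjLj() {
  assert(setjmpWithLongjmpingCall(true) == 1);
  assert(setjmpWithLongjmpingCall(false) == 0);
  assert(setjmpOnly() == 0);
}
```

Only the first shape has a callsite where a longjmp check is meaningful;
the second corresponds to the functions this change skips.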
Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D107530
This takes the existing SVE costing for the various min/max reduction
intrinsics and expands it to NEON, where I believe it applies equally
well.
In the process it changes the lowering to use min/max cost, as opposed
to summing up the cost of ICmp+Select.
Differential Revision: https://reviews.llvm.org/D106239
`__alignof__(x)` always returns `ABIAlign` if the "x" is marked `__attribute__((aligned()))`. However, the "aligned" attribute should only increase the alignment of a struct, or struct member, unless it's used together with the "packed" attribute, or used as a part of a typedef, in which case, the "aligned" attribute can both increase and decrease alignment.
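The rules for the "aligned" attribute described above can be checked with a few compile-time assertions. This is a sketch assuming a GCC-compatible compiler on a target where alignments of 2 and 16 are supported; the type names are hypothetical and are not taken from the patch's tests.

```cpp
// Hypothetical types illustrating how "aligned" interacts with "packed"
// and typedefs on GCC-compatible compilers.

// On a struct, "aligned" can raise the alignment.
struct __attribute__((aligned(16))) Raised {
  int X;
};

// On a struct without "packed", "aligned" cannot lower the alignment.
struct __attribute__((aligned(1))) NotLowered {
  int X;
};

// Together with "packed", "aligned" can lower the alignment.
struct __attribute__((packed, aligned(2))) PackedLowered {
  int X;
};

// As part of a typedef, "aligned" can lower the alignment as well.
typedef int __attribute__((aligned(2))) LoweredInt;

static_assert(__alignof__(Raised) == 16, "aligned raises struct alignment");
static_assert(__alignof__(NotLowered) == __alignof__(int),
              "aligned alone does not lower struct alignment");
static_assert(__alignof__(PackedLowered) == 2,
              "packed plus aligned lowers struct alignment");
static_assert(__alignof__(LoweredInt) == 2,
              "aligned on a typedef lowers alignment");
```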
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D107598
This allows us to avoid odd type breakdowns + allows us to legalize types like
s88 in the first place.
Add some testcases for known legal types + testcases for s4 and s88.
Differential Revision: https://reviews.llvm.org/D107607
It seems this test no longer checks what was stated in the
comment. Just switch to generated checks.
Differential Revision: https://reviews.llvm.org/D107590
This simplifies our existing G_EXTRACT rules and adds some test coverage. Mostly
changing this because it should make it easier to improve legalization for
instructions which use G_EXTRACT as part of the legalization process.
This also adds support for legalizing some weird types. Similar to other recent
legalizer changes, this changes the order of widening/clamping.
There was some dead code in our existing rules (e.g. the p0 case would never get
hit), so this knocks those out and makes the types we want to handle explicit.
This also removes some checks which, nowadays, are handled by the
MachineVerifier.
Differential Revision: https://reviews.llvm.org/D107505
Deduplicate some code and add an additional test to verify that the
sprintf->stpcpy optimization still works on android21 (which properly
supports it).
This follows up 5848166369.
Differential Revision: https://reviews.llvm.org/D107526
They used to be referenced from the .xcodeproj files, but those are long gone.
No behavior change.
Differential Revision: https://reviews.llvm.org/D107444
Add unittests for IslMaxOperationsGuard and the behaviour of the isl-noexception.h wrapper under exceeded max_operations.
Reviewed By: patacca
Differential Revision: https://reviews.llvm.org/D107401
The foreach callback wrappers tests check the return values of isl::stat::ok() and isl::stat::error() separately. However, because the container they iterate over contains just one element, they do not actually test the difference between them.
This patch changes the set being iterated over to contain 2 elements, so that returning isl::stat::ok (continue iterating with the next element) and isl::stat::error (break after the current element) have different effects beyond the return value of the foreach itself.
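Why a single element cannot distinguish the two return values can be seen in a small model. The sketch below does not use isl; `Stat`, `forEach` and `countVisited` are hypothetical stand-ins that only mimic the continue/break behaviour described above.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for isl::stat.
enum class Stat { Ok, Error };

// Hypothetical stand-in for a foreach method: calls Fn on each element,
// stops at the first Error and propagates it.
inline Stat forEach(const std::vector<int> &Elements,
                    const std::function<Stat(int)> &Fn) {
  for (int E : Elements)
    if (Fn(E) == Stat::Error)
      return Stat::Error;
  return Stat::Ok;
}

// Counts how many elements are visited when the callback always
// returns Result.
inline int countVisited(const std::vector<int> &Elements, Stat Result) {
  int Visited = 0;
  forEach(Elements, [&](int) {
    ++Visited;
    return Result;
  });
  return Visited;
}

// Small self-check of the one-element and two-element cases.
inline void checkForEach() {
  // One element: Ok and Error both visit exactly one element.
  assert(countVisited({1}, Stat::Ok) == countVisited({1}, Stat::Error));
  // Two elements: Ok visits both, Error stops after the first.
  assert(countVisited({1, 2}, Stat::Ok) == 2);
  assert(countVisited({1, 2}, Stat::Error) == 1);
}
```

With one element the number of callback invocations is the same for both return values; with two elements it differs, which is what the updated tests rely on.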
Reviewed By: patacca
Differential Revision: https://reviews.llvm.org/D107395