All the windows bots are failing match-tree.td and there's no obvious cause that
I can see. It's not just the %p formatting problem. My best guess is that
there's an ordering issue too but I'll need further information to figure that
out. Revert while I'm investigating.
This reverts commit 64f1bb5cd2 and 77d4b5f5fe
Static archives contain object files which contain sections pointing to
external remark files.
When static archives are shipped without the remark files, dsymutil
shouldn't generate an error.
Instead, generate a warning to inform the user that remarks for that
library won't be available in the .dSYM.
Summary:
GIMatchTree's job is to build a decision tree by zipping all the
GIMatchDag's together.
Each DAG is added to the tree builder as a leaf and partitioners are used
to subdivide each node until there are no more partitioners to apply. At
this point, the code generator is responsible for testing any untested
predicates and following any unvisited traversals (there shouldn't be any
of the latter as the getVRegDef partitioner handles them all).
Note that the leaves don't always fit into partitions cleanly and the
partitions may overlap as a result. This is resolved by cloning the leaf
into every partition it belongs to. One example of this is a rule that can
match one of N opcodes. The leaf for this rule would end up in N partitions
when processed by the opcode partitioner. A similar example is the
getVRegDef partitioner where having rules (add $a, $b), and (add ($a, $b), $c)
will result in the former being in the partition for successfully
following the vreg-def and failing to do so as it doesn't care which
happens.
Depends on D69151
Reviewers: bogner, volkan
Reviewed By: volkan
Subscribers: lkail, mgorny, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69152
This uses an alternative implementation of this conversion derived
from our v2i32->v2f32 handling. We can zero extend the v2i32 to
v2i64, or it with the bit representation of 2.0^52 which will give
us 2.0^52 plus the 32-bit integer since double's mantissa is 52 bits.
Then we just need to subtract 2.0^52 as a double and let the floating
point unit normalize the remaining bits into a valid double.
This is less instructions then our previous code, but does require
a port 5 shuffle for the zero extend or unpack.
Differential Revision: https://reviews.llvm.org/D71945
When the "disable-tail-calls" attribute was added, checks were added for
it in various backends. Now this code has proliferated, and it is
something the target is responsible for checking. Move that
responsibility back to the ISels (fast, global, and SD).
There's no major functionality change, except for targets that never
implemented this check.
This LLVM attribute was originally added in
d9699bc7bd (2015).
Reviewers: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D72118
Name: (X & (- Y)) - X -> - (X & (Y - 1)) (PR44448)
%negy = sub i8 0, %y
%unbiasedx = and i8 %negy, %x
%r = sub i8 %unbiasedx, %x
=>
%ymask = add i8 %y, -1
%xmasked = and i8 %ymask, %x
%r = sub i8 0, %xmasked
https://rise4fun.com/Alive/OIpla
This decreases use count of %x, may allow us to
later hoist said negation even further,
and results in marginally nicer X86 codegen.
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
If we replace a function with a new one because we rewrite the
signature, dead users may still refer to the old version. With this
patch we reuse the code that deals with dead functions, which the old
versions are, to avoid problems.
An inbounds GEP results in poison if the value is not "inbounds", not in
UB. We accidentally derived nonnull and dereferenceable from these
inbounds GEPs even in the absence of accesses that would make the poison
to UB.
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should/can just be 'A - (A & B)' -> 'A & (~B)'
Even if we don't manage to fold `~` into B,
we have likely formed `ANDN` node.
Also, this way there's less similar-but-duplicate folds.
Name: X - (X & Y) -> X & (~Y)
%o = and i32 %X, %Y
%r = sub i32 %X, %o
=>
%n = xor i32 %Y, -1
%r = and i32 %X, %n
https://rise4fun.com/Alive/kOUl
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.
Name: ~(X - 1) -> (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
=>
%r = sub i32 0, %X
https://rise4fun.com/Alive/rjU
The fold 'A - (A & (B - 1))' -> 'A & (0 - B)'
added in 8dab0a4a7d
is too specific. It should just be 'A - (A & B)' -> 'A & (~B)',
but we currently fail to sink that '~' into `(B - 1)`.
Name: ~(X - 1) -> (0 - X)
%o = add i32 %X, -1
%r = xor i32 %o, -1
=>
%r = sub i32 0, %X
https://rise4fun.com/Alive/rjU
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.
There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.
Name: PR44448 ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
=>
%r = and i32 %ptr, ~C
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
Name: PR44448 ptr - (ptr & C) -> ptr & (~C)
%bias = and i32 %ptr, C
%r = sub i32 %ptr, %bias
=>
%r = and i32 %ptr, ~C
The main motivational pattern involes pointer-typed values,
so this transform can't really be done in middle-end.
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
The patch makes sure that the LastThrowing pointer does not point to any instruction deleted by call to DeleteDeadInstruction.
While iterating through the instructions the pass maintains a pointer to the lastThrowing Instruction. A call to deleteDeadInstruction deletes a dead store and other instructions feeding the original dead instruction which also become dead. The instruction pointed by the lastThrowing pointer could also be deleted by the call to DeleteDeadInstruction and thus it becomes a dangling pointer. Because of this, we see an error in the next iteration.
In the patch, we maintain a list of throwing instructions encountered previously and use the last non deleted throwing instruction from the container.
Reviewers: fhahn, bcahoon, efriedma
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D65326
As shown in P44383:
https://bugs.llvm.org/show_bug.cgi?id=44383
...we can't safely propagate a vector constant through this icmp fold
if that vector constant contains undefined elements.
We know that each defined element of the constant is safe though, so
find the first of those and replicate it into the formerly undef lanes.
Differential Revision: https://reviews.llvm.org/D72101
The V5 directory and filename tables had checks in to make sure we
hadn't read past the end of the line table prologue. Since previous
changes to the data extractor class ensure we never read past the end,
these checks are now redundant, so this patch removes them.
There is still a check to show that the whole prologue remains within
the prologue length.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71768
This removes the need to duplicate the LASTONLY check pattern and the
last part of the NONFATAL pattern in the modified test.
Reviewed By: MaskRay, JDevlieghere
Differential Revision: https://reviews.llvm.org/D71757
The line tables in debug_line_malformed.s had contents that varied more
than was necessary for the testing, making it harder to follow what was
important. This patch normalises them so that they all share
more-or-less the same body. Additionally, it makes the testing for what
was printed more consistent, to show that the right parts of the line
table prologue and body are/are not parsed and printed.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71755
Some of the tables in debug_line_malformed.s were not being checked in
the NONFATAL checks in debug_line_invalid.test (only the warnings coming
from them were being checked). This made the test harder to follow.
Additionally, a later change will change the way the errors are handled
such that more of the line table will be printed. That will require
checks for these tables (or something equivalent) so that the difference
in behaviour can be observed. This patch adds checks for the three
tables that were missing checks.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D71753
While we do manage to fold integer-typed IR in middle-end,
we can't do that for the main motivational case of pointers.
There is @llvm.ptrmask() intrinsic which may or may not be helpful,
but i'm not sure it is fully considered canonical yet,
not everything is fully aware of it likely.
https://rise4fun.com/Alive/ZVdp
Name: ptr - (ptr & (alignment-1)) -> ptr & (0 - alignment)
%mask = add i64 %alignment, -1
%bias = and i64 %ptr, %mask
%r = sub i64 %ptr, %bias
=>
%highbitmask = sub i64 0, %alignment
%r = and i64 %ptr, %highbitmask
See
https://bugs.llvm.org/show_bug.cgi?id=44448https://reviews.llvm.org/D71499
Summary: This patch is related to https://bugs.llvm.org/show_bug.cgi?id=42967 and it fixes llvm-size's sysv format output by adding a blank line between archieve members
Reviewers: jhenderson, Jim, MaskRay
Reviewed By: jhenderson, Jim, MaskRay
Subscribers: MaskRay, Jim, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71957
Summary:
We previously disabled this under fast math due to aggressive
reassociation by the machine combiner. But I think we can work
around this by using a FSUB instead of FADD for the first
operation.
This matches the similar algorithm we do for uint_to_fp i64->f64
in TargetLowering::expandUINT_TO_FP. If reassociation hasn't
been a problem for that, hopefully its not a problem here.
Reviewers: RKSimon, spatel, scanon
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71968
For now, we didn't set the default operation action for SIGN_EXTEND_INREG for
vector type, which is 0 by default, that is legal. However, most target didn't
have native instructions to support this opcode. It should be set as expand by
default, as what we did for ANY_EXTEND_VECTOR_INREG.
Differential Revision: https://reviews.llvm.org/D70000
Summary:
After bugfix the undef value case here, we used more operations to implement inserting vxi1 sub vector into vXi1 vector, I optimize it by use less operations.
The history information at https://reviews.llvm.org/D68311
Reviewers: craig.topper, LuoYuanke, yubing, annita.zhang, pengfei, LiuChen3, RKSimon
Reviewed By: craig.topper
Subscribers: hiraditya, llvm-commits
Patch by Xiang Zhang (xiangzhangllvm)
Differential Revision: https://reviews.llvm.org/D71917
Commit 0f0330a787 legalized these nodes on PPC without consideration of
unsafe math which means that we get inexact exceptions raised for nearbyint.
Since this doesn't conform to the standard, switch this legalization to depend
on unsafe fp math.
Rather than handling zlib handling manually, use `find_package` from CMake
to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`,
`HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is
set to `YES`, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.
This restores 68a235d07f,
e6c7ed6d21. The problem with the windows
bot is a need for clearing the cache.
All FP0-6 operands should be removed by the FP stackifier. By
removing these we fix the machine verifier error in PR39437.
I've also made it so that only defs are counted for STReturns
which removes what I think were extra stack cleanup instructions.
And I've removed the regcall assert because it was checking the
attributes of the caller, but here we're concerned with the
attributes of the callee. But I don't know how to get that
information from this level.
On Windows hosts, the error message will be something like
`c:\src\llvm-project\out\gn\bin\llvm-ranlib.exe: error: Invalid option: '--D'`.
Due to the .exe after llvm-ranlib the existing CHECK lines do not match.
Fix this by ignoring the program name and starting the check line at "error:".
This patch adds and improves comments in the debug_line_invalid.test and
its associated input file so that it is easier to follow. It uses '##'
to make comments stand out from lit and FileCheck commands.
It also reflows some commands so that the lines are not so long and are
easier to read and fixes some copy/paste errors.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71752
This reverts commit 68a235d07f.
This commit broke the clang-x64-windows-msvc build bot and a follow-up
commit did not fix it. Reverting to fix the bot.
The NoFPExcept bit in SDNodeFlags currently defaults to true, unlike all
other such flags. This is a problem, because it implies that all code that
transforms SDNodes without copying flags can introduce a correctness bug,
not just a missed optimization.
This patch changes the default to false. This makes it necessary to move
setting the (No)FPExcept flag for constrained intrinsics from the
visitConstrainedIntrinsic routine to the generic visit routine at the
place where the other flags are set, or else the intersectFlagsWith
call would erase the NoFPExcept flag again.
In order to avoid making non-strict FP code worse, whenever
SelectionDAGISel::SelectCodeCommon matches on a set of orignal nodes
none of which can raise FP exceptions, it will preserve this property
on all results nodes generated, by setting the NoFPExcept flag on
those result nodes that would otherwise be considered as raising
an FP exception.
To check whether or not an SD node should be considered as raising
an FP exception, the following logic applies:
- For machine nodes, check the mayRaiseFPException property of
the underlying MI instruction
- For regular nodes, check isStrictFPOpcode
- For target nodes, check a newly introduced isTargetStrictFPOpcode
The latter is implemented by reserving a range of target opcodes,
similarly to how memory opcodes are identified. (Note that there a
bit of a quirk in identifying target nodes that are both memory nodes
and strict FP nodes. To simplify the logic, right now all target memory
nodes are automatically also considered strict FP nodes -- this could
be fixed by adding one more range.)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D71841
There's quite a lot of references to Polly in the LLVM CMake codebase. However
the registration pattern used by Polly could be useful to other external
projects: thanks to that mechanism it would be possible to develop LLVM
extension without touching the LLVM code base.
This patch has two effects:
1. Remove all code specific to Polly in the llvm/clang codebase, replaicing it
with a generic mechanism
2. Provide a generic mechanism to register compiler extensions.
A compiler extension is similar to a pass plugin, with the notable difference
that the compiler extension can be configured to be built dynamically (like
plugins) or statically (like regular passes).
As a result, people willing to add extra passes to clang/opt can do it using a
separate code repo, but still have their pass be linked in clang/opt as built-in
passes.
Differential Revision: https://reviews.llvm.org/D61446
It appears that Windows hosts always report rwxrwxrwx even with the
chmod 644 invocation. As this test only cares about the timestamps
and not the permissions, use a regex wildcard instead.
This is a less ambitious alternative to previous attempts to fix
this bug with:
rG56b2aee1875a
rGef02831f0a4e
rG56b2aee1875a
...because those all failed bot testing with use-after-free or
other problems.
The original crashing/assert problem is still showing up on
various fuzzers, so I've added a new minimal test based on
another one of those failures.
Instead of trying to manage and coordinate the logic in
isAllocSiteRemovable() with the deletion loops, just loosen
the existing code that handles casts and GEP by replacing
with undef to allow other opcodes. That means that no
instructions with uses should assert on deletion, and there
are hopefully no non-obvious sanitizer bugs induced.
The version string can be customized by CMake options, so the 'LLVM
version' substring is not guaranteed to appear (see
VersionPrinter::print in llvm/lib/Support/CommandLine.cpp).
Some of the instructions in these tests were technically invalid
combinations (using ARM opcodes in Thumb mode, for example). Update the
targets and the instructions used to be more correct.
Summary:
Currently 32 bit unpacked offsets are passed as nxv2i64. However, as
pointed out in https://reviews.llvm.org/D71074, using nxv2i32 instead
would improve consistency with:
* how other arguments are treated
* how scatter stores are implemented
This patch makes sure that 32 bit unpacked offsets are passes as nxv2i32
instead of nxv2i64.
Reviewers: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71724
I have been trying to build CheriBSD (a fork for FreeBSD for the CHERI
CPU) with LLVM binutils instead of the default elftoolchain utilities.
I noticed that building static archives was failing because ranlib is
invoked with the -D flag. This failed with llvm-ranlib since it parses
the -D flag as the archive path and reports an error that more than one
archive has been passed.
This fixes https://llvm.org/PR41707
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D71554
xray-empty-firstmbb.mir does not test the intended code path. Change
xray-instruction-threshold to 0 to exercise the code path.
Delete xray-empty-function.mir . Empty MachineFunction does not work.
Various passes (e.g. MachineDominatorTree) assume the presence of an
entry block.
Rather than handling zlib handling manually, use `find_package` from CMake
to find zlib properly. Use this to normalize the `LLVM_ENABLE_ZLIB`,
`HAVE_ZLIB`, `HAVE_ZLIB_H`. Furthermore, require zlib if `LLVM_ENABLE_ZLIB` is
set to `YES`, which requires the distributor to explicitly select whether
zlib is enabled or not. This simplifies the CMake handling and usage in
the rest of the tooling.
clang/lib/CodeGen/CodeGenModule performs the -mpie-copy-relocations
check and sets dso_local on applicable global variables. We don't need
to duplicate the work in TargetMachine shouldAssumeDSOLocal.
Verified that -mpie-copy-relocations can still emit PC relative
relocations for external variable accesses.
clang -target x86_64 -fpie -mpie-copy-relocations -c => R_X86_64_PC32
clang -target aarch64 -fpie -mpie-copy-relocations -c => R_AARCH64_ADR_PREL_PG_HI21+R_AARCH64_LDST64_ABS_LO12_NC
This patch introduces `AAValueConstantRange`, which answers a possible range for integer value in a specific program point.
One of the motivations is propagating existing `range` metadata. (I think we need to change the situation that `range` metadata cannot be put to Argument).
The state is a tuple of `ConstantRange` and it is initialized to (known, assumed) = ([-∞, +∞], empty).
Currently, AAValueConstantRange is created when AAValueSimplify cannot
simplify the value.
Supported
- BinaryOperator(add, sub, ...)
- CmpInst(icmp eq, ...)
- !range metadata
`AAValueConstantRange` is not intended to extend to polyhedral range value analysis.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D71620
This was increasing the number of instructions when fsub was legalized
on AMDGPU with no signed zeros enabled. This fold should be guarded by
hasOneUse, and I don't think getNode should be doing that. The same
fold is already done as a regular combine through isNegatibleForFree.
This does require duplicating, even though isNegatibleForFree does
this combine already (and properly checks hasOneUse) to avoid one PPC
regression. In the regression, the outer fneg has nsz but the fsub
operand does not. isNegatibleForFree only sees the operand, and
doesn't see it's used from a nsz context. A nsz parameter needs to be
added and threaded through isNegatibleForFree to avoid this.
The instructions use a mask to either pack disjoint bits together(pext) or spread bits to disjoint locations(pdep). If the mask is all 0s then no bits are extracted or deposited. If the mask is all ones, then the source value is written to the result since no compression or expansion happens. Otherwise if both the source and mask are constant we can walk the bits in the source/mask and calculate the result.
There other crazier things we could do like computeKnownBits or turning pext into shift/and if only a single contiguous range of bits is extracted.
Fixes PR44389
Differential Revision: https://reviews.llvm.org/D71952
If we just subtracted 1 and are checking if the result is -1. We can use the carry flag from the ADD instead of an explicit CMP. I'm using the same checks for the add users as EmitTest.
Fixes one case from PR44412
Differential Revision: https://reviews.llvm.org/D72019
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8
We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.
Proofs:
https://rise4fun.com/Alive/uVB
Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1
Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1
Judging by the existing comments, this was the intention, but the
transform never actually checked if the existing phi's would be removed.
See https://bugs.llvm.org/show_bug.cgi?id=44242 for an example where
this causes much worse code generation on AMDGPU.
Differential Revision: https://reviews.llvm.org/D71209
When functions exist for some but not all run lines we need to be
careful when selecting the prefix. So far, a common prefix was
potentially chosen as there was never a "conflict" that would have
caused otherwise. With this patch we avoid common prefixes if they
are used by run lines that do not emit the function.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D68850
As part of the Attributor manifest we want to change the signature of
functions. This patch introduces a fairly generic interface to do so.
As a first, very simple, use case, we remove unused arguments. A second
use case, pointer privatization, will be committed with this patch as
well.
A lot of the code and ideas are taken from argument promotion and we
run all argument promotion tests through this framework as well.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68765
If we have `int foo(int a) { return a; }` and we run with --function-signature
enabled, we want a single variable declaration for `a` which is reused
later.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D69722
Attribute annotations on calls, e.g., #0, are not useful on their own.
This patch adds a flag to update_test_checks.py to scrub them.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D68851
Since the information is known we can simply use it at the call site.
This is especially useful for callbacks but also helps regular calls.
The test changes are mechanical.
This is the second step after D67871 to make use of abstract call sites.
In this patch the argument we associate with a abstract call site
argument can be the one in the callback callee instead of the one in the
callback broker.
Caveat: We cannot allow no-alias arguments for problematic callbacks:
As described in [1], adding no-alias (or restrict) to arguments could
break synchronization as the synchronization effect, e.g., a barrier,
does not "alias" with the pointer anymore. This disables no-alias
annotation for potentially problematic arguments until we implement the
fix described in [1].
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D68008
[1] Compiler Optimizations for OpenMP, J. Doerfert and H. Finkel,
International Workshop on OpenMP 2018,
http://compilers.cs.uni-saarland.de/people/doerfert/par_opt18.pdf
Especially for callbacks, annotating the call site arguments is
important. Doing so exposed a too strong dependence of AAMemoryBehavior
on AANoCapture since we handle the case of potentially captured pointers
explicitly.
The changes to the tests are all mechanical.
Seeing some curious CFI failures internally - which makes little sense
to me, as I don't think anyone is using this flag (even us,
internally)... so sounds like a bug in my code somewhere (possibly a
latent one that propagating this flag exposed, not sure). Reverting
while I investigate.
This reverts commit c51b45e32e.
Summary:
Amend MS offset operator implementation, to more closely fit with its MS counterpart:
1. InlineAsm: evaluate non-local source entities to their (address) location
2. Provide a mean with which one may acquire the address of an assembly label via MS syntax, rather than yielding a memory reference (i.e. "offset asm_label" and "$asm_label" should be synonymous
3. address PR32530
Based on http://llvm.org/D37461
Fix broken test where the break appears unrelated.
- Set up appropriate memory-input rewrites for variable references.
- Intel-dialect assembly printing now correctly handles addresses by adding "offset".
- Pass offsets as immediate operands (using "r" constraint for offsets of locals).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D71436
This assumed a single pattern if there was a predicate. Relax this a
bit, and allow multiple patterns as long as they have the same class.
This was only broken for the DAG path. GlobalISel seems to have
handled this correctly already.
Parsing `ls -l` output to obtain the size of a file is unreliable; the
exact output format is not specified, and some user or group names may
contain multiple words, causing `cut -f5 -d' '` to extract an incorrect
value. `wc -c`, on the other hand, is portable, and there are precendents
of its use in test cases.
D56351 (included in LLVM 8.0.0) introduced "frame-pointer". All tests
which use "no-frame-pointer-elim" or "no-frame-pointer-elim-non-leaf"
have been migrated to use "frame-pointer".
Implement UpgradeFramePointerAttributes to upgrade the two obsoleted
function attributes for bitcode. Their semantics are ignored.
Differential Revision: https://reviews.llvm.org/D71863
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Recommit notes:
Introduce temporary variables in order to make sure
instructions get inserted into MachineFunction in same order
regardless of compiler used to build llvm.
Differential Revision: https://reviews.llvm.org/D71363
Summary:
Add missing part of patch D71361. Now that the stack-frame
can be operated using a addw/subw instruction, they should
appear in the unwinding list.
Reviewers: dmgreen, efriedma
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72000
Sometimes the result bank of the phi is already assigned to something,
and should not be ignored. This is in preparation for additional
boolean phi handling changes.
Also refine the logic to fix some cases that were incorrectly deciding
to use SGPRs.
VSX provides a full complement of rounding instructions yet we somehow ended up
with some of them legal and others not. This just legalizes all of the FP
rounding nodes and the FP -> int rounding nodes with unsafe math.
Differential revision: https://reviews.llvm.org/D69949
This adds ICmp to the list of instructions that we sink a splat to in a
loop, allowing the register forms of instructions to be selected more
often. It does not add FCmp yet as the results look a little odd, trying
to keep the register in an float reg and having to move it back to a GPR.
Differential Revision: https://reviews.llvm.org/D70997
Summary:
This patch allows to emit thumb2 add and sub
instructions with 12 bit immediates in the
emitT2RegPlusImmediate function.
- Splitting parts of the D70680
Reviewers: eli.friedman, olista01, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, hiraditya, dmgreen, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71361
G_BITREVERSE is generated from llvm.bitreverse.<type> intrinsics,
clang genrates these intrinsics from __builtin_bitreverse32 and
__builtin_bitreverse64.
Add lower and narrowscalar for G_BITREVERSE.
Lower G_BITREVERSE on MIPS32.
Differential Revision: https://reviews.llvm.org/D71363
G_BSWAP is generated from llvm.bswap.<type> intrinsics, clang genrates
these intrinsics from __builtin_bswap32 and __builtin_bswap64.
Add lower and narrowscalar for G_BSWAP.
Lower G_BSWAP on MIPS32, select G_BSWAP on MIPS32 revision 2 and later.
Differential Revision: https://reviews.llvm.org/D71362
This patch adds necessary test cases for load-update-store pattern
which only updates single element of vector.
Differential Revision: https://reviews.llvm.org/D71886
Summary: This patch makes `AAValueSimplify` use `changeUsesAfterManifest` in `manifest`. This will invoke simple folding after the manifest.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71972
For now, PowerPC will using several instructions to get the constant and "and" it with the following case:
define i32 @test1(i32 %a) {
%and = and i32 %a, -2
ret i32 %and
}
However, we could exploit it with the rotate mask instructions.
MB ME
+----------------------+
|xxxxxxxxxxx00011111000|
+----------------------+
0 32 64
Notice that, we can only do it if the MB is larger than 32 and MB <= ME as
RLWINM will replace the content of [0 - 32) with [32 - 64) even we didn't rotate it.
Differential Revision: https://reviews.llvm.org/D71829
A branch is considered UB if it depends on an undefined / uninitialized value.
At this point this handles simple UB branches in the form: `br i1 undef, ...`
We query `AAValueSimplify` to get a value for the branch condition, so the branch
can be more complicated than just: `br i1 undef, ...`.
Patch By: Stefanos Baziotis (@baziotis)
Reviewers: jdoerfert, sstefan1, uenoku
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D71799
If we have references to the same extern_weak in multiple objects,
all of them would generate external symbols with the same name. Make
them static to avoid duplicate definitions; nothing should need to
refer to this symbol outside of the current object.
GCC/binutils seems to handle the same by not using a fixed string
for the ".default" suffix, but instead using the name of some other
defined external symbol from the same object (which is supposed to
be unique among objects unless there's other duplicate definitions).
Differential Revision: https://reviews.llvm.org/D71711
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554
Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.
Differential revision: https://reviews.llvm.org/D71954
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
Below 6 instructions don't set the hasSideEffects flag and don't have match
pattern, so their hasSideEffects flag will be set true by llvm-tblgen.
But in fact below instructions don't modify any special register and don't have
other SideEffects, they shouldn't have SideEffects.
This patch is to modify the hasSideEffects of below instructions from 1 to 0.
```
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
VSPLTBs
VSPLTHs
```
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D71391