This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded.
Fixes PR38771
llvm-svn: 341102
With r295105, some 'noreturn' blocks (those that don't return and have no
successors) may be merged.
If such blocks' predecessors have different outgoing offset or register, don't
report an error in CFIInstrInserter verify().
Thanks to Vlad Tsyrklevich for reporting the issue.
Differential Revision: https://reviews.llvm.org/D51161
llvm-svn: 341087
Summary:
In the case of (and reg, constant) or (or reg, constant), it can be
beneficial to use a ANDNrr/ORNrr instruction instead of ANDrr/ORrr,
if the complement of the constant can be encoded using a single SETHI
instruction instead of a SETHI/ORri pair.
If the constant has more than one use, it is probably better to keep it
in its original form.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D50964
llvm-svn: 341069
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.
For example:
extern int bar(int[]);
int foo(int i) {
int a[i]; // VLA
asm volatile(
"mov r7, #1"
:
:
: "r7"
);
return 1 + bar(a);
}
Compiled for thumb, this gives:
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
.fnstart
@ %bb.0: @ %entry
.save {r4, r5, r6, r7, lr}
push {r4, r5, r6, r7, lr}
.setfp r7, sp, #12
add r7, sp, #12
.pad #4
sub sp, #4
movs r1, #7
add.w r0, r1, r0, lsl #2
bic r0, r0, #7
sub.w r0, sp, r0
mov sp, r0
@APP
mov.w r7, #1
@NO_APP
bl bar
adds r0, #1
sub.w r4, r7, #12
mov sp, r4
pop {r4, r5, r6, r7, pc}
...
r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.
This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.
If we find a reserved register, we print a warning:
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
"mov r7, #1"
^
Reviewers: efriedma, olista01, javed.absar
Reviewed By: efriedma
Subscribers: eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51165
llvm-svn: 341062
Providing that the load is known to be 4 byte aligned, we can optimise a
ldr(adr address) to just ldr address.
Differential Revision: https://reviews.llvm.org/D51030
llvm-svn: 341058
Bots are unhappy:
/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/test/CodeGen/Hexagon/swp-const-tc2.ll:10:14: error: CHECK-NOT: excluded string found in input
; CHECK-NOT: = mpy
^
<stdin>:22:6: note: found here
r5 += mpyi(r2,r3)
^~~~~
This reverts commit r341046.
llvm-svn: 341049
Hacker's Delight 10-17: when C is constant,
the result of X % C == 0 can be computed more cheaply
without actually calculating the remainder.
The motivation is discussed here:
https://bugs.llvm.org/show_bug.cgi?id=35479.
Patch by: hermord (Dmytro Shynkevych)!
For https://reviews.llvm.org/D50222
llvm-svn: 341047
Summary:
As suggested in D50222, this has been refactored into a separate patch.
The undef and the infinite loop at the end cause this test to be translated
unpredictably. In particular, the checked-for `mpy` disappears under
certain legal optimizations (e.g. the one in D50222).
Since the use of these constructs is not relevant to the behavior tested,
according to the header comment, this change, suggested by @kparzysz,
eliminates them.
Patch by: hermord (Dmytro Shynkevych)!
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits, kparzysz
Differential Revision: https://reviews.llvm.org/D50944
llvm-svn: 341046
In computeRegisterLiveness, the max instructions to search
was counting dbg_value instructions, which could potentially
cause an observable codegen change from the presence of debug
info.
llvm-svn: 341028
If there is an unused def, this would previously
report that the register was live. Check for uses
first so that it is reported as dead if never used.
llvm-svn: 341027
If the end of the block is reached during the scan, check
the live ins of the successors. This was already done in the
other direction if the block entry was reached.
llvm-svn: 341026
Check that Machine CSE correctly handles during the transformation, the
debug location information for local variables.
Differential Revision: https://reviews.llvm.org/D50887
llvm-svn: 341025
We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit.
I've changed the assert to a report_fatal_error so it's not lost in Release builds.
The test updates are to fix things that tripped the new error.
Differential Revision: https://reviews.llvm.org/D51231
llvm-svn: 341022
If an ABI-like value is used in a different block,
the type split used is not necessarily the same as
the call's ABI. The value is used through an intermediate
copy virtual registers from the other block. This
resulted in copies with inconsistent sizes later.
Fixes regressions since r338197 when AMDGPU started
splitting vector types for calls.
llvm-svn: 341018
We don't have enough information to know if struct types being
bitcast will cause validation failures or not, so be conservative
and allow such cases to persist (fot now).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38711
Subscribers: dschuff, jgravelle-google, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51460
llvm-svn: 341010
Summary:
Global variables that are external and zero initialized are
supposed to be merged with global variables in the bss section
rather than the data section.
Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc
Reviewed By: efriedma
Subscribers: dmgreen, llvm-commits
Differential Revision: https://reviews.llvm.org/D51379
llvm-svn: 341008
Variables declared with the dllimport attribute are accessed via a
stub variable named __imp_<var>. In MinGW configurations, variables that
aren't declared with a dllimport attribute might still end up imported
from another DLL with runtime pseudo relocs.
For x86_64, this avoids the risk that the target is out of range
for a 32 bit PC relative reference, in case the target DLL is loaded
further than 4 GB from the reference. It also avoids having to make the
text section writable at runtime when doing the runtime fixups, which
makes it worthwhile to do for i386 as well.
Add stub variables for all dso local data references where a definition
of the variable isn't visible within the module, since the DLL data
autoimporting might make them imported even though they are marked as
dso local within LLVM.
Don't do this for variables that actually are defined within the same
module, since we then know for sure that it actually is dso local.
Don't do this for references to functions, since there's no need for
runtime pseudo relocations for autoimporting them; if a function from
a different DLL is called without the appropriate dllimport attribute,
the call just gets routed via a thunk instead.
GCC does something similar since 4.9 (when compiling with -mcmodel=medium
or large; from that version, medium is the default code model for x86_64
mingw), but only for x86_64.
Differential Revision: https://reviews.llvm.org/D51288
llvm-svn: 340942
MipsSEInstrInfo class defines for internal purpose unconditional
branches as Mips::B nad Mips:J even in case of microMIPS code
generation. Under some conditions that leads to the bug - for rather long
branch which fits to Mips jump instruction offset size, but does not fit
to microMIPS jump offset size, we generate 'short' branch and later show
an error 'out of range PC16 fixup' after check in the isBranchOffsetInRange
routine.
Differential revision: https://reviews.llvm.org/D50615
llvm-svn: 340932
Involves microMIPS's jump in the analyzable branch set to reduce some
code patterns.
Differential revision: https://reviews.llvm.org/D50613
llvm-svn: 340931
For a certain combination of options, BuildPairF64_{64}, ExtractElementF64{_64}
may be expanded into instructions using stack.
Add implicit operand $sp for such cases so that ShrinkWrapping doesn't move
prologue setup below them.
Fixes MultiSource/Benchmarks/MallocBench/cfrac for
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfpxx -mnan=2008'
and
'--target=mips-img-linux-gnu -mcpu=mips32r6 -mfp64 -mnan=2008 -mno-odd-spreg'.
Differential Revision: https://reviews.llvm.org/D50986
llvm-svn: 340927
Adjust missed test to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340917
Adjust tests to avoid the X / X -> 1 & X % X -> 0 folds while keeping their original purposes.
Differential Revision: https://reviews.llvm.org/D50636
llvm-svn: 340916
Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead.
TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD.
Differential Revision: https://reviews.llvm.org/D50074
llvm-svn: 340913
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:
- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
fixes an issue where a 32-bit value of 0xffffffff was recorded
as unsigned. This is equal to -1 when restricting to a 32-bit
comparison, and an inline constant can be used.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50629
Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2
llvm-svn: 340903
Summary: This is split out from D41062 to cover the code in LegalVectorTypes.cpp
Reviewers: RKSimon, spatel, efriedma
Reviewed By: efriedma
Subscribers: sdardis, jvesely, nhaehnle, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D51337
llvm-svn: 340891
These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc. Though they are missing from the Intel Intrinsics Guide.
This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recoqnize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins in to clang with good codegen.
Differential Revision: https://reviews.llvm.org/D51370
llvm-svn: 340869
This can leave behind the uses with the defs removed.
Since this should only really happen in tests, it's not worth the
effort of trying to handle this.
llvm-svn: 340866
https://reviews.llvm.org/D51197
Currently, IRTranslator (and GISel) seems to be arbitrarily picking
which overflow intrinsics get mapped into opcodes which either have a
carry as an input or not.
For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an
opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM
IR).
This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO,
G_SSUBE and G_SADDE.
llvm-svn: 340865
The original motivating example uses a 64-bit add, so the carry
is used. Insert a copy from VCC. This may allow shrinking of
the used carry instruction. At worst, we are replacing a
mov to materialize the constant with a copy of vcc.
llvm-svn: 340862
This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.
llvm-svn: 340859
Summary:
The updated tests were previously infallible because the SIMD bitwise
operations do not contain vector types in their names.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51369
llvm-svn: 340858
This patch creates the shift mask and actual shift using the vXi16 vector shift ops.
Differential Revision: https://reviews.llvm.org/D51263
llvm-svn: 340813