Summary:
The pattern dsll/dsrl $rd, $rt, $rs is found in handwritten assembly which
is just a shorthand version of dsllv/dsrlv $rd, $rt, $rs.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3486
llvm-svn: 207664
Summary:
The pattern sll/srl $rd, $rt, $rs is found in handwritten assembly which
is just a shorthand version of sllv/srlv $rd, $rt, $rs.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3483
llvm-svn: 207657
target cannot be determined accurately. This is the case for NaCl where the
sandboxing instructions are added in MC layer, after the MipsLongBranch pass.
It is also the case when the code has inline assembly. Instead of calculating
offset in the MipsLongBranch pass, use %hi(sym1 - sym2) and %lo(sym1 - sym2)
expressions that are resolved during the fixup.
This patch also deletes microMIPS test file test/CodeGen/Mips/micromips-long-branch.ll
and implements microMIPS CHECKs in a much simpler way in a file
test/CodeGen/Mips/longbranch.ll, together with MIPS32 and MIPS64.
llvm-svn: 207656
Summary:
Also renamed non-portable register names (e.g. $t2) so that we don't end up
with a different encoding for what appears to be an equivalent instruction.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3505
llvm-svn: 207655
The canonical syntax for shifts by a variable amount does not end with 'v', but
that syntax should be supported as an alias (presumably for legacy reasons).
llvm-svn: 207649
On instructions using the NZCV register, a couple of conditions have dual
representations: HS/CS and LO/CC (meaning unsigned-higher-or-same/carry-set and
unsigned-lower/carry-clear). The first of these is more descriptive in most
circumstances, so we should print it.
llvm-svn: 207644
Summary:
This isn't supported directly so we rotate the vector by the desired number of
elements, insert to element zero, then rotate back.
The i64 case generates rather poor code on MIPS32. There is an obvious
optimisation to be made in future (do both insert.w's inside a shared
rotate/unrotate sequence) but for now it's sufficient to select valid code
instead of aborting.
Depends on D3536
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3537
llvm-svn: 207640
Summary:
This directive is used for setting up $gp in the beginning of a function.
It expands to three instructions if PIC is enabled:
lui $gp, %hi(_gp_disp)
addui $gp, $gp, %lo(_gp_disp)
addu $gp, $gp, $reg
_gp_disp is a special symbol that the linker sets to the distance between
the lui instruction and the context pointer (_gp).
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3480
llvm-svn: 207637
Summary:
The N64 ABI allows up to three operations to be specified per relocation record
independently of the endianness.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D3529
llvm-svn: 207636
Since these are mostly used in "lsl #16", "lsl #32", "lsl #48" combinations to
piece together an immediate in 16-bit chunks, hex is probably the most
appropriate format.
llvm-svn: 207635
This is mostly aimed at the NEON logical operations and MOVI/MVNI (since they
accept weird shifts which are more naturally understandable in hex notation).
Also changes BRK/HINT etc, which is probably a neutral change, but easier than
the alternative.
llvm-svn: 207634
Since these instructions only accept a 12-bit immediate, possibly shifted left
by 12, the canonical syntax used by the architecture reference manual is "#N {,
lsl #12 }". We should accept an immediate that has already been shifted, (e.g.
Also, print a comment giving the full addend since it can be helpful.
llvm-svn: 207633
This introduces the stack lowering emission of the stack probe function for
Windows on ARM. The stack on Windows on ARM is a dynamically paged stack where
any page allocation which crosses a page boundary of the following guard page
will cause a page fault. This page fault must be handled by the kernel to
ensure that the page is faulted in. If this does not occur and a write access
any memory beyond that, the page fault will go unserviced, resulting in an
abnormal program termination.
The watermark for the stack probe appears to be at 4080 bytes (for
accommodating the stack guard canaries and stack alignment) when SSP is
enabled. Otherwise, the stack probe is emitted on the page size boundary of
4096 bytes.
llvm-svn: 207615
Emit the COFF header when printing out the function. This is important as the
header contains two important pieces of information: the storage class for the
symbol and the symbol type information. This bit of information is required for
the linker to correctly identify the type of symbol that it is dealing with.
llvm-svn: 207613
IMAGE_REL_ARM_MOV32T relocations require that the movw/movt pair-wise
relocation is not split up and reordered. When expanding the mov32imm
pseudo-instruction, create a bundle if the machine operand is referencing an
address. This helps ensure that the relocatable address load is not reordered
by subsequent passes.
Unfortunately, this only partially handles the case as the Constant Island Pass
occurs after the instructions are unbundled and does not properly handle
bundles. That is a more fundamental issue with the pass itself and beyond the
scope of this change.
llvm-svn: 207608
Currently, musttail codegen is relying on sibcall optimization, and
reporting a fatal error if fails. Sibcall optimization fails when stack
arguments need to be modified, which is insufficient for musttail.
The logic for moving arguments in memory safely is already implemented
for GuaranteedTailCallOpt. This change merely arranges for musttail
calls to use it.
No functional change for GuaranteedTailCallOpt.
Reviewers: espindola
Differential Revision: http://reviews.llvm.org/D3493
llvm-svn: 207598
SI_IF and SI_ELSE are terminators which also produce a value. For
these instructions ISel always inserts a COPY to move their value
to another basic block. This COPY ends up between SI_(IF|ELSE)
and the S_BRANCH* instruction at the end of the block.
This breaks MachineBasicBlock::getFirstTerminator() and also the
machine verifier which assumes that terminators are grouped together at
the end of blocks.
To solve this we coalesce the copy away right after ISel to make sure
there are no instructions in between terminators at the end of blocks.
llvm-svn: 207591
SALU instructions ignore control flow, so it is not always safe to use
them within branches. This is a partial solution to this problem
until we can come up with something better.
llvm-svn: 207590
This is a squash of several optimization commits:
- calculate DIV_Lo and DIV_Hi separately
- use BFE_U32 if we are operating on 32bit values
- use precomputed constants instead of shifting in UDVIREM
- skip the first 32 iterations of udivrem
v2: Check whether BFE is supported before using it
Patch by: Jan Vesely
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 207589
The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect.
The _256 variants have indexes into the individual 128 bit lanes and in all
cases it also has to mask out unused bits.
llvm-svn: 207577
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.
Added tests to exercise some loop remarks.
llvm-svn: 207574
Since the ARM64 backend doesn't implement support for the old JIT those tests are failing when the regression tests are run on an AArch64 host.
llvm-svn: 207530
Summary:
This isn't supported directly so we splat the vector element and extract
the most convenient copy.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://reviews.llvm.org/D3530
llvm-svn: 207524
This patch centralizes the handling of the thumb bit around
MCStreamer::isThumbFunc and makes isThumbFunc handle aliases.
This fixes a corner case, but the main advantage is having just one
way to check if a MCSymbol is thumb or not. This should still be
refactored to be ARM only, but at least now it is just one predicate
that has to be refactored instead of 3 (isThumbFunc,
ELF_Other_ThumbFunc, and SF_ThumbFunc).
llvm-svn: 207522
clang directly from the LLVM test suite! That doesn't work. I've
followed up on the review thread to try and get a viable solution sorted
out, but trying to get the tree clean here.
llvm-svn: 207462
When evaluating an assembly expression for a relocation, we want to
stop at MCSymbols that are in the symbol table, even if they are variables.
This is needed since the semantics may require that the relocation use them.
That is not the case when computing the value of a symbol in the symbol table.
There are no relocations in this case and we have to keep going until we hit
a section or find out that the expression doesn't have an assembly time
value.
llvm-svn: 207445
While refactoring out constructScopeDIE into two functions I realized we
were emitting DW_AT_object_pointer in the inlined subroutine when we
didn't need to (GCC doesn't, and the abstract subprogram definition has
the information already).
So here's the refactoring and the bug fix. This is one step of
refactoring to remove some subtle memory ownership semantics. It turns
out the original constructScopeDIE returned ownership in its return
value in some cases and not in others. The split into two functions now
separates those two semantics - further cleanup (unique_ptr, etc) will
follow.
llvm-svn: 207441
This reverts commit r207287, reapplying r207286.
I'm hoping that declaring an explicit struct and instantiating
`addBlockEdges()` directly works around the GCC crash from r207286.
This is a lot more boilerplate, though.
llvm-svn: 207438
The symbol table itself has no relocations, so it is not possible to represent
things like
a = undefined + 1
With the patch we just omit these variables. That matches the behaviour of the
gnu assembler.
llvm-svn: 207419
Someone couldn't bear to have a completely orthogonal set of floating-point
registers, so we've got some instructions that only accept v0-v15 (coming in
ARMv9, V128_prime: you're allowed v2, v3, v5, v7, ...).
Anyway, we were permitting even the out of range registers during assembly
(CodeGen handled it correctly). This adds a diagnostic.
llvm-svn: 207412
by avoiding inlining massive switches merely because they have no
instructions in them. These switches still show up where we fail to form
lookup tables, and in those cases they are actually going to cause
a very significant code size hit anyways, so inlining them is not the
right call. The right way to fix any performance regressions stemming
from this is to enhance the switch-to-lookup-table logic to fire in more
places.
This makes PR19499 about 5x less bad. It uncovers a second compile time
problem in that test case that is unrelated (surprisingly!).
llvm-svn: 207403
odd to have the output of 'llvm-ar tv' depend on the format of
TimeValue::str(), but that's what we have today. If anyone needs the
output to remain compatible with GNU ar or old versions of llvm-ar, just
shout and I'll switch the code to manually format its times.
Note that there isn't a portable format -- Mac and GNU have different
formats at least (thanks Rafael!) so...
llvm-svn: 207387
Introduce support for WoA PE/COFF object file emission from LLVM. Add the new
target specific PE/COFF Streamer (ARMWinCOFFStreamer) that handles the ARM
specific behaviour of PE/COFF object emission. ARM exception information is not
yet emitted and is a TODO item.
The ARM specific object writer (ARMWinCOFFObjectWriter) handles the ARM specific
relocation handling in conjunction with the WinCOFFObjectWriter in the MC layer.
The MC layer needs to be updated to deal with the relocation adjustments.
Branch relocations are adjusted by 4 bytes (unlikely their ELF counterparts).
Minor tweaks to switch multiple conditional checks into equivalent switch
statements. The ObjectFileInfo is updated to relax the object file setup for
Windows COFF. Move the architecture checks into an assertion. Windows COFF is
currently only supported on x86, x86_64, and ARM (thumb). Rather than
defaulting to ELF, we will refuse to generate an object file. This is better
though as you do not get an (arbitrary) object file which is different from the
request.
llvm-svn: 207345
Since there's no way to ensure the type unit in the .dwo and the type
unit skeleton in the .o are correlated, this cannot work.
This implementation is a bit inefficient for a few reasons, called out
in comments.
llvm-svn: 207323
Otherwise the legalizer would just scalarize everything. Support for
mulhi in the targets isn't that great yet so on most targets we get
exactly the same scalarized output. Add a test for x86 vector udiv.
I had to disable the mulhi nodes on ARM because there aren't any patterns
for it. As far as I know ARM has instructions for getting the high part of
a multiply so this should be fixed.
llvm-svn: 207315
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.
rdar://16679376
Repaired r207302.
llvm-svn: 207309
The included test case would return the incorrect results, because the expansion
of an shift with a constant shift amount of 0 would generate undefined behavior.
This is because ExpandShiftByConstant assumes that all shifts by constants with
a value of 0 have already been optimized away. This doesn't happen for opaque
constants and usually this isn't a problem, because opaque constants won't take
this code path - they are not supposed to. In the case that the opaque constant
has to be expanded by the legalizer, the legalizer would drop the opaque flag.
In this case we hit the limitations of ExpandShiftByConstant and create incorrect
code.
This commit fixes the legalizer by not dropping the opaque flag when expanding
opaque constants and adding an assertion to ExpandShiftByConstant to catch this
not supported case in the future.
This fixes <rdar://problem/16718472>
llvm-svn: 207304
Scaling factors are not free on X86 because every "complex" addressing mode
breaks the related instruction into 2 allocations instead of 1.
<rdar://problem/16730541>
llvm-svn: 207301
right intrinsics.
A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.
llvm-svn: 207299
Summary:
If we're doing a v4f32/v4i32 shuffle on x86 with SSE4.1, we can lower
certain shufflevectors to an insertps instruction:
When most of the shufflevector result's elements come from one vector (and
keep their index), and one element comes from another vector or a memory
operand.
Added tests for insertps optimizations on shufflevector.
Added support and tests for v4i32 vector optimization.
Reviewers: nadav
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3475
llvm-svn: 207291
This reverts commit r207286. It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:
llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035
[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio
llvm-svn: 207287
Previously, irreducible backedges were ignored. With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.
This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops). Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers. All backedges and entries to the irreducible SCC
point to this imaginary block. This imaginary block has an edge (with
even probability) to each header block.
The result is now reasonable enough that I've added a number of
testcases for irreducible control flow. I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.
<rdar://problem/14292693>
llvm-svn: 207286
This adds support for an -mattr option to the gold plugin and to llvm-lto. This
allows the caller to specify details of the subtarget architecture, like +aes,
or +ssse3 on x86. Note that this requires a change to the include/llvm-c/lto.h
interface: it adds a function lto_codegen_set_attr and it increments the
version of the interface.
llvm-svn: 207279
Consider this use from the new testcase:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({1000,+,-1}<nw><%for.body>)
-3003 + reg({3,+,3}<nw><%for.body>)
-1001 + reg({1,+,1}<nuw><nsw><%for.body>)
-1000 + reg({0,+,1}<nw><%for.body>)
-3000 + reg({0,+,3}<nuw><%for.body>)
reg({-1000,+,1}<nw><%for.body>)
reg({-3000,+,3}<nsw><%for.body>)
This is the last use we consider for a solution in SolveRecurse, so CurRegs is
a large set. (CurRegs is the set of registers that are needed by the
previously visited uses in the in-progress solution.)
ReqRegs is {
{3,+,3}<nw><%for.body>,
{1,+,1}<nuw><nsw><%for.body>
}
This is the intersection of the regs used by any of the formulas for the
current use and CurRegs.
Now, the code requires a formula to contain *all* these regs (the comment is
simply wrong), otherwise the formula is immediately disqualified. Obviously,
no formula for this use contains two regs so they will all get disqualified.
The fix modifies the check to allow the formula in this case. The idea is
that neither of these formulae is introducing any new registers which is the
point of this early pruning as far as I understand.
In terms of set arithmetic, we now allow formulas whose used regs are a subset
of the required regs not just the other way around.
There are few more loops in the test-suite that are now successfully LSRed. I
have benchmarked those and found very minimal change.
Fixes <rdar://problem/13965777>
llvm-svn: 207271
buildbot - do not insert debug intrinsics before phi nodes.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207269
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions. This
functionality can now be represented as @llvm.arm.hint(i32 5).
llvm-svn: 207246
override the default cold threshold.
When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.
llvm-svn: 207245
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.
llvm-svn: 207242
Debug info: Let dbg.values inserted by LowerDbgDeclare inherit the location
of the dbg.value. This gets rid of tons of redundant variable DIEs in
subscopes.
rdar://problem/14874886, rdar://problem/16679936
llvm-svn: 207236
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207235
There's no need for local symbols to go through the GOT, in fact it seems GNU ld is not even emitting GOT entries for local symbols and will error out when trying to resolve a GOT relocation for a local symbol.
This bug triggers when bootstrapping clang on AArch64 Linux with -fPIC and the ARM64 backend. The AArch64 backend is not affected.
With this commit it's now possible to bootstrap clang on AArch64 Linux with the ARM64 backend (-fPIC, -O3).
llvm-svn: 207226
This patch is a supplement of implementing predicate of FP, enabling aarch64 backend
no-fp tests on arm64 target for verification. During this, one bug is exposed and
fixed by this patch.
llvm-svn: 207215
Remove the concepts of "forward" and "general" mass distributions, which
was wrong. The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.
<rdar://problem/14292693>
llvm-svn: 207195
Strip irreducible testcases to pure control flow. The function calls
made the branch weights more believable but cluttered it up a lot.
There isn't going to be any constant analysis here, so just use dumb
branch logic to clarify the important parts.
<rdar://problem/14292693>
llvm-svn: 207192
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207165
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur. It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.
Reviewers: nicholas
Differential Revision: http://llvm-reviews.chandlerc.com/D3240
llvm-svn: 207143
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine-intrinsics testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207130
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
correctly verifies that both READCYCLECOUNTER and the two new intrinsics
work fine for both 64bit and 32bit Subtargets.
llvm-svn: 207127
This skips a couple of compare ones due to the different syntaxt for
floating-point 0.0. AArch64 does it more canonically, and we'll need to fiddle
ARM64 to make it work.
llvm-svn: 207119
This matches ARM64 behaviour, which I think is clearer. It also puts all the
churn from that difference into one easily ignored commit.
llvm-svn: 207116
These can have different relocations in ELF. In particular both:
b.eq global
ldr x0, global
are valid, giving different relocations. The only possible way to distinguish
them is via a different fixup, so the operands had to be separated throughout
the backend.
llvm-svn: 207105
ARM64 was not producing pure BFI instructions for bitfield insertion
operations, unlike AArch64. The approach had to be a little different (in
ISelDAGToDAG rather than ISelLowering), and the outcomes aren't identical but
hopefully this gives it similar power.
This should address PR19424.
llvm-svn: 207102
This allows us to compile
return (mask & 0x8 ? a : b);
into
testb $8, %dil
cmovnel %edx, %esi
instead of
andl $8, %edi
shrl $3, %edi
cmovnel %edx, %esi
which we formed previously because dag combiner canonicalizes setcc of and into shift.
llvm-svn: 207088
Added support for bytes replication feature, so it could be GAS compatible.
E.g. instructions below:
"vmov.i32 d0, 0xffffffff"
"vmvn.i32 d0, 0xabababab"
"vmov.i32 d0, 0xabababab"
"vmov.i16 d0, 0xabab"
are incorrect, but we could deal with such cases.
For first one we should emit:
"vmov.i8 d0, 0xff"
For second one ("vmvn"):
"vmov.i8 d0, 0x54"
For last two instructions it should emit:
"vmov.i8 d0, 0xab"
P.S.: In ARMAsmParser.cpp I have also fixed few nearby style issues in old code.
Just for keeping method bodies in harmony with themselves.
llvm-svn: 207080
This excludes avx512 as I don't have hardware to verify. It excludes _dq
variants because they are represented in the IR as <{2,4} x i64> when it's
actually a byte shift of the entire i{128,265}.
This also excludes _dq_bs as they aren't at all supported by the backend.
There are also no corresponding instructions in the ISA. I have no idea why
they exist...
llvm-svn: 207058
Summary:
Since the upper 64 bits of the destination register are undefined when
performing this operation, we can substitute it and let the optimizer
figure out that only a copy is needed.
Also added range merging, if an instruction copies a range that can be
merged with a previous copied range.
Added test cases for both optimizations.
Reviewers: grosbach, nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3357
llvm-svn: 207055
This is dependent on changes that are not fully ready to be merged yet (WoA
object file emission). The test can be re-enabled for that target later.
llvm-svn: 207038
GCOV provides an option to prepend output file names with the source
file name, to disambiguate between covered data that's included from
multiple sources. Add a flag to llvm-cov that does the same.
llvm-svn: 207035
ANDS does not use the same encoding scheme as other xxxS instructions (e.g.,
ADDS). Take that into account to avoid wrong peephole optimization.
<rdar://problem/16693089>
llvm-svn: 207020
For now it contains a single flag, SanitizeAddress, which enables
AddressSanitizer instrumentation of inline assembly.
Patch by Yuri Gorshenin.
llvm-svn: 206971
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis
Differential Revision: http://reviews.llvm.org/D3438
llvm-svn: 206956
AArch64 has feature predicates for NEON, FP and CRYPTO instructions.
This allows the compiler to generate code without using FP, NEON
or CRYPTO instructions.
llvm-svn: 206949
In the case where the constant comes from a cloned cast instruction, the
materialization code has to go before the cloned cast instruction.
This commit fixes the method that finds the materialization insertion point
by making it aware of this case.
This fixes <rdar://problem/15532441>
llvm-svn: 206913
diagnostic that includes location information.
Currently if one has this assembly:
.quad (0x1234 + (4 * SOME_VALUE))
where SOME_VALUE is undefined ones gets the less than
useful error message with no location information:
% clang -c x.s
clang -cc1as: fatal error: error in backend: expected relocatable expression
With this fix one now gets a more useful error message
with location information:
% clang -c x.s
x.s:5:8: error: expected relocatable expression
.quad (0x1234 + (4 * SOME_VALUE))
^
To do this I plumbed the SMLoc through the MCObjectStreamer
EmitValue() and EmitValueImpl() interfaces so it could be used
when creating the MCFixup.
rdar://12391022
llvm-svn: 206906
The point of these calls is to allow Thumb-1 code to make use of the VFP unit
to perform its operations. This is not desirable with -msoft-float, since most
of the reasons you'd want that apply equally to the runtime library.
rdar://problem/13766161
llvm-svn: 206874
The branch that skips irreducible backedges was only active when
propagating mass at the top-level. In particular, when propagating mass
through a loop recognized by `LoopInfo` with irreducible control flow
inside, irreducible backedges would not be skipped.
Not sure where that idea came from, but the result was that mass was
lost until after loop exit. Added a testcase that covers this case.
llvm-svn: 206860
while checking candidate for bit field extract.
Otherwise the value may not fit in uint64_t and this will trigger an
assertion.
This fixes PR19503.
llvm-svn: 206834
With a constant mask a vpermil* is just a shufflevector. This patch implements
that simplification. This allows us to produce denser code. It should also
allow more folding down the line.
llvm-svn: 206801
With this MC is able to handle _GLOBAL_OFFSET_TABLE_ in 64 bit mode, which is
needed for medium and large code models.
This fixes pr19470.
llvm-svn: 206793
The -tailcallelim pass should be checking if byval or inalloca args can
be captured before marking calls as tail calls. This was the real root
cause of PR7272.
With a better fix in place, revert the inliner change from r105255. The
test case it introduced still passes and has been moved to
test/Transforms/Inline/byval-tail-call.ll.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3403
llvm-svn: 206789
This reverts commit r206707, reapplying r206704. The preceding commit
to CalcSpillWeights should have sorted out the failing buildbots.
<rdar://problem/14292693>
llvm-svn: 206766
LazyCallGraph analysis framework. Wire it up all the way through the opt
driver and add some very basic testing that we can build pass pipelines
including these components. Still a lot more to do in terms of testing
that all of this works, but the basic pieces are here.
There is a *lot* of boiler plate here. It's something I'm going to
actively look at reducing, but I don't have any immediate ideas that
don't end up making the code terribly complex in order to fold away the
boilerplate. Until I figure out something to minimize the boilerplate,
almost all of this is based on the code for the existing pass managers,
copied and heavily adjusted to suit the needs of the CGSCC pass
management layer.
The actual CG management still has a bunch of FIXMEs in it. Notably, we
don't do *any* updating of the CG as it is potentially invalidated.
I wanted to get this in place to motivate the new analysis, and add
update APIs to the analysis and the pass management layers in concert to
make sure that the *right* APIs are present.
llvm-svn: 206745
Generating BZHI in the variable mask case, i.e. (and X, (sub (shl 1, N), 1)),
was already supported, but we were missing the constant-mask case. This patch
fixes that.
<rdar://problem/15480077>
llvm-svn: 206738
This reverts commit r206677, reapplying my BlockFrequencyInfo rewrite.
I've done a careful audit, added some asserts, and fixed a couple of
bugs (unfortunately, they were in unlikely code paths). There's a small
chance that this will appease the failing bots [1][2]. (If so, great!)
If not, I have a follow-up commit ready that will temporarily add
-debug-only=block-freq to the two failing tests, allowing me to compare
the code path between what the failing bots and what my machines (and
the rest of the bots) are doing. Once I've triggered those builds, I'll
revert both commits so the bots go green again.
[1]: http://bb.pgr.jp/builders/ninja-x64-msvc-RA-centos6/builds/1816
[2]: http://llvm-amd64.freebsd.your.org/b/builders/clang-i386-freebsd/builds/18445
<rdar://problem/14292693>
llvm-svn: 206704
Win64 stack unwinder gets confused when execution flow "falls through" after
a call to 'noreturn' function. This fixes the "missing epilogue" problem by
emitting a trap instruction for IR 'unreachable' on x86_x64-pc-windows.
A secondary use for it would be for anyone wanting to make double-sure that
'noreturn' functions, indeed, do not return.
llvm-svn: 206684
This reverts commit r206666, as planned.
Still stumped on why the bots are failing. Sanitizer bots haven't
turned anything up. If anyone can help me debug either of the failures
(referenced in r206666) I'll owe them a beer. (In the meantime, I'll be
auditing my patch for undefined behaviour.)
llvm-svn: 206677