This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.
Related to <rdar://problem/16381500>
llvm-svn: 204389
For functions where esi is used as base pointer, we would previously fall back
from lowering memcpy with "rep movs" because that clobbers esi.
With this patch, we just store esi in another physical register, and restore
it afterwards. This adds a little bit of register preassure, but the more
efficient memcpy should be worth it.
Differential Revision: http://llvm-reviews.chandlerc.com/D2968
llvm-svn: 204174
Rather than LegalizeAction::Expand, this needs LegalizeAction::Promote to get
promoted to fp_to_sint v8f32->v8i32. This is a legal operation on AVX.
For that to work properly, we also need to teach the legalizer about the
specific promotion required here. The default vector promotion uses
bitcasting to a vector type of the same total size. We want to promote the
vector element type, effectively widening the operation and then truncating
the result. This is analogous to the current logic of how int_to_fp is
promoted.
The change also factors out some code from the int_to_fp promotion code to
ValueType::widenIntegerVectorElementType. This is now shared between
int_to_fp and fp_to_int.
There is no longer need for the custom lowering of fp_to_sint f32->v8i16 in
X86. It can now go through the new target-independent fp_to_*int promotion
logic.
I also checked that no other target uses Promote for these ops yet, so there
shouldn't be any unexpected change in behavior.
Fixes <rdar://problem/16202247>
llvm-svn: 204058
- Adds support for inserting vzerouppers before tail-calls.
This is enabled implicitly by having MachineInstr::copyImplicitOps preserve
regmask operands, which allows VZeroUpperInserter to see where tail-calls use
vector registers.
- Fixes a bug that caused the previous version of this optimization to miss some
vzeroupper insertion points in loops. (Loops-with-vector-code that followed
loops-without-vector-code were mistakenly overlooked by the previous version).
- New algorithm never revisits instructions.
Fixes <rdar://problem/16228798>
llvm-svn: 204021
These linkages were introduced some time ago, but it was never very
clear what exactly their semantics were or what they should be used
for. Some investigation found these uses:
* utf-16 strings in clang.
* non-unnamed_addr strings produced by the sanitizers.
It turns out they were just working around a more fundamental problem.
For some sections a MachO linker needs a symbol in order to split the
section into atoms, and llvm had no idea that was the case. I fixed
that in r201700 and it is now safe to use the private linkage. When
the object ends up in a section that requires symbols, llvm will use a
'l' prefix instead of a 'L' prefix and things just work.
With that, these linkages were already dead, but there was a potential
future user in the objc metadata information. I am still looking at
CGObjcMac.cpp, but at this point I am convinced that linker_private
and linker_private_weak are not what they need.
The objc uses are currently split in
* Regular symbols (no '\01' prefix). LLVM already directly provides
whatever semantics they need.
* Uses of a private name (start with "\01L" or "\01l") and private
linkage. We can drop the "\01L" and "\01l" prefixes as soon as llvm
agrees with clang on L being ok or not for a given section. I have two
patches in code review for this.
* Uses of private name and weak linkage.
The last case is the one that one could think would fit one of these
linkages. That is not the case. The semantics are
* the linker will merge these symbol by *name*.
* the linker will hide them in the final DSO.
Given that the merging is done by name, any of the private (or
internal) linkages would be a bad match. They allow llvm to rename the
symbols, and that is really not what we want. From the llvm point of
view, these objects should really be (linkonce|weak)(_odr)?.
For now, just keeping the "\01l" prefix is probably the best for these
symbols. If we one day want to have a more direct support in llvm,
IMHO what we should add is not a linkage, it is just a hidden_symbol
attribute. It would be applicable to multiple linkages. For example,
on weak it would produce the current behavior we have for objc
metadata. On internal, it would be equivalent to private (and we
should then remove private).
llvm-svn: 203866
This patch fixes the bug in peephole optimization that folds a load which defines one vreg into the one and only use of that vreg. With debug info, a DBG_VALUE that referenced the vreg considered to be a use, preventing the optimization. The fix is to ignore DBG_VALUE's during the optimization, and undef a DBG_VALUE that references a vreg that gets removed.
Patch by Trevor Smigiel!
llvm-svn: 203829
Summary:
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
This is an update of D2973 which was reverted because of a bug reported
as PR19084.
Reviewers: t.p.northover, chapuni
Reviewed By: t.p.northover
CC: llvm-commits, alex, chapuni
Differential Revision: http://llvm-reviews.chandlerc.com/D3021
llvm-svn: 203797
Extend what's currently done for shift because the HW performs this masking
implicitly:
(rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)
I use the newly factored out multiclass that was only supporting shifts so
far.
For testing I extended my testcase for the new rotation idiom.
<rdar://problem/15295856>
llvm-svn: 203718
On ELF and COFF an alias is just another name for a position in the file.
There is no way to refer to a position in another file, so an alias to
undefined is meaningless.
MachO currently doesn't support aliases. The spec has a N_INDR, which when
implemented will have a different set of restrictions. Adding support for
it shouldn't be harder than any other IR extension.
For now, having the IR represent what is actually possible with current
tools makes it easier to fix the design of GlobalAlias.
llvm-svn: 203705
This fixes the bug where we would bitcast the 64-bit floating point result
of cmpneqsd to a 64-bit integer even on 32-bit targets.
Differential Revision: http://llvm-reviews.chandlerc.com/D3009
llvm-svn: 203581
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.
The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked to 'Expand.'
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32 and 64
bit have the dedicated 'bswap' instruction for that).
Patch by Louis Gerbarg <lgg@apple.com>.
rdar://15479984
llvm-svn: 203524
The grammar for LLVM IR is not well specified in any document but seems
to obey the following rules:
- Attributes which have parenthesized arguments are never preceded by
commas. This form of attribute is the only one which ever has
optional arguments. However, not all of these attributes support
optional arguments: 'thread_local' supports an optional argument but
'addrspace' does not. Interestingly, 'addrspace' is documented as
being a "qualifier". What constitutes a qualifier? I cannot find a
definition.
- Some attributes use a space between the keyword and the value.
Examples of this form are 'align' and 'section'. These are always
preceded by a comma.
- Otherwise, the attribute has no argument. These attributes do not
have a preceding comma.
Sometimes an attribute goes before the instruction, between the
instruction and it's type, or after it's type. 'atomicrmw' has
'volatile' between the instruction and the type while 'call' has 'tail'
preceding the instruction.
With all this in mind, it seems most consistent for 'inalloca' on an
'inalloca' instruction to occur before between the instruction and the
type. Unlike the current formulation, there would be no preceding
comma. The combination 'alloca inalloca' doesn't look particularly
appetizing, perhaps a better spelling of 'inalloca' is down the road.
llvm-svn: 203376
This is the new idiom:
x<<(y&31) | x>>((0-y)&31)
which is recognized as:
x ROTL (y&31)
The change refines matchRotateSub. In
Neg & (OpSize - 1) == (OpSize - Pos) & (OpSize - 1), if Pos is
Pos' & (OpSize - 1) we can just use Pos' instead of Pos.
llvm-svn: 203315
be split and the result type widened.
When the condition of a vselect has to be split it makes no sense widening the
vselect and thereby widening the condition. We end up in an endless loop of
widening (vselect result type) and splitting (condition mask type) doing this.
Instead, split both the condition and the vselect and widen the result.
I ran this over the test suite with i686 and mattr=+sse and saw no regressions.
Fixes PR18036.
llvm-svn: 203311
This helps the instruction selector to lower an i64 * i64 -> i128
multiplication into a single instruction on targets which support it.
Patch by Manuel Jacob.
llvm-svn: 203230
The old system was fairly convoluted:
* A temporary label was created.
* A single PROLOG_LABEL was created with it.
* A few MCCFIInstructions were created with the same label.
The semantics were that the cfi instructions were mapped to the PROLOG_LABEL
via the temporary label. The output position was that of the PROLOG_LABEL.
The temporary label itself was used only for doing the mapping.
The new CFI_INSTRUCTION has a 1:1 mapping to MCCFIInstructions and points to
one by holding an index into the CFI instructions of this function.
I did consider removing MMI.getFrameInstructions completelly and having
CFI_INSTRUCTION own a MCCFIInstruction, but MCCFIInstructions have non
trivial constructors and destructors and are somewhat big, so the this setup
is probably better.
The net result is that we don't create temporary labels that are never used.
llvm-svn: 203204
This patch teaches the DAGCombiner how to fold a binary OR between two
shufflevector into a single shuffle vector when possible.
The rules are:
1. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf A, B, Mask1)
2. fold (or (shuf A, V_0, MA), (shuf B, V_0, MB)) -> (shuf B, A, Mask2)
The DAGCombiner can take advantage of the fact that OR is commutative and
compute two possible shuffle masks (Mask1 and Mask2) for the resulting
shuffle node.
Before folding a dag according to either rule 1 or 2, DAGCombiner verifies
that the resulting shuffle mask is legal for the target.
DAGCombiner would firstly try to fold according to 1.; If not possible
then it will try to fold according to 2.
If both Mask1 and Mask2 are illegal then we conservatively don't fold
the OR instruction.
llvm-svn: 203156
Before llvm-mc would print it, but llc was assuming that it would produce
another section changing directive before one was needed. That assumption is
false with inline asm.
Fixes PR19049.
Another option would be to always create the section, but in the asm printer
avoid printing sections changes during initialization. That would work, but
* We do use the fact that llvm-mc prints it in testing. The tests can be changed
if needed.
* A quick poll on IRC suggest that most developers prefer the implicit .text to
be printed.
llvm-svn: 203001
Patchpoints already did this. Doing it for stackmaps is a convenience
for the runtime in the event that it needs to scratch register to
patch or perform a runtime call thunk.
Unlike patchpoints, we just assume the AnyRegCC calling
convention. This is the only language and target independent calling
convention specific to stackmaps so makes sense. Although the calling
convention is not currently used to select the scratch registers.
llvm-svn: 202943
selection dag (PR19012)
In X86SelectionDagInfo::EmitTargetCodeForMemcpy we check with MachineFrameInfo
to make sure that ESI isn't used as a base pointer register before we choose to
emit rep movs (which clobbers esi).
The problem is that MachineFrameInfo wouldn't know about dynamic allocas or
inline asm that clobbers the stack pointer until SelectionDAGBuilder has
encountered them.
This patch fixes the problem by checking for such things when building the
FunctionLoweringInfo.
Differential Revision: http://llvm-reviews.chandlerc.com/D2954
llvm-svn: 202930
Inside iterate, we scan backwards then scan forwards in a loop. When iteration
is not zero, the last node was just updated so we can skip it. But when
iteration is zero, we can't skip the last node.
For the testing case, fixing this will save a spill and move register copies
from hot path to cold path.
llvm-svn: 202557
Summary:
Fixes an issue where a test attempts to use -mcpu=x86-64 on non-X86-64 targets.
This triggers an assertion in the MIPS backend since it doesn't know what ABI to
use by default for unrecognized processors.
CC: llvm-commits, rafael
Differential Revision: http://llvm-reviews.chandlerc.com/D2877
llvm-svn: 202369
This handles pathological cases in which we see 2x increase in spill
code for large blocks (~50k instructions). I don't have a unit test
for this behavior.
Fixes rdar://16072279.
llvm-svn: 202304
The current approach to lower a vsetult is to flip the sign bit of the
operands, swap the operands and then use a (signed) pcmpgt. psubus (unsigned
saturating subtract) can be used to emulate a vsetult more efficiently:
+ case ISD::SETULT: {
+ // If the comparison is against a constant we can turn this into a
+ // setule. With psubus, setule does not require a swap. This is
+ // beneficial because the constant in the register is no longer
+ // destructed as the destination so it can be hoisted out of a loop.
I also enable lowering via psubus in a few other cases where it's clearly
beneficial: setule and setuge if minu/maxu cannot be used.
rdar://problem/14338765
Patch by Adam Nemet <anemet@apple.com>.
llvm-svn: 202301
Now that DataLayout is not a pass, store one in Module.
Since the C API expects to be able to get a char* to the datalayout description,
we have to keep a std::string somewhere. This patch keeps it in Module and also
uses it to represent modules without a DataLayout.
Once DataLayout is mandatory, we should probably move the string to DataLayout
itself since it won't be necessary anymore to represent the special case of a
module without a DataLayout.
llvm-svn: 202190
For targeting pecoff, ".def foo" appears before ".short 32".
.def foo;
...
.LCPI0_0:
.short 32
foo:
CHECK-LABEL seeks not from ".short 32" but from the top of the input.
llvm-svn: 201931
shifted mask rather than masking and shifting separately.
The patch adds this transformation to the DAGCombiner:
(shl (and (setcc:i8v16 ...) N01C) N1C) -> (and (setcc:i8v16 ...) N01C<<N1C)
<rdar://problem/16054492>
Patch by Adam Nemet <anemet@apple.com>
llvm-svn: 201906
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
llvm-svn: 201700
When outputting an object we check its section to find its name, but when
looking for the section with -ffunction-section we look for the symbol name.
Break the loop by requesting a name with the private prefix when constructing
the section name. This matches the behavior before r201608.
llvm-svn: 201622
The IR
@foo = private constant i32 42
is valid, but before this patch we would produce an invalid MachO from it. It
was invalid because it would use an L label in a section where the liker needs
the labels in order to atomize it.
One way of fixing it would be to just reject this IR in the backend, but that
would not be very front end friendly.
What this patch does is use an 'l' prefix in sections that we know the linker
requires symbols for atomizing them. This allows frontends to just use
private and not worry about which sections they go to or how the linker handles
them.
One small issue with this strategy is that now a symbol name depends on the
section, which is not available before codegen. This is not a problem in
practice. The reason is that it only happens with private linkage, which will
be ignored by the non codegen users (llvm-nm and llvm-ar).
llvm-svn: 201608
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for
targets with mature MC support. Such targets will always parse the inline
assembly (even when emitting assembly). Targets without mature MC support
continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced
with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler
to parse inline assembly (even when emitting assembly output). UseIntegratedAs
is set to true for targets that consider any failure to parse valid assembly
to be a bug. Target specific subclasses generally enable the integrated
assembler in their constructor. The default value can be overridden with
-no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example,
those that use mnemonics such as 'foo' or 'hello world') have been updated to
disable the integrated assembler.
Changes since review (and last commit attempt):
- Fixed test failures that were missed due to configuration of local build.
(fixes crash.ll and a couple others).
- Fixed tests that happened to pass because the local build was on X86
(should fix 2007-12-17-InvokeAsm.ll)
- mature-mc-support.ll's should no longer require all targets to be compiled.
(should fix ARM and PPC buildbots)
- Object output (-filetype=obj and similar) now forces the integrated assembler
to be enabled regardless of default setting or -no-integrated-as.
(should fix SystemZ buildbots)
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201333
This fix checks the original LLVM IR node to identify opaque constants by
looking for the bitcast-constant pattern. Originally we looked at the generated
SDNode, but this might lead to incorrect results. The SDNode could have been
generated by an constant expression that was folded to a constant.
This fixes <rdar://problem/16050719>
llvm-svn: 201291
Instead of expanding a packed shift into a sequence of scalar shifts,
the backend now tries (when possible) to convert the vector shift into a
vector multiply.
Before this change, a shift of a MVT::v8i16 vector by a
build_vector of constants was always scalarized into a long sequence of "vector
extracts + scalar shifts + vector insert".
With this change, if there is SSE2 support, we emit a single vector multiply.
This change also affects SSE4.1, AVX, AVX2 shifts:
- A shift of a MVT::v4i32 vector by a build_vector of non uniform constants
is now lowered when possible into a single SSE4.1 vector multiply.
- Packed v16i16 shift left by constant build_vector are now expanded when
possible into a single AVX2 vpmullw.
This change also improves the lowering of AVX512f vector shifts.
Added test CodeGen/X86/vec_shift6.ll with some code examples that are affected
by this change.
llvm-svn: 201271
Summary:
AsmPrinter::EmitInlineAsm() will no longer use the EmitRawText() call for targets with mature MC support. Such targets will always parse the inline assembly (even when emitting assembly). Targets without mature MC support continue to use EmitRawText() for assembly output.
The hasRawTextSupport() check in AsmPrinter::EmitInlineAsm() has been replaced with MCAsmInfo::UseIntegratedAs which when true, causes the integrated assembler to parse inline assembly (even when emitting assembly output). UseIntegratedAs is set to true for targets that consider any failure to parse valid assembly to be a bug. Target specific subclasses generally enable the integrated assembler in their constructor. The default value can be overridden with -no-integrated-as.
All tests that rely on inline assembly supporting invalid assembly (for example, those that use mnemonics such as 'foo' or 'hello world') have been updated to disable the integrated assembler.
Reviewers: rafael
Reviewed By: rafael
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2686
llvm-svn: 201237
These tests were unnecessarily sensitive to the presence and ordering of
elements in the line table file_names list which will break on a future
change I'm working on.
llvm-svn: 201185
BUILD_VECTOR nodes, e.g.:
(concat_vectors (BUILD_VECTOR a1, a2, a3, a4), (BUILD_VECTOR b1, b2, b3, b4))
->
(BUILD_VECTOR a1, a2, a3, a4, b1, b2, b3, b4)
This fixes an issue with AVX, where a sequence was not recognized as a 256-bit
vbroadcast due to the concat_vectors.
llvm-svn: 201158
profitability check due to some other checks in the addressing
mode matcher. I.e., test case for commit r201121.
<rdar://problem/16020230>
llvm-svn: 201132
Generalize the AArch64 .td nodes for AssertZext and AssertSext. Use
them to match the relevant pextr store instructions.
The test widen_load-2.ll requires a slight change because with the
stores gone, the remaining instructions are scheduled in a different
order.
Add test cases for SSE4 and AVX variants.
Resolves rdar://13414672.
Patch by Adam Nemet <anemet@apple.com>.
llvm-svn: 200957
mode.
Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep
Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep
That way the computation can be folded into the addressing mode.
This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.
<rdar://problem/15519855>
llvm-svn: 200947
find a register.
The idea is to choose a color for the variable that cannot be allocated and
recolor its interferences around. Unlike the current register allocation scheme,
it is allowed to change the color of an already assigned (but maybe not
splittable or spillable) live interval while propagating this change to its
neighbors.
In other word, there are two things that may help finding an available color:
- Already assigned variables (RS_Done) can be recolored to different color.
- The recoloring allows to catch solutions that needs to touch more that just
the neighbors of the current allocated variable.
E.g.,
vA can use {R1, R2 }
vB can use { R2, R3}
vC can use {R1 }
Where vA, vB, and vC cannot be split anymore (they are reloads for instance) and
they all interfere.
vA is assigned R1
vB is assigned R2
vC tries to evict vA but vA is already done.
=> Regular register allocation heuristic fails.
Last chance recoloring kicks in:
vC does as if vA was evicted => vC uses R1.
vC is marked as fixed.
vA needs to find a color.
None are available.
vA cannot evict vC: vC is a fixed virtual register now.
vA does as if vB was evicted => vA uses R2.
vB needs to find a color.
R3 is available.
Recoloring => vC = R1, vA = R2, vB = R3.
<rdar://problem/15947839>
llvm-svn: 200883
A bunch of test cases needed to be cleaned up for this, many my fault -
when implementid imported modules I updated test cases by simply
duplicating the prior metadata field - which wasn't always the empty
metadata entry.
llvm-svn: 200731
This changes the PrologueEpilogInserter and LocalStackSlotAllocation passes to
follow the extended stack layout rules for sspstrong and sspreq.
The sspstrong layout rules are:
1. Large arrays and structures containing large arrays (>= ssp-buffer-size)
are closest to the stack protector.
2. Small arrays and structures containing small arrays (< ssp-buffer-size) are
2nd closest to the protector.
3. Variables that have had their address taken are 3rd closest to the
protector.
Differential Revision: http://llvm-reviews.chandlerc.com/D2546
llvm-svn: 200601
Calls with inalloca are lowered by skipping all stores for arguments
passed in memory and the initial stack adjustment to allocate argument
memory.
Now the frontend is responsible for the memory layout, and the backend
doesn't have to do any work. As a result these changes are pretty
minimal.
Reviewers: echristo
Differential Revision: http://llvm-reviews.chandlerc.com/D2637
llvm-svn: 200596
Allocas marked inalloca are never static, but we were trying to put them
into the static alloca map if they were in the entry block. Also add an
assertion in x86 fastisel.
llvm-svn: 200593
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).
<rdar://problem/15611947>
llvm-svn: 200577
MSVC always places the 'this' parameter for a method first. The
implicit 'sret' pointer for methods always comes second. We already
implement this for __thiscall by putting sret parameters on the stack,
but __cdecl methods require putting both parameters on the stack in
opposite order.
Using a special calling convention allows frontends to keep the sret
parameter first, which avoids breaking lots of assumptions in LLVM and
Clang.
Fixes PR15768 with the corresponding change in Clang.
Reviewers: ributzka, majnemer
Differential Revision: http://llvm-reviews.chandlerc.com/D2663
llvm-svn: 200561
when the input is a concat_vectors and the insert replaces one of the
concat halves:
Lower half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors Z, Y)
Upper half: fold (insert_subvector (concat_vectors X, Y), Z) ->
(concat_vectors X, Z)
This can be seen with the following IR:
define <8 x float> @lower_half(<4 x float> %v1, <4 x float> %v2, <4 x
float> %v3) {
%1 = shufflevector <4 x float> %v1, <4 x float> %v2, <8 x i32> <i32
0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%2 = tail call <8 x float> @llvm.x86.avx.vinsertf128.ps.256(<8 x
float> %1, <4 x float> %v3, i8 0)
The vinsertf128 intrinsic is converted into an insert_subvector node
in SelectionDAGBuilder.cpp.
Using AVX, without the patch this generates two vinsertf128 instructions:
vinsertf128 $1, %xmm1, %ymm0, %ymm0
vinsertf128 $0, %xmm2, %ymm0, %ymm0
With the patch this is optimized into:
vinsertf128 $1, %xmm1, %ymm2, %ymm0
Patch by Robert Lougher.
llvm-svn: 200506
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
The previous attempt at r200431 was reverted at r200434 because of
two testing case failures. I modified my patch a little, but forgot
to re-run "make check-all".
Testing case CodeGen/ARM/lsr-unfolded-offset.ll is updated because of
the patch's impact on branch probability which causes changes in
spill placement.
llvm-svn: 200502
When converting from "or + br" to two branches, or converting from
"and + br" to two branches, we correctly update the edge weights of
the two branches.
llvm-svn: 200431
This is a bit more convenient for some callers, but more importantly, it is
easier to implement correctly. Doing this removes the patching of already
printed data that was used for fastcall, fixing a crash with private fastcall
symbols.
llvm-svn: 200367
Make sure that we don't introduce illegal build_vector dag nodes
when trying to fold a sign_extend of a build_vector.
This fixes a regression introduced by r200234.
Added test CodeGen/X86/fold-vector-sext-crash.ll
to verify that llc no longer crashes with an assertion failure
due to an illegal build_vector of type MVT::v4i64.
Thanks to Ilia Filippov for spotting this regression and for
providing a reproducible test case.
llvm-svn: 200313
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).
The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.
llvm-svn: 200234
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
llvm-svn: 200058
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
llvm-svn: 200034
This pass identifies expensive constants to hoist and coalesces them to
better prepare it for SelectionDAG-based code generation. This works around the
limitations of the basic-block-at-a-time approach.
First it scans all instructions for integer constants and calculates its
cost. If the constant can be folded into the instruction (the cost is
TCC_Free) or the cost is just a simple operation (TCC_BASIC), then we don't
consider it expensive and leave it alone. This is the default behavior and
the default implementation of getIntImmCost will always return TCC_Free.
If the cost is more than TCC_BASIC, then the integer constant can't be folded
into the instruction and it might be beneficial to hoist the constant.
Similar constants are coalesced to reduce register pressure and
materialization code.
When a constant is hoisted, it is also hidden behind a bitcast to force it to
be live-out of the basic block. Otherwise the constant would be just
duplicated and each basic block would have its own copy in the SelectionDAG.
The SelectionDAG recognizes such constants as opaque and doesn't perform
certain transformations on them, which would create a new expensive constant.
This optimization is only applied to integer constants in instructions and
simple (this means not nested) constant cast experessions. For example:
%0 = load i64* inttoptr (i64 big_constant to i64*)
Reviewed by Eric
llvm-svn: 200022
This commit teaches the X86 backend to create the same X86 instructions when it
lowers an sadd/ssub with overflow intrinsic and a conditional branch that uses
that overflow result. This allows SelectionDAG to recognize and remove one of
the redundant operations.
This fixes <rdar://problem/15874016> and <rdar://problem/15661073>.
Reviewed by Nadav
llvm-svn: 199976
This is a horrible bit of code. We're calling a simplification routine *in the middle* of type legalization. We tell the
simplification routine that it's running after legalization, but some of the types it will encounter will be illegal! The
fix is only to invoke the simplification if the types in question were legal, so that none of its invariants will be violated.
llvm-svn: 199847
This actually totally breaks and causes the machine verifier to cry in several cases, one of which being:
%RAX<def> = COPY %RCX<kill>
%ECX<def> = COPY %EAX<kill>, %RAX<imp-use,kill>
These subregister copies are together identified as noops, so are both removed. However, the second one as it has an imp-use gets converted into a kill:
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
As the original COPY has been removed, the verifier goes into tears at the use of undefined EAX and RAX.
There are several hacky solutions to this hacky problem (which is all to do with imp-use/def weirdnesses), but the least hacky I've come up with is to *always* remove COPYs by converting to KILLs. KILLs are no-ops to the code generator so the generated code doesn't change (which is why they were partially used in the first place), but using them also keeps the def/use and imp-def/imp-use chains alive:
%RAX<def> = KILL %RCX<kill>
%ECX<def> = KILL %EAX<kill>, %RAX<imp-use,kill>
The patch passes all test cases including the ones that check the removal of MOVs in this circumstance, along with an extra test I added to check subregister behaviour (which made the machine verifier fall over before my patch).
The patch also adds some DEBUG() statements because the file hadn't got any.
llvm-svn: 199797
Add target specific rules for combining vselect dag nodes into movss/movsd
when possible.
If the vector type of the vselect dag node in input is either MVT::v4i13 or
MVT::v4f32, then try to fold according to rules:
1) fold (vselect (build_vector (0, -1, -1, -1)), A, B) -> (movss A, B)
2) fold (vselect (build_vector (-1, 0, 0, 0)), A, B) -> (movss B, A)
If the vector type of the vselect dag node in input is either MVT::v2i64 or
MVT::v2f64 (and we have SSE2), then try to fold according to rules:
3) fold (vselect (build_vector (0, -1)), A, B) -> (movsd A, B)
4) fold (vselect (build_vector (-1, 0)), A, B) -> (movsd B, A)
llvm-svn: 199683
The way that stack coloring updated MMOs when merging stack slots, while
correct, is suboptimal, and is incompatible with the use of AA during
instruction scheduling. The solution, which involves the use of const_cast (and
more importantly, updating the IR from within an MI-level pass), obviously
requires some explanation:
When the stack coloring pass was originally committed, the code in
ScheduleDAGInstrs::buildSchedGraph tracked possible alias sets by using
GetUnderlyingObject, and all load/store and store/store memory control
dependencies where added between SUs at the object level (where only one
object, that returned by GetUnderlyingObject, was used to identify the object
associated with each MMO). When stack coloring merged stack slots, it would
replace MMOs derived from the remapped alloca with the alloca with which the
remapped alloca was being replaced. Because ScheduleDAGInstrs only used single
objects, and tracked alias sets at the object level, this was a fine solution.
In r169744, (Andy and) I updated the code in ScheduleDAGInstrs to use
GetUnderlyingObjects, and track alias sets using, potentially, multiple
underlying objects for each MMO. This was done, primarily, to provide the
ability to look through PHIs, and provide better scheduling for
induction-variable-dependent loads and stores inside loops. At this point, the
MMO-updating code in stack coloring became suboptimal, because it would clear
the MMOs for (i.e. completely pessimize) all instructions for which r169744
might help in scheduling. Updating the IR directly is the simplest fix for this
(and the one with, by far, the least compile-time impact), but others are
possible (we could give each MMO a small vector of potential values, or make
use of a remapping table, constructed from MFI, inside ScheduleDAGInstrs).
Unfortunately, replacing all MMO values derived from the remapped alloca with
the base replacement alloca fundamentally breaks our ability to use AA during
instruction scheduling (which is critical to performance on some targets). The
reason is that the original MMO might have had an offset (either constant or
dynamic) from the base remapped alloca, and that offset is not present in the
updated MMO. One possible way around this would be to use
GetPointerBaseWithConstantOffset, and update not only the MMO's value, but also
its offset based on the original offset. Unfortunately, this solution would
only handle constant offsets, and for safety (because AA is not completely
restricted to deducing relationships with constant offsets), we would need to
clear all MMOs without constant offsets over the entire function. This would be
an even worse pessimization than the current single-object restriction. Any
other solution would involve passing around a vector of remapped allocas, and
teaching AA to use it, introducing additional complexity and overhead into AA.
Instead, when remapping an alloca, we replace all IR uses of that alloca as
well (optionally inserting a bitcast as necessary). This is even more efficient
that the old MMO-updating code in the stack coloring pass (because it removes
the need to call GetUnderlyingObject on all MMO values), removes the
single-object pessimization in the default configuration, and enables the
correct use of AA during instruction scheduling (all without any additional
overhead).
LLVM now no longer miscompiles itself on x86_64 when using -enable-misched
-enable-aa-sched-mi -misched-bottomup=0 -misched-topdown=0 -misched=shuffle!
Fixed PR18497.
Because the alloca replacement is now done at the IR level, unless the MMO
directly refers to the remapped alloca, the change cannot be seen at the MI
level. As a result, there is no good way to fix test/CodeGen/X86/pr14090.ll.
llvm-svn: 199658
This patch adds two new target-independent calling conventions for runtime
calls - PreserveMost and PreserveAll.
The target-specific implementation for X86-64 is defined as following:
- Arguments are passed as for the default C calling convention
- The same applies for the return value(s)
- PreserveMost preserves all GPRs - except R11
- PreserveAll preserves all GPRs and all XMMs/YMMs - except R11
Reviewed by Lang and Philip
llvm-svn: 199508
This fixes a regression intruced by r199135.
Revision 199135 tried to simplify part of the logic in method
DAGCombiner::SimplifyVBinOp introducing calls to method BuildVectorSDNode::isConstant().
However, that revision wrongly changed the check performed by method
SimplifyVBinOp to identify dag nodes that can be folded.
Before revision 199135, that method only tried to simplify vector binary operations
if both operands were build_vector of Constant/ConstantFP/Undef only.
After revision 199135, method SimplifyVBinop tried to
simplify also vector binary operations with only one constant operand.
This fixes the problem restoring the old behavior of SimplifyVBinOp.
llvm-svn: 199328
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199218
This fixes a regression intruced by r198113.
Revision r198113 introduced an algorithm that tries to fold a vector shift
by immediate count into a build_vector if the input vector is a known vector
of constants.
However the algorithm only worked under the assumption that the input vector
type and the shift type are exactly the same.
This patch disables the folding of vector shift by immediate count if the
input vector type and the shift value type are not the same.
llvm-svn: 199213
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.
Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:
define available_externally dllimport void @f() {}
@Var = dllexport global i32 1, align 4
Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.
llvm-svn: 199204
We need to ensure that StackSlotColoring.cpp does not reuse stack
spill slots in functions that call "returns_twice" functions such as
setjmp(), otherwise this can lead to miscompiled code, because a stack
slot would be clobbered when it's still live.
This was already handled correctly for functions that call setjmp()
(though this wasn't covered by a test), but not for functions that
invoke setjmp().
We fix this by changing callsFunctionThatReturnsTwice() to check for
invoke instructions.
This fixes PR18244.
llvm-svn: 199180
This commit teaches DAG to reassociate vector ops, which in turn enables
constant folding of vector op chains that appear later on during custom lowering
and DAG combine.
Reviewed by Andrea Di Biagio
llvm-svn: 199135
Use separate callee-save masks for XMM and YMM registers for anyregcc on X86 and
select the proper mask depending on the target cpu we compile for.
llvm-svn: 198985
In the stackmap format we advertise the constant field as signed.
However, we were determining whether to promote to a 64-bit constant
pool based on an unsigned comparison.
This fix allows -1 to be encoded as a small constant.
llvm-svn: 198816
MIsNeedChainEdge, which is used by -enable-aa-sched-mi (AA in misched), had an
llvm_unreachable when -enable-aa-sched-mi is enabled and we reach an
instruction with multiple MMOs. Instead, return a conservative answer. This
allows testing -enable-aa-sched-mi on x86.
Also, this moves the check above the isUnsafeMemoryObject checks.
isUnsafeMemoryObject is currently correct only for instructions with one MMO
(as noted in the comment in isUnsafeMemoryObject):
// We purposefully do no check for hasOneMemOperand() here
// in hope to trigger an assert downstream in order to
// finish implementation.
The problem with this is that, had the candidate edge passed the
"!MIa->mayStore() && !MIb->mayStore()" check, the hoped-for assert would never
happen (which could, in theory, lead to incorrect behavior if one of these
secondary MMOs was volatile, for example).
llvm-svn: 198795
I couldn't see how to do this sanely without splitting RETQ from RETL.
Eric says: "sad about the inability to roundtrip them now, but...".
I have no idea what that means, but perhaps it wants preserving in the
commit comment.
llvm-svn: 198756
Modern versions of OSX/Darwin's ld (ld64 > 97.17) have an optimisation present that allows the back end to omit relocations (and replace them with an absolute difference) for FDE some text section refs.
This patch allows a backend to opt-in to this behaviour by setting "DwarfFDESymbolsUseAbsDiff". At present, this is only enabled for modern x86 OSX ports.
test changes by David Fang.
llvm-svn: 198744
With the gnu objc runtime private strings are used. Since we only need to
produce a unique label, the fix is to just drop the asserts.
llvm-svn: 198701
The greedy register allocator tries to split a live-range around each
instruction where it is used or defined to relax the constraints on the entire
live-range (this is a last chance split before falling back to spill).
The goal is to have a big live-range that is unconstrained (i.e., that can use
the largest legal register class) and several small local live-range that carry
the constraints implied by each instruction.
E.g.,
Let csti be the constraints on operation i.
V1=
op1 V1(cst1)
op2 V1(cst2)
V1 live-range is constrained on the intersection of cst1 and cst2.
tryInstructionSplit relaxes those constraints by aggressively splitting each
def/use point:
V1=
V2 = V1
V3 = V2
op1 V3(cst1)
V4 = V2
op2 V4(cst2)
Because of how the coalescer infrastructure works, each new variable (V3, V4)
that is alive at the same time as V1 (or its copy, here V2) interfere with V1.
Thus, we end up with an uncoalescable copy for each split point.
To make tryInstructionSplit less aggressive, we check if the split point
actually relaxes the constraints on the whole live-range. If it does not, we do
not insert it.
Indeed, it will not help the global allocation problem:
- V1 will have the same constraints.
- V1 will have the same interference + possibly the newly added split variable
VS.
- VS will produce an uncoalesceable copy if alive at the same time as V1.
<rdar://problem/15570057>
llvm-svn: 198369
vector shift by immedate count (VSHLI/VSRLI/VSRAI) into a build_vector when
the vector in input to the shift is a build_vector of all constants or UNDEFs.
Target specific nodes for packed shifts by immediate count are in
general introduced by function 'getTargetVShiftByConstNode' (in
X86ISelLowering.cpp) when lowering shift operations, SSE/AVX immediate
shift intrinsics and (only in very few cases) SIGN_EXTEND_INREG dag
nodes.
This patch adds extra rules for simplifying vector shifts inside
function 'getTargetVShiftByConstNode'.
Added file test/CodeGen/X86/vec_shift5.ll to verify that packed
shifts by immediate are correctly folded into a build_vector when the
input vector to the shift dag node is a vector of constants or undefs.
llvm-svn: 198113
ConstantSDNodes (or UNDEFs) into a simple BUILD_VECTOR.
For example, given the following sequence of dag nodes:
i32 C = Constant<1>
v4i32 V = BUILD_VECTOR C, C, C, C
v4i32 Result = SIGN_EXTEND_INREG V, ValueType:v4i1
The SIGN_EXTEND_INREG node can be folded into a build_vector since
the vector in input is a BUILD_VECTOR of constants.
The optimized sequence is:
i32 C = Constant<-1>
v4i32 Result = BUILD_VECTOR C, C, C, C
llvm-svn: 198084
The condition in selects is supposed to be i1.
Make sure we are just reading the less significant bit
of the 8 bits width value to match this constraint.
<rdar://problem/15651765>
llvm-svn: 197712
This changes the MachineFrameInfo API to use the new SSPLayoutKind information
produced by the StackProtector pass (instead of a boolean flag) and updates a
few pass dependencies (to preserve the SSP analysis).
The stack layout follows the same approach used prior to this change - i.e.,
only LargeArray stack objects will be placed near the canary and everything
else will be laid out normally. After this change, structures containing large
arrays will also be placed near the canary - a case previously missed by the
old implementation.
Out of tree targets will need to update their usage of
MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument.
The next patch will implement the rules for sspstrong and sspreq. The end goal
is to support ssp-strong stack layout rules.
WIP.
Differential Revision: http://llvm-reviews.chandlerc.com/D2158
llvm-svn: 197653
This effectively backs out r197465 but leaves some of the general
fixes in place. Not all targets are ready to handle this feature. To
enable it, some infrastructure work is needed to better handle
register class constraints.
llvm-svn: 197514
This reapplies r197438 and fixes the link-time circular dependency between
IR and Support. The fix consists in moving the diagnostic support into IR.
The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.
This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.
This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).
This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.
http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>
llvm-svn: 197508
This reverts commit r197481, recommiting r197469 with an extra fix.
The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS. Mark it so, or else the scheduler might
place it in the middle of another test+branch.
This fixes a bug exposed by r192750, which changed the initial scheduler
to source-order as part of enabling the MI Scheduler for X86.
This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to
try to save %flags, and adds a test that catches the bad behavior of
r197469.
<rdar://problem/15627766>
llvm-svn: 197503
http://llvm.org/bugs/show_bug.cgi?id=18045
Short issue description:
For X86 machines with sse < sse4.1 we got failures for some
particular load/store vector sequences:
$ clang-trunk -m32 -O2 test-case.c
fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850,
0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82]
[ID=58]
0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43]
0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23]
0x41dcb10: i32 = undef [ID=2]
The reason is that EltsFromConsecutiveLoads could emit such load instruction
both before and after legalize stage. Though this instruction is not legal for
machines with SSSE3 and lower.
The fix: In EltsFromConsecutiveLoads, if we have passed legalize stage, we
check whether nodes it emits are legal.
P.S.: If you get failure in time from 12:00 and till 22:00 (UTC-8),
perhaps I'll slow with response, so you better reject this commit. Thanks!
llvm-svn: 197492
This reverts commit r197469.
The sanitizer and dragonegg buildbots are failing, I think because of
this change. Reverting until I figure out why.
llvm-svn: 197481
The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS. Mark it so, or else the scheduler might
place it in the middle of another test+branch.
This fixes a bug exposed by r192750, which turned on the MI Scheduler
for X86.
<rdar://problem/15627766>
llvm-svn: 197469
Without this, MachineCSE is powerless to handle redundant operations with truncated source operands.
This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled:
%vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
%vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
%vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>
Test case: cse-add-with-overflow.ll.
This exposed an existing bug in
PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case:
PowerPC/crash.ll.
llvm-svn: 197465
The patch adds a new LLVMContext::diagnose that can be used to communicate to
the front-end, if any, that something of interest happened.
The diagnostics are supported by a new abstraction, the DiagnosticInfo class.
The base class contains the following information:
- The kind of the report: What this is about.
- The severity of the report: How bad this is.
This patch also adds 2 classes:
- DiagnosticInfoInlineAsm: For inline asm reporting. Basically, this diagnostic
will be used to switch to the new diagnostic API for LLVMContext::emitError.
- DiagnosticStackSize: For stack size reporting. Comes as a replacement of the
hard coded warning in PEI.
This patch also features dynamic diagnostic identifiers. In other words plugins
can use this infrastructure for their own diagnostics (for more details, see
getNextAvailablePluginDiagnosticKind).
This patch introduces a new DiagnosticHandlerTy and a new DiagnosticContext in
the LLVMContext that should be set by the front-end to be able to map these
diagnostics in its own system.
http://llvm-reviews.chandlerc.com/D2376
<rdar://problem/15515174>
llvm-svn: 197438
that it coalesces normal copies.
Without this, MachineCSE is powerless to handle redundant operations
with truncated source operands.
This required fixing the 2-addr pass to handle tied subregisters. It
isn't clear what combinations of subregisters can legally be tied, but
the simple case of truncated source operands is now safely handled:
%vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1
%vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2
%vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def>
llvm-svn: 197414
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.
llvm-svn: 197384
This optional register liveness analysis pass can be enabled with either
-enable-stackmap-liveness, -enable-patchpoint-liveness, or both. The pass
traverses each basic block in a machine function. For each basic block the
instructions are processed in reversed order and if a patchpoint or stackmap
instruction is encountered the current live-out register set is encoded as a
register mask and attached to the instruction.
Later on during stackmap generation the live-out register mask is processed and
also emitted as part of the stackmap.
This information is optional and intended for optimization purposes only. This
will enable a client of the stackmap to reason about the registers it can use
and which registers need to be preserved.
Reviewed by Andy
llvm-svn: 197317
This reverts commit r197254.
This was an accidental merge of Juergen's patch. It will be checked in
shortly, but wasn't meant to go in quite yet.
Conflicts:
include/llvm/CodeGen/StackMaps.h
lib/CodeGen/StackMaps.cpp
test/CodeGen/X86/stackmap-liveness.ll
llvm-svn: 197260
While it's safe for the X86-specific shift nodes, dag combining will
kill generic nodes. Insert an AND to make it safe, isel will nuke it
as x86's shift instructions have an implicit AND.
Fixes PR16108, which contains a contraption to hit this case in between
constant folders.
llvm-svn: 197228
Since gcc 4.6 the compiler uses ___chkstk_ms which has the same semantics as the
MS CRT function __chkstk. This simplifies the prologue generation a bit.
Reviewed by Rafael Espíndola.
llvm-svn: 197205
a vector packed single/double fp operation followed by a vector insert.
The effect is that the backend coverts the packed fp instruction
followed by a vectro insert into a SSE or AVX scalar fp instruction.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
__m128 C = A + B;
return (__m128) {c[0], a[1], a[2], a[3]};
}
previously we generated:
addps %xmm0, %xmm1
movss %xmm1, %xmm0
we now generate:
addss %xmm1, %xmm0
llvm-svn: 197145
The linkers on these systems don't have anything special to do with these
symbols. Since the intent is for them to be absent from the final object,
just treat them as private.
llvm-svn: 197080
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).
llvm-svn: 197041
The combination of inline asm, stack realignment, and dynamic allocas
turns out to be too common to reject out of hand.
ASan inserts empy inline asm fragments and uses aligned allocas.
Compiling any trivial function containing a dynamic alloca with ASan is
enough to trigger the check.
XFAIL the test cases that would be miscompiled and add one that uses the
relevant functionality.
llvm-svn: 196986
This re-lands commit r196876, which was reverted in r196879.
The tests have been fixed to pass on platforms with a stack alignment
larger than 4.
Update to clang side tests will land shortly.
llvm-svn: 196939
immediately after SSE scalar fp instructions like addss or mulss.
Added patterns to select SSE scalar fp arithmetic instructions from a scalar
fp operation followed by a blend.
For example, given the following code:
__m128 foo(__m128 A, __m128 B) {
A[0] += B[0];
return A;
}
previously we generated:
addss %xmm0, %xmm1
movss %xmm1, %xmm0
now we generate:
addss %xmm1, %xmm0
llvm-svn: 196925
For stack frames requiring realignment, three pointers may be needed:
- ebp to address incoming arguments
- esi (could be any callee-saved register) to address locals
- esp to address outgoing arguments
We would use esi unconditionally without verifying that it did not
conflict with inline assembly.
This change doesn't do the verification, it simply emits a fatal error
on functions that use stack realignment, dynamic SP adjustments, and
inline assembly.
Because stack realignment is common on Windows, we also no longer assume
that MS inline assembly clobbers esp. Instead, we analyze the inline
instructions for implicit definitions and check if esp is there. If so,
we require the use of a base pointer and consider it in the condition
above.
Mostly fixes PR16830, but we could try harder to find a non-conflicting
base pointer.
Reviewers: sunfish
Differential Revision: http://llvm-reviews.chandlerc.com/D1317
llvm-svn: 196876
given
declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
declare void @foo()
define void @bar() {
call void @foo()
call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
ret void
}
We used to produce
L_foo$stub:
.indirect_symbol _foo
.ascii "\364\364\364\364\364"
_memset$stub:
.indirect_symbol _memset
.ascii "\364\364\364\364\364"
We not produce a private stub for memset too.
Stubs are not needed with recent linkers, but we still produce them for darwin8.
Thanks to David Fang for confirming that gcc used to do this too.
llvm-svn: 196468
Unlike msvc, when handling a thiscall + sret gcc will
* Put the sret in %ecx
* Put the this pointer is (%esp)
This fixes, for example, calling stringstream::str.
llvm-svn: 196312
- The fix to PR17631 fixes part of the cases where 'vzeroupper' should
not be issued before 'call' insn. There're other cases where helper
calls will be inserted not limited to epilog. These helper calls do
not follow the standard calling convention and won't clobber any YMM
registers. (So far, all call conventions will clobber any or part of
YMM registers.)
This patch enhances the previous fix to cover more cases 'vzerosupper' should
not be inserted by checking if that function call won't clobber any YMM
registers and skipping it if so.
llvm-svn: 196261
It is only used for asm printing.
On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.
MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.
The MO_GlobalAddress and MO_Register cases were not tested.
llvm-svn: 195824
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
lowering where we need to check whether x is a vector type (in-reg
type) of i8, i16 or i32; otherwise, that optimization is not valid.
llvm-svn: 195779
A Direct stack map location records the address of frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to a stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address if an alloca, while IndirectMemRefOp
handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.
llvm-svn: 195712
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
Make tests more robust by removing hard-coded metadata numbers in CHECK lines.
llvm-svn: 195535
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
llvm-svn: 195504
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.
rdar://15386341
llvm-svn: 195496
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Based on work by Andrea Di Biagio.
llvm-svn: 195491
- When simplifying the mask generation for BLEND, check whether that mask is
also consumed by other non-BLEND insns. If true, skip that simplification.
llvm-svn: 195476
AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.
It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel.
llvm-svn: 195383
clang optimizes tail calls, as in this example:
int foo(void);
int bar(void) {
return foo();
}
where the call is transformed to:
calll .L0$pb
.L0$pb:
popl %eax
.Ltmp0:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax
movl foo@GOT(%eax), %eax
popl %ebp
jmpl *%eax # TAILCALL
However, the GOT references must all be resolved at dlopen() time, and so this
approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which
usually populates the PLT with stubs that perform the actual resolving.
This patch changes X86TargetLowering::LowerCall() to skip tail call
optimization, if the called function is a global or external symbol.
Patch by Dimitry Andric!
PR15086
llvm-svn: 195318
We now only allow breaking source order if the exit block frequency is
significantly higher than the other exit block. The actual bias is
currently under a flag so the best cut-off can be found; the flag
defaults to the old behavior. The idea is to get some benchmark coverage
over different values for the flag and pick the best one.
When we require the new frequency to be at least 20% higher than the old
frequency I see a 5% speedup on zlib's deflate when compressing a random
file on x86_64/westmere. Hal reported a small speedup on Fhourstones on
a BG/Q and no regressions in the test suite.
The test case is the full long_match function from zlib's deflate. I was
reluctant to add it for previous tweaks to branch probabilities because
it's large and potentially fragile, but changed my mind since it's an
important use case and more likely to break with all the current work
going into the PGO infrastructure.
Differential Revision: http://llvm-reviews.chandlerc.com/D2202
llvm-svn: 195265
Implementing this on bigendian platforms could get strange. I added a
target hook, getStackSlotRange, per Jakob's recommendation to make
this as explicit as possible.
llvm-svn: 194942
Stop folding constant adds into GEP when the type size doesn't match.
Otherwise, the adds' operands are effectively being promoted, changing the
conditions of an overflow. Results are different when:
sext(a) + sext(b) != sext(a + b)
Problem originally found on x86-64, but also fixed issues with ARM and PPC,
which used similar code.
<rdar://problem/15292280>
Patch by Duncan Exon Smith!
llvm-svn: 194840
In ELF and COFF an alias is just another offset in a section. There is no way
to represent an alias to something in another file.
In MachO, the spec has the N_INDR type which should allow for exactly that, but
is not currently implemented. Given that it is specified but not implemented,
we error in codegen to avoid miscompiling but don't reject aliases to
declarations in the verifier to leave the option open of implementing it.
In the past we have used alias to declarations as a way of implementing
weakref, which is why it exists in some old tests which this patch updates.
llvm-svn: 194705