The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64
backend. We matched an immediate offset with STWX8 even though it only
supports register offset.
The culprit is the complex-pattern predicate, SelectAddrIdx, which decides
that if the offset is not ISD::Constant it must be a register.
Many thanks to Bill Schmidt for testing this.
llvm-svn: 209219
After discussion with Zoran, we have decided to temporarily revert this commit.
It's causing some difficult to resolve conflicts and we are under time pressure
to deliver an initial MIPS64r6 compiler.
We will re-apply an equivalent patch once the time pressure has passed.
llvm-svn: 209211
When multiple aliases overlap, the correct string to print can often be
determined purely by considering the InstAlias declarations in some particular
order. This allows the user to specify that order manually when desired,
without resorting to hacking around with the default lexicographical order on
Record instantiation, which is error-prone and ugly.
I was also mistaken about "add w2, w3, w4" being the same as "add w2, w3, w4,
uxtw". That's only true if Rn is the stack pointer.
llvm-svn: 209199
This workaround (presumably for ancient GDB) doesn't appear to be
required (GDB 7.5 seems to tolerate function definition DIEs in
namespace scope just fine).
llvm-svn: 209189
Since we visit the whole list of subprograms for each CU at module
start, this is clearly true - don't test for the case, just assert it.
A few old test cases seemed to have incomplete subprogram lists, but any
attempt to reproduce them shows full subprogram lists that even include
entities that have been completely inlined and the out of line
definition removed.
llvm-svn: 209178
When I refactored this in r208636 I accidentally caused this to be added
multiple times to each abstract subprogram (not accounting for the
deduplicating effect of the InlinedSubprogramDIEs set).
This got better in r208798 when the abstract definitions got the
attribute added to them at construction time, but still had the
redundant copies introduced in r208636.
This commit removes those excess DW_AT_inlines and relies solely on the
insertion in r208798.
llvm-svn: 209166
The check in DwarfDebug::constructScopeDIE was meant to consider inlined
subroutines as any non-top-level scope that was a subprogram. Instead of
checking "not top level scope" it was checking if the /subprogram's/
scope was non-top-level.
Fix this and beef up a test case to demonstrate some of the missing
inlined_subroutines are no longer missing.
In the course of fixing this I also found that r208748 (with this fix)
found one /extra/ inlined_subroutine in concrete_out_of_line.ll due to
two inlined_subroutines having the same inlinedAt location. The previous
implementation was collapsing these into a single inlined subroutine.
I'm not sure what the original code was that created this .ll file so
I'm not sure if this actually happens in practice today. Since we
deliberately include column information to disambiguate two calls on the
same line, that may've addressed this bug in the frontend, but it's good
to know that workaround isn't necessary for this particular case
anymore.
llvm-svn: 209165
Currently the X86 backend doesn't support types larger than i128 very well. For
example an i192 multiply will assert in codegen when the 2nd argument is a constant and the constant got hoisted.
This fix changes the cost model to never hoist constants for types larger than
i128. Once the codegen issues have been resolved, the cost model can be updated
to allow also larger types.
This is related to <rdar://problem/16954938>
llvm-svn: 209162
Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always
provide the operand size as output if the input operand is zero.
We can take advantage of this knowledge during instruction selection
stage in order to simplify a few corner case.
llvm-svn: 209159
so that llvm-size will total up all the sections in the Berkeley format. This
allows for rough categorizations for Mach-O sections. And allows the total of
llvm-size’s Berkeley and System V formats to be the same.
llvm-svn: 209158
Summary:
When inserting an element that's coming from a vector load or a broadcast
of a vector (or scalar) load, combine the load into the insertps
instruction.
Added PerformINSERTPSCombine for the case where we need to fix the load
(load of a vector + insertps with a non-zero CountS).
Added patterns for the broadcasts.
Also added tests for SSE4.1, AVX, and AVX2.
Reviewers: delena, nadav, craig.topper
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3581
llvm-svn: 209156
For GOT relocations the addend should modify the offset to the
GOT entry, not the value of the entry itself. Teach RuntimeDyldMachO
to do The Right Thing here.
Fixes <rdar://problem/16961886>.
llvm-svn: 209154
The cost model conservatively assumes that it will always get scalarized and
that's about as good as we can get with the generic TTI; reasoning whether a
shuffle with an efficient lowering is available is hard. We can override that
conservative estimate for some targets in the future.
llvm-svn: 209125
- On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though.
- On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal.
- On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled.
llvm-svn: 209123
Windows on ARM uses R11 for the frame pointer even though the environment is a
pure Thumb-2, thumb-only environment. Replicate this behaviour to improve
Windows ABI compatibility. This register is used for fast stack walking, and
thus is part of the Windows ABI.
llvm-svn: 209085
WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF
for any non-MachO platform. This was a missed site when the original change for
target format support for Windows on ARM was done.
llvm-svn: 209057
were added in SSE2, no SSSE3. Found this while auditing all uses of
SSSE3 in the X86 target. I don't actually expect this to make
a significant difference on anything and I don't have any detailed test
cases but I updated the existing test cases that already covered some of
this code path.
llvm-svn: 209056
Change --functions option in llvm-symbolizer tool to accept
values "none", "short" or "linkage". Update the tests and docs
accordingly.
llvm-svn: 209050
Currently LLVM will generally merge GEPs. This allows backends to use more
complex addressing modes. In some cases this is not happening because there
is PHI inbetween the two GEPs:
GEP1--\
|-->PHI1-->GEP3
GEP2--/
This patch checks to see if GEP1 and GEP2 are similiar enough that they can be
cloned (GEP12) in GEP3's BB, allowing GEP->GEP merging (GEP123):
GEP1--\ --\ --\
|-->PHI1-->GEP3 ==> |-->PHI2->GEP12->GEP3 == > |-->PHI2->GEP123
GEP2--/ --/ --/
This also breaks certain use chains that are preventing GEP->GEP merges that the
the existing instcombine would merge otherwise.
Tests included.
rdar://15547484
llvm-svn: 209049
vselects with constant masks, after legalization, will get turned into
specialized shuffle_vectors so they can be matched to blend+imm
instructions.
Fixed some tests.
llvm-svn: 209044
LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the
condition is constant and we can emit that instruction, given the
subtarget.
This is not enough for all cases. An additional SELECTCombine optimization
will be committed.
Fixed tests that were expecting variable blends but where a blend+imm can
be generated.
Added test where we can't emit blend+immediate.
Added avx2 blend+imm tests.
llvm-svn: 209043
This allows us to put dynamic initializers for weak data into the same
comdat group as the data being initialized. This is necessary for MSVC
ABI compatibility. Once we have comdats for guard variables, we can use
the combination to help GlobalOpt fire more often for weak data with
guarded initialization on other platforms.
Reviewers: nlewycky
Differential Revision: http://reviews.llvm.org/D3499
llvm-svn: 209015
This patch changes the design of GlobalAlias so that it doesn't take a
ConstantExpr anymore. It now points directly to a GlobalObject, but its type is
independent of the aliasee type.
To avoid changing all alias related tests in this patches, I kept the common
syntax
@foo = alias i32* @bar
to mean the same as now. The cases that used to use cast now use the more
general syntax
@foo = alias i16, i32* @bar.
Note that GlobalAlias now behaves a bit more like GlobalVariable. We
know that its type is always a pointer, so we omit the '*'.
For the bitcode, a nice surprise is that we were writing both identical types
already, so the format change is minimal. Auto upgrade is handled by looking
through the casts and no new fields are needed for now. New bitcode will
simply have different types for Alias and Aliasee.
One last interesting point in the patch is that replaceAllUsesWith becomes
smart enough to avoid putting a ConstantExpr in the aliasee. This seems better
than checking and updating every caller.
A followup patch will delete getAliasedGlobal now that it is redundant. Another
patch will add support for an explicit offset.
llvm-svn: 209007