This is a better fix than r308708 for the problem introduced in
r304020. It restores the skeleton CU testcases modified by that commit
to their original form and most importantly ensures that
frontend-generated skeleton CUs (such as used to point to Clang
modules) come after the regular CUs. This broke for DICompileUnit
nodes that don't have any immediate children because they are now
constructed lazily instead of the order in which they are listed in
!llvm.dbg.cu. After this commit we still don't guarantee that order,
but we do guarantee that empty skeletons come last.
Shipping versions of LLDB are very sensitive to the ordering of
CUs. I'll track a fix for LLDB to be more permissive separately.
This fixes a test failure in the LLDB testsuite.
rdar://problem/33357252
llvm-svn: 309154
Summary:
Bash interperets the '?' character as matching an arbitrary character.
On systems that have a file or directory with exactly one character in
their root directory, '/?' gets reinterpreted into that pathname, which
fails to match the expected Help text for llvm-rc.
This patch quotes the '/?' to avoid that edge case.
Reviewers: mnbvmar, ecbeckmann, rnk
Reviewed By: rnk
Subscribers: dyung, ruiu, llvm-commits
Differential Revision: https://reviews.llvm.org/D35852
llvm-svn: 309133
Summary: The new PM needs to invoke add-discriminator pass when building with -fdebug-info-for-profiling.
Reviewers: chandlerc, davidxl
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35744
llvm-svn: 309121
The ARM bots have started failing and while this patch should be an improvement
for these bots, it's also the only suspect in the blamelist. Reverting while
Diana and I investigate the problem.
llvm-svn: 309111
In COFF, a symbol offset can't be stored in the relocation (as is
done in ELF or MachO), but is stored as the immediate in the
instruction itself. The immediate in the ADRP thus is the symbol
offset in bytes, not in pages. For the PAGEOFFSET_12A/L relocations,
ignore any offset outside of the lowest 12 bits; they won't have any
effect on the ADD/LDR/STR instruction itself but only on the associated
ADRP.
This is similar to how the same issue is handled for MOVW/MOVT
instructions in ELF (see e.g. SVN r307713, and r307728 in lld).
This fixes "fixup out of range" errors while building larger object
files, where temporary symbols end up as a plain section symbol and
an offset, and fixes any cases where the symbol offset mean that
the actual target ended up on a different page than the symbol
itself.
Differential Revision: https://reviews.llvm.org/D35791
llvm-svn: 309105
Summary:
Now that we have control flow in place, fuse the per-rule tables into a
single table. This is a compile-time saving at this point. However, this will
also enable the optimization of a table so that similar instructions can be
tested together, reducing the time spent on the matching the code.
This is NFC in terms of externally visible behaviour but some internals have
changed slightly. State.MIs is no longer reset between each rule that is
attempted because it's not necessary to do so. As a consequence of this the
restriction on the order that instructions are added to State.MIs has been
relaxed to only affect recorded instructions that require new elements to be
added to the vector. GIM_RecordInsn can now write to any element from 1 to
State.MIs.size() instead of just State.MIs.size().
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35681
llvm-svn: 309094
This patch expands the support of lowerInterleavedStore to 32x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=32) and we plan to include more patterns in the future. To reach our goal of "more patterns". We include two mask creators. The first function creates shuffle's mask equivalent to unpacklo/unpackhi instructions. The other creator creates mask equivalent to a concat of two half vectors(high/low).
The patch goal is to optimize the following sequence:
At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding
each 32 chars:
c0, c1, , c31
m0, m1, , m31
y0, y1, , y31
k0, k1, ., k31
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers:
dorit
Farhana
RKSimon
guyblank
DavidKreitzer
Differential Revision: https://reviews.llvm.org/D34601
llvm-svn: 309086
This patch adds a cache for computeExitLimit to save compilation time. A lot of examples of
tests that take extensive time to compile are attached to the bug 33494.
Differential Revision: https://reviews.llvm.org/D35827
llvm-svn: 309080
Summary: The aligned load predicates don't suppress themselves if the load is non-temporal the way the unaligned predicates do. For the most part this isn't a problem because the aligned predicates are mostly used for instructions that only load the the non-temporal loads have priority over those. The exception are masked loads.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35712
llvm-svn: 309079
Summary: This patch adds flags and tests to cover the PGOOpt handling logic in new PM.
Reviewers: chandlerc, davide
Reviewed By: chandlerc
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D35807
llvm-svn: 309076
The PDB "symbol stream" actually contains symbol records for the publics
and the globals stream. The globals and publics streams are essentially
hash tables that point into a single stream of records. In order to
match cvdump's behavior, we need to only dump symbol records referenced
from the hash table. This patch implements that, and then implements
global stream dumping, since it's just a subset of public stream
dumping.
Now we shouldn't see S_PROCREF or S_GDATA32 records when dumping
publics, and instead we should see those record in the globals stream.
llvm-svn: 309066
This is a workaround for the bug described in PR31652 and
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115497.html. The temporary
solution is to add a function EqualityPropUnSafe. In EqualityPropUnSafe, for
some simple patterns we can know the equality comparison may contains undef,
so we regard such comparison as unsafe and will not do loop-unswitching for
them. We also need to disable the select simplification when one of select
operand is undef and its result feeds into equality comparison.
The patch cannot clear the safety issue caused by the bug, but it can suppress
the issue from happening to some extent.
Differential Revision: https://reviews.llvm.org/D35811
llvm-svn: 309059
This is needed, among others, to respect --section-ordering-file
with LTO. I'll follow up with a similar change for data sections.
I hope every version of gold available on the bots has support for
--section-ordering file.
llvm-svn: 309056
Summary:
Does a simple merge, where mergeable elements are combined, all others
are appended. Does not apply trickly namespace rules.
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35753
llvm-svn: 309047
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D33964
llvm-svn: 309043
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D33964
llvm-svn: 309032
Summary:
ELF linkers generate __start_<secname> and __stop_<secname> symbols
when there is a value in a section <secname> where the name is a valid
C identifier. If dead stripping determines that the values declared
in section <secname> are dead, and we then internalize (and delete)
such a symbol, programs that reference the corresponding start and end
section symbols will get undefined reference linking errors.
To fix this, add the section name to the IRSymtab entry when a symbol is
defined in a specific section. Then use this in the gold-plugin to mark
the symbol as external and visible from outside the summary when the
section name is a valid C identifier.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35639
llvm-svn: 309009
This patch just adds printing of CR bit registers in a more human-readable
form akin to that used by the GNU binutils.
Differential Revision: https://reviews.llvm.org/D31494
llvm-svn: 309001
D35067/rL308322 attempted to support up to 4 load pairs for memcmp inlining which resulted in regressions for some optimized libc memcmp implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the memcmp expansion to a maximum of 2 load pairs (which matches what we do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
llvm-svn: 308986
Summary:
Some SPARC TLS relocations were applying nontrivial adjustments
to zero value, leading to unexpected non-zero values in ELF and then
Solaris linker failures.
Getting rid of these adjustments.
Fixes PR33825.
Reviewers: rafael, asb, jyknight
Subscribers: joerg, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D35567
llvm-svn: 308978
it when safe.
Very often the BE count is the trip count minus one, and the plus one
here should fold with that minus one. But because the BE count might in
theory be UINT_MAX or some such, adding one before we extend could in
some cases wrap to zero and break when we scale things.
This patch checks to see if it would be safe to add one because the
specific case that would cause this is guarded for prior to entering the
preheader. This should handle essentially all of the common loop idioms
coming out of C/C++ code once canonicalized by LLVM.
Before this patch, both forms of loop in the added test cases ended up
subtracting one from the size, extending it, scaling it up by 8 and then
adding 8 back onto it. This is really silly, and it turns out made it
all the way into generated code very often, so this is a surprisingly
important cleanup to do.
Many thanks to Sanjoy for showing me how to do this with SCEV.
Differential Revision: https://reviews.llvm.org/D35758
llvm-svn: 308968
Enable runtime and partial loop unrolling of simple loops without
calls on M-class cores. The thresholds are calculated based on
whether the target is Thumb or Thumb-2.
Differential Revision: https://reviews.llvm.org/D34619
llvm-svn: 308956
Create a dummy 8 byte fixed object for the unused slot below the first
stored vararg.
Alternative ideas tested but skipped: One could try to align the whole
fixed object to 16, but I haven't found how to add an offset to the stack
frame used in LowerWin64_VASTART.
If only the size of the fixed stack object size is padded but not the offset, via
MFI.CreateFixedObject(alignTo(GPRSaveSize, 16), -(int)GPRSaveSize, false),
PrologEpilogInserter crashes due to "Attempted to reset backwards range!".
This fixes misconceptions about where registers are spilled, since
AArch64FrameLowering.cpp assumes the offset from fixed objects is
aligned to 16 bytes (and the Win64 case there already manually aligns
the offset to 16 bytes).
This fixes cases where local stack allocations could overwrite callee
saved registers on the stack.
Differential Revision: https://reviews.llvm.org/D35720
llvm-svn: 308950
This starts the development on one of MS Visual Studio binutils,
Resource Converter. The tool compiles resource scripts (.rc)
into binary resource files (.res).
The current implementation does nothing but parse the command
line arguments. It is going to be extended in the future.
Differential Revision: https://reviews.llvm.org/D35810
llvm-svn: 308940
This patch removes unnecessary zero copies in BBs that are targets of b.eq/b.ne
and we know the result of the compare instruction is zero. For example,
BB#0:
subs w0, w1, w2
str w0, [x1]
b.ne .LBB0_2
BB#1:
mov w0, wzr ; <-- redundant
str w0, [x2]
.LBB0_2
Differential Revision: https://reviews.llvm.org/D35075
llvm-svn: 308849
When SCEV calculates product of two SCEVAddRecs from the same loop, it
tries to combine them into one big AddRecExpr. If the sizes of the initial
SCEVs were `S1` and `S2`, the size of their product is `S1 + S2 - 1`, and every
operand of the resulting SCEV is combined from operands of initial SCEV and
has much higher complexity than they have.
As result, if we try to calculate something like:
%x1 = {a,+,b}
%x2 = mul i32 %x1, %x1
%x3 = mul i32 %x2, %x1
%x4 = mul i32 %x3, %x2
...
The size of such SCEVs grows as `2^N`, and the arguments
become more and more complex as we go forth. This leads
to long compilation and huge memory consumption.
This patch sets a limit after which we don't try to combine two
`SCEVAddRecExpr`s into one. By default, max allowed size of the
resulting AddRecExpr is set to 16.
Differential Revision: https://reviews.llvm.org/D35664
llvm-svn: 308847
These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns.
llvm-svn: 308834
Check the actual memory type stored and not the extended value size
when considering if truncated store merge is worthwhile.
Reviewers: efriedma, RKSimon, spatel, jyknight
Reviewed By: efriedma
Subscribers: llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D35623
llvm-svn: 308833
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D33964
llvm-svn: 308821
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D33964
llvm-svn: 308803
Summary: Currently the ThinLTO minimized bitcode file only strip the debug info, but there is still a lot of information in the minimized bit code file that will be not used for thin linker. In this patch, most of the extra information is striped to reduce the minimized bitcode file. Now only ModuleVersion, ModuleInfo, ModuleGlobalValueSummary, ModuleHash, Symtab and Strtab are left. Now the minimized bitcode file size is reduced to 15%-30% of the debug info stripped bitcode file size.
Reviewers: danielcdh, tejohnson, pcc
Reviewed By: pcc
Subscribers: mehdi_amini, aprantl, inglorion, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D35334
llvm-svn: 308760
-membedded-data changes the location of constant data from the .sdata to
the .rodata section. Previously it was (incorrectly) always located in the
.rodata section.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D35686
llvm-svn: 308758
This patch makes LSR generate better code for SystemZ in the cases of memory
intrinsics, Load->Store pairs or comparison of immediate with memory.
In order to achieve this, the following common code changes were made:
* New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if
LSR should do instruction-based addressing evaluations by calling
isLegalAddressingMode() with the Instruction pointers.
* In LoopStrengthReduce: handle address operands of memset, memmove and memcpy
as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address,
not just loads or stores.
SystemZ changes:
* isLSRCostLess() implemented with Insns first, and without ImmCost.
* New function supportedAddressingMode() that is a helper for TTI methods
looking at Instructions passed via pointers.
Review: Ulrich Weigand, Quentin Colombet
https://reviews.llvm.org/D35262https://reviews.llvm.org/D35049
llvm-svn: 308729
Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction.
This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results.
In future we could probably generalize this to handle more cases.
Differential Revision: https://reviews.llvm.org/D35303
llvm-svn: 308723
This patch teaches dsymutil to strip types from the imported
DW_TAG_module inside of an object file (not inside the PCM) if they
can be resolved to the full definition inside the PCM. This reduces
the size of the .dSYM from WebCore from webkit.org by almost 2/3.
<rdar://problem/33047213>
llvm-svn: 308710
lld needs a matching change for this will be my next commit.
Expect it to fail build until that matching commit is picked up by the bots.
Like the changes in r296527 for dyld bind entires and the changes in
r298883 for lazy bind, weak bind and rebase entries the export
entries are the last of the dyld compact info to have error handling added.
This follows the model of iterators that can fail that Lang Hanes
designed when fixing the problem for bad archives r275316 (or r275361).
So that iterating through the exports now terminates if there is an error
and returns an llvm::Error with an error message in all cases for malformed
input.
This change provides the plumbing for the error handling, all the needed
testing of error conditions and test cases for all of the unique error messages.
llvm-svn: 308690
It revealed a bug in the Localizer pass which has now been fixed.
This includes the fix for SUBREG_TO_REG committed separately last time.
llvm-svn: 308688
If the localizer pass puts one of its constants before the label that tells the
unwinder "jump here to handle your exception" then control-flow will skip it,
leaving uninitialized registers at runtime. That's bad.
llvm-svn: 308687
Summary: Implement parsing and writing of a single xml manifest file.
Subscribers: mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35425
llvm-svn: 308679
The patch adds support of i128 params lowering. The changes are quite trivial to
support i128 as a "special case" of integer type. With this patch, we lower i128
params the same way as aggregates of size 16 bytes: .param .b8 _ [16].
Currently, NVPTX can't deal with the 128 bit integers:
* in some cases because of failed assertions like
ValVTs.size() == OutVals.size() && "Bad return value decomposition"
* in other cases emitting PTX with .i128 or .u128 types (which are not valid [1])
[1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types
Differential Revision: https://reviews.llvm.org/D34555
Patch by: Denys Zariaiev (denys.zariaiev@gmail.com)
llvm-svn: 308675
On AMDGPU SGPR spills are really spilled to another register.
The spiller creates the spills to new frame index objects,
which is used as a placeholder.
This will eventually be replaced with a reference to a position
in a VGPR to write to and the frame index deleted. It is
most likely not a real stack location that can be shared
with another stack object.
This is a problem when StackSlotColoring decides it should
combine a frame index used for a normal VGPR spill with
a real stack location and a frame index used for an SGPR.
Add an ID field so that StackSlotColoring has a way
of knowing the different frame index types are
incompatible.
llvm-svn: 308673
Summary:
Also enable no-fsmuld for sparcv7 (which doesn't have the
instruction).
The previous code which used a post-processing pass to do this was
unnecessary; disabling the instruction is entirely sufficient.
Reviewers: jacob_hansen, ekedaigle
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35576
llvm-svn: 308661
We allow wider than 5 bits in the 16 and 32 bit store forms. And we allow wider than 6 bits on the 64-bit regsiter form.:w
I'm assuming this was a mistake made back in r148024.
llvm-svn: 308656
Previously we were (mis)handling jump table members with a prevailing
definition in a full LTO module and a non-prevailing definition in a
ThinLTO module by dropping type metadata on those functions entirely,
which would cause type tests involving such functions to fail.
This patch causes us to drop metadata only if we are about to replace
it with metadata from cfi.functions.
We also want to replace metadata for available_externally functions,
which can arise in the opposite scenario (prevailing ThinLTO
definition, non-prevailing full LTO definition). The simplest way
to handle that is to remove the definition; there's little value in
keeping it around at this point (i.e. after most optimization passes
have already run) and later code will try to use the function's linkage
to create an alias, which would result in invalid IR if the function
is available_externally.
Fixes PR33832.
Differential Revision: https://reviews.llvm.org/D35604
llvm-svn: 308642
Summary:
When pushing an extension of a constant bitwise operator on a load
into the load, change other uses of the load value if they exist to
prevent the old load from persisting.
Reviewers: spatel, RKSimon, efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35030
llvm-svn: 308618
Test constant folding both on node creation (which already works) and once the input nodes have been folded themselves (not working yet).
llvm-svn: 308611
This patch adds handling of the `long_call`, `far`, and `near`
attributes passed by front-end. The patch depends on D35479.
Differential revision: https://reviews.llvm.org/D35480.
llvm-svn: 308606
Most combines currently recognise scalar and splat-vector constants, but not non-uniform vector constants.
This patch introduces a matching mechanism that uses predicates to check against BUILD_VECTOR of ConstantSDNode, as well as scalar ConstantSDNode cases.
I've changed a couple of predicates to demonstrate - the combine-shl changes add currently unsupported cases, while the MatchRotate replaces an existing mechanism.
Differential Revision: https://reviews.llvm.org/D35492
llvm-svn: 308598
Summary:
This will allow us to merge the various sub-tables into a single table. This is a
compile-time saving at this point. However, this will also enable the optimization
of a table so that similar instructions can be tested together, reducing the time
spent on the matching the code.
The bulk of this patch is a mechanical conversion to the new MatchTable object
which is responsible for tracking label definitions and filling in the index of
the jump targets. It is also responsible for nicely formatting the table.
This was necessary to support the new GIM_Try opcode which takes the index to
jump to if the match should fail. This value is unknown during table
construction and is filled in during emission. To support nesting try-blocks
(although we currently don't emit tables with nested try-blocks), GIM_Reject
has been re-introduced to explicitly exit a try-block or fail the overall match
if there are no active try-blocks.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35117
llvm-svn: 308596
Introduced FSELECT node necesary when lowering ISD::SELECT
which has i32, f64, f64 as its operands.
SEL_D instruction required that its output and first operand
of a SELECT node, which it used, have matching types.
MTC1_D64 node introduced to aid FSELECT lowering.
This fixes machine verifier errors on following tests:
CodeGen/Mips/llvm-ir/select-dbl.ll
CodeGen/Mips/llvm-ir/select-flt.ll
CodeGen/Mips/select.ll
Differential Revision: https://reviews.llvm.org/D35408
llvm-svn: 308595
The soffset field needs to be be set to 0x7f to disable it,
not 0. 0 is interpreted as an SGPR offset.
This should be enough to get basic usage of the global instructions
working. Technically it is possible to use an SGPR_32 offset,
but I'm not sure if it's correct with 64-bit pointers, but
that is not handled now. This should also be cleaned up
to be more similar to how different MUBUF modes are handled,
and to have InstrMappings between the different types.
llvm-svn: 308583
SUMMARY
This patch adds a verification check on the abbreviation declarations in the .debug_abbrev section.
The check makes sure that no abbreviation declaration has more than one attributes with the same name.
Differential Revision: https://reviews.llvm.org/D35643
llvm-svn: 308579
As discussed on llvm-dev I've implemented the first basic steps towards
llvm-objcopy/llvm-objtool (name pending).
This change adds the ability to copy (without modification) 64-bit
little endian ELF executables that have SHT_PROGBITS, SHT_NOBITS,
SHT_NULL and SHT_STRTAB sections.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D33964
llvm-svn: 308559
Add optimization remarks support to the PrologueEpilogueInserter. For
now, emit the stack size as an analysis remark, but more additions wrt
shrink-wrapping may be added.
https://reviews.llvm.org/D35645
llvm-svn: 308556
This change adds basic support for program headers.
I need to do some testing which requires generating program headers but
I can't use ld.lld or clang to produce programs that have headers. I'd
also like to test some strange things that those programs may never
produce.
Patch by Jake Ehrlich
Differential Revision: https://reviews.llvm.org/D35276
llvm-svn: 308520
This generalizes an existing fix from ELF to MachO and COFF.
Test that an ADRP to a local symbol whose offset is known at assembly
time still produces relocations, both for MachO and COFF. Test that
an ADRP without a @page modifier on MachO fails (previously it
didn't).
Differential Revision: https://reviews.llvm.org/D35544
llvm-svn: 308518
If the LowerTypeTests pass decides to add a function to a jump
table for CFI, it will add its name to the set cfiFunctionDefs,
which among other things will cause the function to be renamed in
the ThinLTO backend.
One other thing that we must do with such functions is to not
internalize them, because the jump table in the full LTO object will
contain a reference to the actual function body in the ThinLTO object.
This patch handles that by ensuring that we export any functions
whose names appear in the cfiFunctionDefs set.
Fixes PR33831.
Differential Revision: https://reviews.llvm.org/D35605
llvm-svn: 308504
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers e.g. BASEPRI_MAX_NS which do not exist were being allowed!
Ref: ARMv6/7/8M architecture reference manual.
Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209
llvm-svn: 308456
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.
This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.
This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".
Original observation and initial patch by Evgeny Stupachenko.
Differential Revision: https://reviews.llvm.org/D34150
llvm-svn: 308421
Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.
The added test shows an example of enabling optimization inside a loop.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294
llvm-svn: 308419
That part was reverted because the underlying change necessitating it
(r308025) was reverted in r308271.
Nirav re-landed r308025 again in r308350, so re-landing this fix.
llvm-svn: 308418
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instructions types
b. Registers used
5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Reviewers: craig.topper, RKSimon
Subscribers: javed.absar, shivaram, ddibyend, vprasad
Differential Revision: https://reviews.llvm.org/D35293
llvm-svn: 308411
Install an llvm-readelf symlink to llvm-readobj.
When invoked as *readelf*, default to -elf-output-style=GNU.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33869
llvm-svn: 308408
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
DIImportedEntity has a line number, but not a file field. To determine
the decl_line/decl_file we combine the line number from the
DIImportedEntity with the file from the DIImportedEntity's scope. This
does not work correctly when the parent scope is a DINamespace or a
DIModule, both of which do not have a source file.
This patch adds a file field to DIImportedEntity to unambiguously
identify the source location of the using/import declaration. Most
testcase updates are mechanical, the interesting one is the removal of
the FIXME in test/DebugInfo/Generic/namespace.ll.
This fixes PR33822. See https://bugs.llvm.org/show_bug.cgi?id=33822
for more context.
<rdar://problem/33357889>
https://bugs.llvm.org/show_bug.cgi?id=33822
Differential Revision: https://reviews.llvm.org/D35583
llvm-svn: 308398
Accept and ignore --wide/-W. In GNU readelf this switch is
necessary to get the output format that's consistent between
32-bit and 64-bit targets. llvm-readobj always produces that
output format.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33873
llvm-svn: 308396
In GNU readelf, the short option for --sections is upper-case -S.
Note that GNU uses lower-case -s to mean --symbols, while LLVM
uses -s to mean --sections and -t to mean --symbols (-t has yet a
different meaning in GNU). So command-line uses with -S can now
be compatible, but uses with -s or -t are still incompatible.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33872
llvm-svn: 308392
Summary:
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Submitted on behalf of @morehouse (Matt Morehouse)
Reviewers: eugenis, vitalybuka
Reviewed By: eugenis
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34789
llvm-svn: 308387
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
This reapplies rL308329, which was reverted in rL308374
llvm-svn: 308379
Re-recommiting after landing DAG extension-crash fix.
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308350
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.
Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500
llvm-svn: 308343
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.
Fixes PR32515.
Reviewers: dbabokin, RKSimon, davide, chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D34095
llvm-svn: 308330
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
llvm-svn: 308329
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.
llvm-svn: 308326
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.
This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.
Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.
llvm-svn: 308325
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
The flag "-hexagon-emit-lut-text" (defaulted to false) is added to decide
on where to keep the switch generated lookup table.
Differential Revision: https://reviews.llvm.org/D34818
llvm-svn: 308316
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:
1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).
However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.
Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.
Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35416
llvm-svn: 308314
As discussed by @spatel on D35067:
"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."
llvm-svn: 308309
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
This change introduces additional machine instructions in functions
dealing with the expansion of msa pseudo f16 instructions due to
register classes being inappropriate when checked with machine
verifier.
Differential Revision: https://reviews.llvm.org/D34276
llvm-svn: 308301
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Coverage hooks that take less-than-64-bit-integers as parameters need the
zeroext parameter attribute (http://llvm.org/docs/LangRef.html#paramattrs)
to make sure they are properly extended by the x86_64 ABI.
llvm-svn: 308296
Summary:
Currently most tests for the loop interchange pass are in
test/Transforms/LoopInterchange/interchange.ll. This patch splits up the
large test file in smaller pieces, which makes debugging test failures
easier.
Reviewers: karthikthecool, blitz.opensource, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, mcrosier, mkuper, mzolotukhin, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D35488
llvm-svn: 308284