Since the calls don't return, the instruction afterwards will never run,
and is just taking up unnecessary space in the binary.
Differential Revision: http://reviews.llvm.org/D20406
llvm-svn: 270109
It broke buildbot:
http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/4817/steps/ninja%20check%201/logs/stdio
Actually it is just because D20273 not yet commited, but these 2 were crossing with each other,
and I`ll better find the way to land them separatelly soon.
Initial commit message:
[llvm-mc] - Teach llvm-mc to generate compressed debug sections in zlib style.
Before this patch llvm-mc generated zlib-gnu styled sections.
That means no SHF_COMPRESSED flag was set, magic 'zlib' signature
was used in combination with full size field. Sections were renamed to "*.z*".
This patch reimplements the compression style to zlib one as zlib-gnu looks
to be depricated everywhere.
Differential revision: http://reviews.llvm.org/D20331
llvm-svn: 270075
This patch changes the order in which we attempt to prove the independence of
strided accesses. We previously did this after we knew the dependence distance
was positive. With this change, we check for independence before handling the
negative distance case. The patch prevents LAA from reporting forward
dependences for independent strided accesses.
This change was requested in the review of D19984.
llvm-svn: 270072
Before this patch llvm-mc generated zlib-gnu styled sections.
That means no SHF_COMPRESSED flag was set, magic 'zlib' signature
was used in combination with full size field. Sections were renamed to "*.z*".
This patch reimplements the compression style to zlib one as zlib-gnu looks
to be depricated everywhere.
Differential revision: http://reviews.llvm.org/D20331
llvm-svn: 270070
Mask0Imm and ~Mask1Imm must be equivalent and one of the MaskImms is a shifted
mask (e.g., 0x000ffff0). Both 'and's must have a single use.
This changes code like:
and w8, w0, #0xffff000f
and w9, w1, #0x0000fff0
orr w0, w9, w8
into
lsr w8, w1, #4
bfi w0, w8, #4, #12
llvm-svn: 270063
- Renamed intrinsics.ll to intrinsics-coprocessor.ll
as all the tests were testing coprocessor instructions,
also made the test checks match the full instruction.
Differential Revision: http://reviews.llvm.org/D20393
llvm-svn: 270057
Fixes for MUBUF_Atomic instructions to make operand list valid:
- For RTN insns, make a copy of $vdata_in operand as $vdata.
- Do not add operand for GLC, it is hardcoded and comes as a token.
Workaround to avoid adding multiple default optional operands.
Tests added.
Differential Revision: http://reviews.llvm.org/D20257
llvm-svn: 270049
Enable "Remove Redundant LEAs" part of the LEA optimization pass for -O2.
This gives 6.4% performance improve on Broadwell on nnet benchmark from Coremark-pro.
There is no significant effect on other benchmarks (Geekbench, Spec2000, Spec2006).
Differential Revision: http://reviews.llvm.org/D19659
llvm-svn: 270036
Summary:
Implement guard widening in LLVM. Description from GuardWidening.cpp:
The semantics of the `@llvm.experimental.guard` intrinsic lets LLVM
transform it so that it fails more often that it did before the
transform. This optimization is called "widening" and can be used hoist
and common runtime checks in situations like these:
```
%cmp0 = 7 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
%cmp1 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp1) [ "deopt"(...) ]
...
```
to
```
%cmp0 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
...
```
If `%cmp0` is false, `@llvm.experimental.guard` will "deoptimize" back
to a generic implementation of the same function, which will have the
correct semantics from that point onward. It is always _legal_ to
deoptimize (so replacing `%cmp0` with false is "correct"), though it may
not always be profitable to do so.
NB! This pass is a work in progress. It hasn't been tuned to be
"production ready" yet. It is known to have quadriatic running time and
will not scale to large numbers of guards
Reviewers: reames, atrick, bogner, apilipenko, nlewycky
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20143
llvm-svn: 269997
Summary: Set default branch weight to 1:1 if one of the branch has profile missing when simplifying CFG.
Reviewers: spatel, davidxl
Subscribers: danielcdh, llvm-commits
Differential Revision: http://reviews.llvm.org/D20307
llvm-svn: 269995
Having an enum member named Default is quite confusing: Is it distinct
from the others?
This patch removes that member and instead uses Optional<Reloc> in
places where we have a user input that still hasn't been maped to the
default value, which is now clear has no be one of the remaining 3
options.
llvm-svn: 269988
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
Differential Revision: http://reviews.llvm.org/D20295
llvm-svn: 269969
with an additional fix to make RegAllocFast ignore undef physreg uses. It would
previously get confused about the "push %eax" instruction's use of eax. That
method for adjusting the stack pointer is used in X86FrameLowering::emitSPUpdate
as well, but since that runs after register-allocation, we didn't run into the
RegAllocFast issue before.
llvm-svn: 269949
Don't expand divisions by constants if it would require multiple instructions.
The current assumption is that engines will perform the desired optimizations.
llvm-svn: 269930
Summary:
The ordering of registers in BinaryRRF instructions are wrong, and
affects the copysign instruction (CPSDR). This results in the wrong
magnitude and sign being set.
Author: zhanjunl
Reviewers: kbarton, uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20308
llvm-svn: 269922
Summary:
MONITORX/MWAITX instructions provide similar capability to the MONITOR/MWAIT
pair while adding a timer function, such that another termination of the MWAITX
instruction occurs when the timer expires. The presence of the MONITORX and
MWAITX instructions is indicated by CPUID 8000_0001, ECX, bit 29.
The MONITORX and MWAITX instructions are intercepted by the same bits that
intercept MONITOR and MWAIT. MONITORX instruction establishes a range to be
monitored. MWAITX instruction causes the processor to stop instruction execution
and enter an implementation-dependent optimized state until occurrence of a
class of events.
Opcode of MONITORX instruction is "0F 01 FA". Opcode of MWAITX instruction is
"0F 01 FB". These opcode information is used in adding tests for the
disassembler.
These instructions are enabled for AMD's bdver4 architecture.
Patch by Ganesh Gopalasubramanian!
Reviewers: echristo, craig.topper, RKSimon
Subscribers: RKSimon, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D19795
llvm-svn: 269911
MC only needs to know if the output is PIC or not. It never has to
decide about creating GOTs and PLTs for example. The only thing that
MC itself uses this information for is expanding "macros" in sparc and
mips. The rest I am pretty sure could be moved to CodeGen.
This is a cleanup and isolates the code from future changes to
Reloc::Model.
llvm-svn: 269909
Restrict the creation of compact branches so that they meet the ISA encoding
requirements. Notably do not permit $zero to be used as a operand for compact
branches and ensure that some other branches fulfil the requirement that
rs != rt.
Fixup cases where $rs > $rt for bnec and beqc.
Reviewers: dsanders, vkalintiris
Differential Review: http://reviews.llvm.org/D20284
llvm-svn: 269893
This change adds support for software floating point operations for Sparc targets.
This is the first in a set of patches to enable software floating point on Sparc. The next patch will enable the option to be used with Clang.
Differential Revision: http://reviews.llvm.org/D19265
llvm-svn: 269892
Coverage mapping data is organized in a sequence of blocks, each of which is expected
to be aligned by 8 bytes. This feature is used when reading those blocks, see
VersionedCovMapFuncRecordReader::readFunctionRecords(). If a misaligned covearge
mapping data has more than one block, it causes llvm-cov to fail.
Differential Revision: http://reviews.llvm.org/D20285
llvm-svn: 269887
I don't yet fully understand the meaning of these data strcutures,
but at least it seems that their sizes and types are correct.
With this change, we can read publics streams till end.
Differential Revision: http://reviews.llvm.org/D20343
llvm-svn: 269861
We currently don't represent get_local and set_local explicitly; they
are just implied by virtual register use and def. This avoids a lot of
clutter, but it does complicate stackifying: get_locals read their
operands at their position in the stack evaluation order, rather than
at their parent instruction. This patch adds code to walk the stack to
determine the precise ordering, when needed.
llvm-svn: 269854
This patch moves the expansion of WIN_ALLOCA pseudo-instructions
into a separate pass that walks the CFG and lowers the instructions
based on a conservative estimate of the offset between the stack
pointer and the lowest accessed stack address.
The goal is to reduce binary size and run-time costs by removing
calls to _chkstk. While it doesn't fix all the code quality problems
with inalloca calls, it's an incremental improvement for PR27076.
Differential Revision: http://reviews.llvm.org/D20263
llvm-svn: 269828
As discovered in PR27758, GDB does not fully support the DWARF 4 format.
This patch ensures we always emit bitfields in the DWARF 2 when tuning for GDB.
llvm-svn: 269827
When processing inline asm that contains errors, make sure we can recover
gracefully by creating an UNDEF SDValue for the inline asm statement before
returning from SelectionDAGBuilder::visitInlineAsm. This is necessary for
consumers that don't exit on the first error that is emitted (e.g. clang)
and that would assert later on.
Fixes PR24071.
Patch by Diana Picus.
llvm-svn: 269811
This adds support for all the MachO *_command structures. The load_command payloads still are not represented, but that will come next.
llvm-svn: 269808
Summary:
Colons can appear in Windows paths after drive letters. Both colon and
semicolon are valid characters in filenames, but neither are very
common. Semicolon seems just as good, and makes the test pass on
Windows.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D20332
llvm-svn: 269798
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is the NUW variant of r269211 and fixes PR27691.
(Note: PR27691 is not a correct or stability bug, it was created to
track a pending task).
llvm-svn: 269790
when the object is in an archive to use something like libx.a(foo.o) as part of
the error message.
Also changed llvm-objdump and llvm-size to be like llvm-nm and ignore non-object
files in archives and not produce any error message.
To do this Archive::Child::getAsBinary() was changed from ErrorOr<...> to
Expected<...> then that was threaded up to its users.
Converting this interface to Expected<> from ErrorOr<> does involve
touching a number of places. To contain the changes for now the use of
errorToErrorCode() is still used in one place yet to be fully converted.
Again there some were bugs in the existing code that did not deal with the
old ErrorOr<> return values. So now with Expected<> since they must be
checked and the error handled, I added a TODO and a comments for those.
llvm-svn: 269784
This adds support for all the MachO *_command structures. The load_command payloads still are not represented, but that will come next.
llvm-svn: 269782
This just checks that we emit all type records once, and then after
merging the type stream with no other type streams, we still emit every
kind of type record.
We could test the dumper output more closely, but that would make the
test very brittle. Currently we're just getting coverage.
llvm-svn: 269778
Since r207518 they are printed exactly like non-hidden stubs on x86 and
since r207517 on ARM.
This means we can use a single set for all stubs in those platforms.
llvm-svn: 269776
Summary:
Add support to control where files for a distributed backend (the
individual index files and optional imports files) are created.
This is invoked with a new thinlto-prefix-replace option in the gold
plugin and llvm-lto. If specified, expects a string of the form
"oldprefix:newprefix", and instead of generating these files in the
same directory path as the corresponding bitcode file, will use a path
formed by replacing the bitcode file's path prefix matching oldprefix
with newprefix.
Also add a new replace_path_prefix helper to Path.h in libSupport.
Depends on D19636.
Reviewers: joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D19644
llvm-svn: 269771
This is assertion is no longer necessary since we never record
constants in the live set anyway. (They are never recorded in
the initial live set, and constant bases are removed near line 2119)
Differential Revision: http://reviews.llvm.org/D20293
llvm-svn: 269764
The movw instruction is only available in ARM state for V6T2 and above.
The MOVi16 instruction has requirement HasV6T2 but the InstAlias
for mov rd, imm where the operand is imm0_65535_expr:$imm does not.
This means that movw can incorrectly be used in ARMv4 and ARMv5 by
writing mov rd, 0x1234. The simple fix is to the requirement HasV6T2
to the InstAlias. Tests added to not-armv4.s.
Patch by Peter Smith.
llvm-svn: 269761
This patch adds the commandline option -mips-compact-branches={never,optimal,always),
which controls how LLVM generates compact branches for MIPS targets. By
default, the compact branch policy is 'optimal' where LLVM will (hopefully)
pick the optimal branch for any situation. The 'never' policy will disable
the generation of compact branches and 'always' will generate compact branches
wherever possible.
Reviewers: dsanders
Differential Review: http://reviews.llvm.org/D20167
llvm-svn: 269753
MachineInstr::isSafeToMove is more conservative than is needed here;
use a more explicit check, and incorporate knowledge of some
WebAssembly-specific opcodes.
llvm-svn: 269736
The DWARF spec states that a member entry may have either a
DW_AT_data_member_location or a DW_AT_data_bit_offset, but not both.
This fixes a bug found in PR 27758.
llvm-svn: 269731
Fix a bug introduced with rL269426 :
[InstCombine] canonicalize* LE/GE vector integer comparisons to LT/GT (PR26701, PR26819)
We were assuming that a ConstantDataVector / ConstantVector / ConstantAggregateZero operand of
an ICMP was composed of ConstantInt elements, but it might have ConstantExpr or UndefValue
elements. Handle those appropriately.
Also, refactor this function to join the scalar and vector paths and eliminate the switches.
Differential Revision: http://reviews.llvm.org/D20289
llvm-svn: 269728
This diagnostic could be improved by adding the name of the input file
containing the invalid data and/or some information about how to
identify the specific offending attribute/tag in the input. But that's
not an immediate priority as these corner cases of invalid input
shouldn't come up too often.
llvm-svn: 269727
This code currently relies on static methods in ProfileSummary to determine whether a function is hot or unlikley. I am refactoring the ProfileSummary code and these methods will be removed. As discussed offline, the right way to re-introduce this is to add a pass to annotate functions with unlikely/hot hints and use the hints to determine the prefix here.
llvm-svn: 269726
The diagnostic could be improved a bit to include information about
which input file had the mistake (& which unit (counted, since the name
of the unit won't be accessible) within the input).
llvm-svn: 269723
The DWARF spec clearly states that a bit field member should have either a
DW_AT_byte_size or a DW_AT_bit_size, but not both.
Also the DW_AT_byte_size is redundant with the size of the type of the member.
This fixes a bug found in PR 27758.
llvm-svn: 269714
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.
By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.
Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.
llvm-svn: 269708
In practice only a few well known appending linkage variables work.
Currently if codegen sees an unknown appending linkage variable it will
just print it as a regular global. That is wrong as the symbol in the
produced object file has different semantics as the one provided by the
appending linkage.
This just errors early instead of producing a broken .o.
llvm-svn: 269706
Allow two users of the condition if the other user
is also a min/max select. i.e.
%c = icmp slt i32 %x, %y
%min = select i1 %c, i32 %x, i32 %y
%max = select i1 %c, i32 %y, i32 %x
llvm-svn: 269699
Actually use the error return path rather than printing the duplicate
information then a separate error. But also just tidy up/deduplicate
some of the code for generating the diagnostic text.
llvm-svn: 269692
Summary: On Linux, /usr/include/bits/byteswap-16.h defines __byteswap_16(x) as an inlined LRVH (Load Reversed Half-word) instruction. The SystemZ back-end did not support this opcode and the inlined assembly would cause a fatal error.
Reviewers: bryanpkc, uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18732
llvm-svn: 269688
The new X86 shuffle lowering can do just fine without transforming vselects
into vector_shuffles. It looks like the only thing this code does right now
is cause trouble - in particular, it can lead to combine/legalization infinite
loops.
Note that it's not completely NFC, since some of the shuffle masks get inverted,
which may cause slight differences further down the line. We may want to find
a way to invert those masks, but that's orthogonal to this commit.
This fixes the hang in PR27689.
llvm-svn: 269676
This patch renames the option enabling the store-to-load forwarding conflict
detection optimization. This change was requested in the review of D20241.
llvm-svn: 269668
The selection of the vectorization factor currently doesn't consider
interleaved accesses. The vectorization factor is based on the maximum safe
dependence distance computed by LAA. However, for loops with interleaved
groups, we should instead base the vectorization factor on the maximum safe
dependence distance divided by the maximum interleave factor of all the
interleaved groups. Interleaved accesses not in a group will be scalarized.
Differential Revision: http://reviews.llvm.org/D20241
llvm-svn: 269659
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.
This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).
Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.
Reapplied after fixing the LLDB build that was broken due to the new
DiagnosticSeverity in LLVMContext.h, and fixed an UB in the new change.
Patch by Diana Picus.
llvm-svn: 269655
This patch uses PSHUFB to lower vector CTLZ and avoid (slower) scalarizations.
The leading zero count of each 4-bit nibble of the vector is determined by using a PSHUFB lookup. Pairs of results are then repeatedly combined up to the original element width.
Differential Revision: http://reviews.llvm.org/D20016
llvm-svn: 269646
Summary:
The failure r269410 worked around turned out to be caused by an incorrect
evaluation of R_MICROMIPS_GOT16 which then caused the GOT entries to be
incorrect.
This patch fixes the evaluation and reverts r269410.
Reviewers: sdardis, vkalintiris, rafael
Subscribers: rafael, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20242
llvm-svn: 269641
Summary:
This fixes PR27682. Additionally, '.set micromips' by itself is not sufficient
to raise the EF_MIPS_MICROMIPS flag. It is also necessary to emit a microMIPS
instruction. This has also been fixed.
Reviewers: sdardis, vkalintiris, rafael
Subscribers: rafael, dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20214
llvm-svn: 269639
Author: obucina
Reviewers: dsanders
Adds support for third operand for [D]DIV[U] instructions. Additional test for case when destination reg is zero register
Differential Revision: http://reviews.llvm.org/D16888
llvm-svn: 269636
Added constant index tests for all 256-bit integer vector types (touching lower / upper 128-bits)
Added variable index tests for all 256-bit integer vector types
Added out-of-range index tests for all 256-bit integer vector types
llvm-svn: 269600
Vector GEP with mixed (vector and scalar) indices failed on the InstSimplify Pass when all indices are constants.
Differential revision http://reviews.llvm.org/D20149
llvm-svn: 269590
It seems that cl will emit the export directives for Windows ARM targets. The
fact that it did this had originally been missed and this functionality was
never implemented. This makes it possible to rely solely on the source code for
indicating what the exported interfaces are and brings us more compatibility
with cl.
llvm-svn: 269574
This reverts;
r269548, "XFAIL ThinLTO Caching test on Windows."
r269561, "Rework r269548, "XFAIL ThinLTO Caching test on Windows.", not to use XFAIL, for now."
llvm-svn: 269567
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.
This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).
Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.
Reapplied after fixing the LLDB build that was broken due to the new
DiagnosticSeverity in LLVMContext.h.
Patch by Diana Picus.
llvm-svn: 269563
Summary:
The MIPS IAS can now pass 'ninja check-all', recurse, build a bootable linux
kernel, and pass a variety of LNT testing.
Unfortunately we can't enable it by default for 64-bit targets yet since the N32
ABI is still very buggy and this also means we can't enable it for N64 either
because we can't distinguish between N32 and N64 in the relevant code.
Reviewers: vkalintiris
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18759
Differential Revision: http://reviews.llvm.org/D18761
llvm-svn: 269560
This reverts commit r269538 and r269542.
"rename()" is expected to fail across filesystems, will handle this.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 269543
compiler-rt/libgcc shift routines expect the shift count to be an i32, so
use i32 as the shift count for shifts that are legalized to libcalls. This
also reverts r268991, now that the signatures are correct.
llvm-svn: 269531
Summary:
This code is intended to be used as part of LLD's PDB writing. Until
that exists, this is exposed via llvm-readobj for testing purposes.
Type stream merging uses the following algorithm:
- Begin with a new empty stream, and a new empty hash table that maps
from type record contents to new type index.
- For each new type stream, maintain a map from source type index to
destination type index.
- For each record, copy it and rewrite its type indices to be valid in
the destination type stream.
- If the new type record is not already present in the destination
stream hash table, append it to the destination type stream, assign it
the next type index, and update the two hash tables.
- If the type record already exists in the destination stream, discard
it and update the type index map to forward the source type index to
the existing destination type index.
Reviewers: zturner, ruiu
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20122
llvm-svn: 269521
Publics stream seems to contain information as to public symbols.
It actually contains a serialized hash table along with fixed-sized
headers. This patch is not complete. It scans only till the end of
the stream and dump the header information. I'll write code to
de-serialize the hash table later.
Reviewers: zturner
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20256
llvm-svn: 269484
Fixing bots failure. test/ExecutionEngine/RuntimeDyld/SystemZ/cfi-relo-pc64.s
requires SystemZ backend. Mark the test as unsupported if the backend is not
available.
llvm-svn: 269470
Summary: This way we can get rid of one of the fields in the .def file.
Reviewers: llvm-commits
Subscribers: zturner
Differential Revision: http://reviews.llvm.org/D20251
llvm-svn: 269461
When setting the frame pointer, the offset from SP is calculated based on the
stack slot it gets allocated, but this slot is in turn based on the order of
the CSR list so that list should match the order we actually save the registers
in. Mostly it did, but in the edge-case of MachO AAPCS targets it was wrong.
llvm-svn: 269459
Recent changes to the instruction selection code exposed a problem where
a dead node was not removed on time. This node had both input and output
chains, which lead to an apparent cycle.
llvm-svn: 269458
- Insert one nop for each high level statement instead of two
- Do not insert nop before prologue
Differential Revision: http://reviews.llvm.org/D20215
llvm-svn: 269452
Most immediates are printed in Aarch64InstPrinter using 'formatImm' macro,
but not all of them.
Implementation contains following rules:
- floating point immediates are always printed as decimal
- signed integer immediates are printed depends on flag settings
(for negative values 'formatImm' macro prints the value as i.e -0x01
which may be convenient when imm is an address or offset)
- logical immediates are always printed as hex
- the 64-bit immediate for advSIMD, encoded in "a🅱️c:d:e:f:g:h" is always printed as hex
- the 64-bit immedaite in exception generation instructions like:
brk, dcps1, dcps2, dcps3, hlt, hvc, smc, svc is always printed as hex
- the rest of immediates is printed depends on availability
of -print-imm-hex
Signed-off-by: Maciej Gabka <maciej.gabka@arm.com>
Signed-off-by: Paul Osmialowski <pawel.osmialowski@arm.com>
Differential Revision: http://reviews.llvm.org/D16929
llvm-svn: 269446
This patch adds basic support for MachO::load_command. Load command types and sizes are encoded in the YAML and expanded back into MachO.
The YAML doesn't yet support load command structs, that is coming next. In the meantime as a temporary measure when writing MachO files the load commands are padded with zeros so that the generated binary is valid.
llvm-svn: 269442
Summary: When the MCJIT generates ELF code, some DWARF data requires 64-bit PC-relative relocation (R_390_PC64). This patch adds support for R_390_PC64 relocation to RuntimeDyld::resolveSystemZRelocation, to avoid an assertion failure.
Reviewers: uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20033
llvm-svn: 269436
Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.
Reviewers: arsenm, joker.eph, mcrosier
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D20176
llvm-svn: 269433
This reverts commit r269428, as it breaks the LLDB build. We need to
understand how to change LLDB in the same way as LLC before landing this
again.
llvm-svn: 269432
Without a diagnostic handler installed, llc's behaviour is to exit on the first
error that it encounters. This is very different from the behaviour of clang
and other front ends, which try to gather as many errors as possible before
exiting.
This commit adds a diagnostic handler to llc, allowing it to find and report
more than one error. The old behaviour is preserved under a flag (-exit-on-error).
Some of the tests fail with the new diagnostic handler, so they have to use the
new flag in order to run under the previous behaviour. Some of these are known
bugs, others need further investigation. Ideally, we should fix the tests and
remove the flag at some point in the future.
Patch by Diana Picus.
llvm-svn: 269428
*We don't currently handle the edge case constants (min/max values), so it's not a complete
canonicalization.
To fully solve the motivating bugs, we need to enhance this to recognize a zero vector
too because that's a ConstantAggregateZero which is a ConstantData, not a ConstantVector
or a ConstantDataVector.
Differential Revision: http://reviews.llvm.org/D17859
llvm-svn: 269426
It's not entirely clear why R_MICROMIPS_(GOT|HI16|LO16) are evaluated
incorrectly in a small number of the LNT tests at this point. However, it's not
related to the STO_MIPS_MICROMIPS issue.
At this point all the microMIPS-related changes of r268900 have been reverted.
llvm-svn: 269410
We only really need this to be true for SIFixSGPRCopies.
I'm not sure there's any way this could happen before that point.
Fixes a case where MachineCSE could introduce a cross block
scc use.
llvm-svn: 269391
Summary:
...loop after the last iteration.
This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.
This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.
The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.
Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.
Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.
We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.
Reviewers: chandlerc
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11758
llvm-svn: 269388
Summary:
Currently we consider such instructions as simplified, which is incorrect,
because if their user isn't simplified, we can't actually simplify them too.
This biases our estimates of profitability: for instance the analyzer expects
much more gains from unrolling memcpy loops than there actually are.
Reviewers: hfinkel, chandlerc
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17365
llvm-svn: 269387
In verbose mode, we emit a warning if the DWOId of a skeleton CU
mismatches the DWOId of the referenced module. This patch updates the
cached DWOId after a module has been loaded to the DWOId of the module
on disk (instead of storing the DWOId we expected to load). This
allows us to correctly emit the mismatch warning for all subsequent
object files that want to import the same module. This patch also
ensures both warnings are only emitted in verbose mode.
rdar://problem/26214027
llvm-svn: 269383
This change implements the transformation in processInstruction() for the
LDR rt, =expression to MOV rt, expression when the expression can be evaluated
and can fit into the immediate field of the MOV or a MVN.
Across the ARM and Thumb instruction sets there are several cases to consider,
each with a different range of representatble constants.
In ARM we have:
* Modified immediate (All ARM architectures)
* MOVW (v6t2 and above)
In Thumb we have:
* Modified immediate (v6t2, v7m and v8m.mainline)
* MOVW (v6t2, v7m, v8.mainline and v8m.baseline)
* Narrow Thumb MOV that can be used in an IT block (non flag-setting)
If the immediate fits any of the available alternatives then we make the transformation.
Fixes 25722.
Patch by Peter Smith.
llvm-svn: 269354
Alter instances in the test-suite that use immediates that can be represented
in the immediate field of a MOV. The reason for doing this is that when the
LDR rt,=imm transformation to MOV rt, imm the existing tests do not need to
be modified.
Required by the patch that fixes PR25722.
Patch by Peter Smith.
llvm-svn: 269353
I've added the reserved field as an "optional" in YAML, but I've added asserts in the yaml2macho code to enforce that the field is present in mach_header_64, but not in mach_header.
llvm-svn: 269320
Summary:
This expands on r269179 to fix an additional case that was not covered by our
tests. The assembler temporary is not needed when the .cprestore offset fits
inside a simm16 and it is not an error to use it inside a '.set noat' in this
case.
Reviewers: emaste, seanbruno, sdardis
Subscribers: dsanders, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20199
llvm-svn: 269295
As explained in r269196, microMIPS has a special case that is not correctly
implemented in LLVM. If we have a symbol 'foo' which is equivalent to
'.text+0x10'. The value of an R_MICROMIPS_LO16 relocation using 'foo' is
'foo+0x11' and not 'foo+0x10'. The in-place addend should therefore be 0x11.
This commit reverts a little more of the effect of r268900 by keeping the
symbol when the STO_MIPS_MICROMIPS flag is set for R_MIPS_GPREL32 relocations.
This fixes SingleSource/UnitTests/2003-08-11-VaListArg, and
SingleSource/UnitTests/2003-05-07-VarArgs for microMIPS.
I believe there are additional relocations that have the same issue (e.g.
R_MIPS_64, and R_MIPS_GPREL16) but for now I'm focusing on restoring our
internal buildbots back to the green state we had in r268899.
llvm-svn: 269294
For BITREVERSE, bit shifting/masking every bit in a vector element is a very lengthy procedure.
If the input vector type is a whole multiple of bytes wide then we can split this into a BSWAP shuffle stage (to reverse at the byte level) and then a BITREVERSE stage applied to each byte. Most vector capable targets can efficiently BSWAP using shuffles resulting in a considerable reduction in instructions.
With this patch targets would only need to implement a target specific vXi8 BITREVERSE implementation to efficiently reverse most legal vector types.
Differential Revision: http://reviews.llvm.org/D19978
llvm-svn: 269290
Summary:
This eliminates the default case for N64 that was left out of r269047.
The change to R_MIPS_SUB is needed in this patch to make this testable since
%lo(%neg(%gp_rel(foo))) and %hi(%neg(%gp_rel(foo))) remain the only ways to get
a compound relocation from the assembler.
Reviewers: sdardis, rafael
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: http://reviews.llvm.org/D20097
llvm-svn: 269280
While promoting nodes in PPCTargetLowering::DAGCombineExtBoolTrunc, it is
possible for one of the nodes to be replaced by another. To make sure we do not
visit the deleted nodes, and to make sure we visit the replacement nodes, use a
list of HandleSDNodes to track the to-be-promoted nodes during the promotion
process.
The same fix has been applied to the analogous code in
PPCTargetLowering::DAGCombineTruncBoolExt.
Fixes PR26985.
llvm-svn: 269272
Shifts beyond the bitwidth are undef but SCCP resolved them to zero.
Instead, DTRT and resolve them to undef.
This reimplements the transform which caused PR27712.
llvm-svn: 269269
The promote alloca pass would attempt to promote an alloca with
a select, icmp, or phi user, even though the other operand was
from a non-promotable source, producing a select on two different
pointer types.
Only do this if we know that both operands derive from the same
alloca. In the future we should be able to relax this to an alloca
which will also be promoted.
llvm-svn: 269265
This new verifier rule lets us unambigously pick a calling convention
when creating a new declaration for
`@llvm.experimental.deoptimize.<ty>`. It is also congruent with our
lowering strategy -- since all calls to `@llvm.experimental.deoptimize`
are lowered to calls to `__llvm_deoptimize`, it is reasonable to enforce
a unique calling convention.
Some of the tests that were breaking this verifier rule have had to be
split up into different .ll files.
The inliner was violating this rule as well, and has been fixed to avoid
producing invalid IR.
llvm-svn: 269261
This is to fix the bug in https://llvm.org/bugs/show_bug.cgi?id=27612.
When spill is hoisted to a BB with landingpad successor, and if the VNI
of the spill reg lives into the landingpad successor, the spill should be
inserted before the call which may throw exception. InsertPointAnalysis
is used to compute the safe insert point.
http://reviews.llvm.org/D20027 is a preparing patch for this patch.
Differential Revision: http://reviews.llvm.org/D19884.
llvm-svn: 269249
For narrow stores (e.g., strb, srth) we know the upper bits of the register are
unused/not useful. In some cases we can use this information to eliminate
unnecessary instructions.
For example, without this patch we generate (from the 2nd test case):
ldr w8, [x0]
and w8, w8, #0xfff0
bfxil w8, w2, #16, #4
strh w8, [x1]
and after the patch the 'and' is removed:
ldr w8, [x0]
bfxil w8, w2, #16, #4
strh w8, [x1]
ret
During the lowering of the bitfield insert instruction the 'and' is eliminated
because we know the upper 16-bits that are masked off are unused and the lower
4-bits that are masked off are overwritten by the insert itself. Therefore, the
'and' is unnecessary.
Differential Revision: http://reviews.llvm.org/D20175
llvm-svn: 269226
SCEVExpander::replaceCongruentIVs assumes the backedge value of an
SCEV-analysable PHI to always be an instruction, when this is not
necessarily true. For now address this by bailing out of the
optimization if the backedge value of the PHI is a non-Instruction.
llvm-svn: 269213
`SCEVExpander::replaceCongruentIVs` bypasses `hoistIVInc` if both the
original and the isomorphic increments are PHI nodes. Doing this can
break SSA if the isomorphic increment is not dominated by the original
increment. Get rid of the bypass, and let `hoistIVInc` do the right
thing.
Fixes PR27232 (compile time crash/hang).
llvm-svn: 269212
... for AddRec's in loops for which SCEV is unable to compute a max
tripcount. This is not a problem for "normal" loops[0] that don't have
guards or assumes, but helps in cases where we have guards or assumes in
the loop that can be used to constrain incoming values over the backedge.
This partially fixes PR27691 (we still don't handle the NUW case).
[0]: for "normal" loops, in the cases where we'd be able to prove
no-wrap via isKnownPredicate, we'd also be able to compute a max
tripcount.
llvm-svn: 269211
Equivalent GEP indices with different types are treated as different
indices altogether, leading to an incorrect AA result. Fix the issue
by comparing indices based on their values.
Thanks to Mikael Holmén for reporting the issue!
Differential Revision: http://reviews.llvm.org/D19935
llvm-svn: 269197
microMIPS has a special case that is not correctly implemented in LLVM. If we
have a symbol 'foo' which is equivalent to '.text+0x10'. The value of an
R_MICROMIPS_LO16 relocation using 'foo' is 'foo+0x11' and not 'foo+0x10'. The
in-place addend should therefore be 0x11.
Work around this by partially reverting the effect of r268900 by keeping the
symbol when the STO_MIPS_MICROMIPS flag is set. This fixes
SingleSource/Regression/C/PR640 for microMIPS.
llvm-svn: 269196
When generating .cfi_offset instructions, make sure that the offset is
calculated with respect to the register used to define the CFA (which is
currently always FP+8).
llvm-svn: 269191
Summary:
r268058 unintentionally made the retrieval of the current assembler temporary
unconditional. This was fine for the existing tests but it broke the cases
where the assembler temporary is not needed (N32/N64 or not PIC) and is
unavailable due to a '.set noat' directive.
This fixes FreeBSD's libc.
Reviewers: emaste, sdardis, seanbruno
Subscribers: dsanders, emaste, sdardis, llvm-commits
Differential Revision: http://reviews.llvm.org/D20093
llvm-svn: 269179
Summary: When emitting comparison for fp16, in addition to promote the LHS and RHS to fp32, we need to change the VT as well.
Reviewers: t.p.northover
Subscribers: t.p.northover, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19922
llvm-svn: 269151
Summary: In sample profile, some branches may have profile missing due to profile inaccuracy. We want existing branch probability still valid after propagation.
Reviewers: hfinkel, davidxl, spatel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19948
llvm-svn: 269137
Unlike xN/wN, the size of vN is genuinely ambiguous in the assembly, so we
should try to infer what was intended from the type. But only down to 64-bits
(vN can never represent sN, hN or bN).
llvm-svn: 269132
Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block
that really implements a memcpy operation, backends can benefit from seeing
this.
llvm-svn: 269125
Before r268509, Clang would disable the loop unroll pass when optimizing
for size. That commit enabled it to be able to support unroll pragmas
in -Os builds. However, this regressed binary size in one of Chromium's
DLLs with ~100 KB.
This restores the original behaviour of no unrolling at -Os, but doing it
in LLVM instead of Clang makes more sense, and also allows the pragmas to
keep working.
Differential revision: http://reviews.llvm.org/D20115
llvm-svn: 269124
This patch extend loopreroll to allow the instruction chain
of loop control only IV has sext.
Differential Revision: http://reviews.llvm.org/D19820
llvm-svn: 269121
This fixes a bug introduced in r267623, where we got smarter and avoided to save
EAX before using it. However, we failed to check if any of the subregister of
EAX were alive and thus, missed cases where we have to save EAX before using it.
The problem may happen on every X86/i386/... platform.
This fixes llvm.org/PR27624
llvm-svn: 269115
Do simplifications common to all shift instructions based on the amount shifted:
1. If the shift amount is known larger than the bitwidth, the result is undefined.
2. If the valid bits of the shift amount are all known to be 0, it's a shift by zero, so the shift operand is the result.
Note that we could generalize the shift-by-zero transform into a shift-by-constant if all of the valid bits in the shift
amount are known, but that would have to be done in InstCombine rather than here because it would mean we need to create
a new shift instruction.
Differential Revision: http://reviews.llvm.org/D19874
llvm-svn: 269114
Added support for extended mnemonics for the following branch instructions and
load/store-on-condition opcodes:
BR, LOCR, LOCGR, LOC, LOCG, STOC, STOCG
Phabricator: http://reviews.llvm.org/D19729
Committing on behalf of Zhan Liau
llvm-svn: 269106
for the same subprogram.
This fixes a bug where DW_AT_abstract_origin is being emitted twice for
the same subprogram if a function is both inlined and emitted in the same
translation unit, by restoring the pre-r266446 behavior.
http://reviews.llvm.org/D20072
llvm-svn: 269103
I'm really not sure why we were in the first place, it's the linker's job to
convert between BL/BLX as necessary. Even worse, using BLX left Thumb calls
that could be locally resolved completely unencodable since all offsets to BLX
are multiples of 4.
rdar://26182344
llvm-svn: 269101