This was originally reported as a bug with the symptom being "cvdump
crashes when printing an LLD-linked PDB that has an S_FILESTATIC record
in it". After some additional investigation, I determined that this was
a symptom of a larger problem, and in fact the real problem was in the
way we emitted the global PDB string table. As evidence of this, you can
take any lld-generated PDB, run cvdump -stringtable on it, and it would
return no results.
My hypothesis was that cvdump could not *find* the string table to begin
with. Normally it would do this by looking in the "named stream map",
finding the string /names, and using its value as the stream index. If
this lookup fails, then cvdump would fail to load the string table.
To test this hypothesis, I looked at the name stream map generated by a
link.exe PDB, and I emitted exactly those bytes into an LLD-generated
PDB. Suddenly, cvdump could read our string table!
This code has always been hacky and we knew there was something we
didn't understand. After all, there were some comments to the effect of
"we have to emit strings in a specific order, otherwise things don't
work". The key to fixing this was finally understanding this.
The way it works is that it makes use of a generic serializable hash map
that maps integers to other integers. In this case, the "key" is the
offset into a buffer, and the value is the stream number. If you index
into the buffer at the offset specified by a given key, you find the
name. The underlying cause of all these problems is that we were using
the identity function for the hash. i.e. if a string's offset in the
buffer was 12, the hash value was 12. Instead, we need to hash the
string *at that offset*. There is an additional catch, in that we have
to compute the hash as a uint32 and then truncate it to uint16.
Making this work is a little bit annoying, because we use the same hash
table in other places as well, and normally just using the identity
function for the hash function is actually what's desired. I'm not
totally happy with the template goo I came up with, but it works in any
case.
The reason we never found this bug through our own testing is because we
were building a /parallel/ hash table (in the form of an
llvm::StringMap<>) and doing all of our lookups and "real" hash table
work against that. I deleted all of that code and now everything goes
through the real hash table. Then, to test it, I added a unit test which
adds 7 strings and queries the associated values. I test every possible
insertion order permutation of these 7 strings, to verify that it really
does work as expected.
Differential Revision: https://reviews.llvm.org/D43326
llvm-svn: 325386
The data type is assumed to be a vector, but sometimes it is not, leading
to an assertion.
Add simple test-case to verify this.
Differential revision: https://reviews.llvm.org/D42599
llvm-svn: 325378
Summary:
This patch extends the promotion of alloca to vector to the arrays of up to 16 elements. Also we introduce
an option, -disable-promote-alloca-to-vector, to switch promotion to vector off, if needed.
Reviewers:
arsenm
Differential Revision:
https://reviews.llvm.org/D33559
llvm-svn: 325372
This seems to interfere with a target independent brcond combine that looks for the (srl (and X, C1), C2) pattern to enable TEST instructions. Once we flip, that combine doesn't fire and we end up exposing it to the X86 specific BT combine which causes us to emit a BT instruction. BT has lower throughput than TEST.
We could try to make the brcond combine aware of the alternate pattern, but since the flip was just a code size reduction and not likely to enable other combines, it seemed easier to just delay it until after lowering.
Differential Revision: https://reviews.llvm.org/D43201
llvm-svn: 325371
Add an explicit check before looking up symbol in SymbolIndices.
This was previously silently succeeding and returning zero for such
unnamed temporaries.
Differential Revision: https://reviews.llvm.org/D43365
llvm-svn: 325367
Summary:
The LazyValueInfo pass caches a copy of the DominatorTree when available.
Whenever there are pending DominatorTree updates within JumpThreading's
DeferredDominance object we cannot use the cached DT for LVI analysis.
This commit adds the new methods enableDT() and disableDT() to LVI.
JumpThreading also sets the appropriate usage model before calling LVI
analysis methods.
Fixes https://bugs.llvm.org/show_bug.cgi?id=36133
Reviewers: sebpop, dberlin, kuhar
Reviewed by: sebpop, kuhar
Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov
Differential Revision: https://reviews.llvm.org/D42717
llvm-svn: 325356
Summary:
In the current implementation of GPR Indexing Mode when the index is of non-uniform, the s_set_gpr_idx_off instruction
is incorrectly inserted after the loop. This will lead the instructions with vgpr operands (v_readfirstlane for example) to read incorrect
vgpr.
In this patch, we fix the issue by inserting s_set_gpr_idx_on/off immediately around the interested instruction.
Reviewers:
rampitec
Differential Revision:
https://reviews.llvm.org/D43297
llvm-svn: 325355
Based off the DemandedElts mask the and UNDEF elements returned from the SimplifyDemandedVectorElts calls to the shuffle operands, we can attempt to simplify the shuffle mask.
I had to be very conservative here as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to inrange values (PowerPC), similarly combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines.
llvm-svn: 325354
Running a bootstrap build with UBSan produces a number of instances where
we have signed integer overflow due to this transform. Change the type to
long to prevent this UB on 64-bit build machines.
llvm-svn: 325347
These instructions conflict with their full length variants
for the purposes of FastISel as they cannot be distingushed
based on the number and type of operands and predicates.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D41285
llvm-svn: 325341
Now that we have the new TBAA metadata format that is capable of
representing accesses to aggregates, we can propagate TBAA access
tags from memory setting and transferring intrinsics to load and
store instructions and vice versa.
Since SROA produces lots of new loads and stores on optimized
builds, this change significantly decreases the share of
undecorated memory accesses on such builds.
Differential Revision: https://reviews.llvm.org/D41563
llvm-svn: 325329
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Eli Friedman
llvm-svn: 325327
Summary:
Currently when expanding a SETCC node into a SELECT_CC, LLVM uses
an incorrect type for determining BooleanContent of the result. This
patch fixes the issue.
Fixes PR36079.
Reviewers: rogfer01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43282
llvm-svn: 325325
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0
a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Differential Revision: https://reviews.llvm.org/D34515
llvm-svn: 325323
Summary: extractBits assumes that `!this->isSingleWord() implies !Result.isSingleWord()`, which may not necessarily be true. Handle both cases.
Reviewers: RKSimon
Subscribers: sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D43363
llvm-svn: 325311
In r325063, we salvaged debug values from dying instructions in
GVN::processBlock() and GVN::performScalarPRE().
The change in performScalarPRE(), while correct, is unhelpful. It
introduced a call to salvageDebugInfo() which was immediately followed
by a RAUW, meaning it prevented the RAUW from efficiently updating
dbg.value intrinsics. This commit reverts the mistake and tightens up
the affected test case.
llvm-svn: 325308
This results in small increases in the size of the .debug_loc section
and the number of unique source variables in a stage2 build of opt.
llvm-svn: 325301
Introduce the LLVM_ENABLE_PDB option so that users can request them
explicitly instead.
Add /OPT:REF and /OPT:ICF back, which /DEBUG disables by default.
Differential Revision: https://reviews.llvm.org/D43156
llvm-svn: 325296
Summary:
This patch makes the decoder understand old AMD 3DNow!
instructions that have never been properly supported in the X86
disassembler, despite being supported in other subsystems. Hopefully
this should make the X86 decoder more complete with respect to binaries
containing legacy code.
Reviewers: craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits, maksfb, bruno
Differential Revision: https://reviews.llvm.org/D43311
llvm-svn: 325295
We already do this for 64-bit when it won't fit into a 64-bit AND/TEST's immediate field. This adds an additional qualifier to do it for any single bit constant larger than 8-bits under optsize
Differential Revision: https://reviews.llvm.org/D43346
llvm-svn: 325290
Same for the sign extend case.
Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.
This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free?
Differential Revision: https://reviews.llvm.org/D43063
llvm-svn: 325289
We can't fold a large immediate into a 64-bit operation. But if we know we're only operating on a single bit we can use the bit instructions.
For now only do this for optsize.
Differential Revision: https://reviews.llvm.org/D37418
llvm-svn: 325287
Summary:
The behavior described in Coroutines TS `[dcl.fct.def.coroutine]/7`
allows coroutine parameters to be passed into allocator functions.
The instructions to store values into the alloca'd parameters must not
be moved past the frame allocation, otherwise uninitialized values are
passed to the allocator.
Test Plan: `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: compnerd, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43000
llvm-svn: 325285
There is a latent Windows kernel bug, the exact trigger
conditions are not well understood, which can cause a file
to be correctly written, but unable to be correctly read.
The workaround appears to be simply calling FlushFileBuffers.
Differential Revision: https://reviews.llvm.org/D42925
llvm-svn: 325274
Use the FunctionType of the callee when it's available. It may not be
available for synthetic calls to functions specified by external symbols.
llvm-svn: 325269
The reference '&' is missing in the function parameter. If there are
back-to-back optimizations in terms of dag node list like below:
t29: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)+12](dereferenceable), zext from i32> t3, t43, undef:i64
t34: i64,ch = load<LD4[bitcast (%struct.test_t* @test.t to i8*)](dereferenceable), zext from i32> t3, t41, undef:i64
The bug will trigger a segfault for the added test case remove_truncate_5.ll:
LLVMSymbolizer: error reading file: No such file or directory
#0 0x000000000241c4d9 (llc+0x241c4d9)
#1 0x000000000241c56a (llc+0x241c56a)
#2 0x000000000241aa50 (llc+0x241aa50)
...
#22 0x0000000000fd5edf (llc+0xfd5edf)
#23 0x00007f0fe03bec05 __libc_start_main (/lib64/libc.so.6+0x21c05)
#24 0x0000000000fd3e69 (llc+0xfd3e69)
...
Segmentation fault
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325267
The FunctionType of the callee is always available, even if the Function
of the callee is not. Use that to get the number of fixed parameters.
llvm-svn: 325259
The variable name 'AllowReassociate' is a lie at this point because
it's set to 'isFast()' which is more than the 'reassoc' FMF after
rL317488.
In D41286, we showed that this transform may be valid even with strict
math by brute force checking every 32-bit float result.
There's a potential problem here because we're replacing with a tan()
libcall rather than a hypothetical LLVM tan intrinsic. So we might
set errno when we should be guaranteed not to do that. But that's
independent of this change.
llvm-svn: 325247
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).
For example, the following C code:
#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))
results in the following assembly if compiled with GCC:
vadd.f32 s0, s0, s1
whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.
This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:
vadd.f32 q0, q0, q1
This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:
asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))
Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.
Reviewers: grosbach, rengolin
Reviewed By: rengolin
Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42962
llvm-svn: 325244
We can use PACKSS to saturate each stage of the chain: PACKSSDW down to [-32768,32767] and then PACKSSWB to [-128,127].
PACKUS is a little trickier and will be handled in a separate patch.
llvm-svn: 325235
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.
Further features can be moved/added in future patches.
Differential Revision: https://reviews.llvm.org/D42896
llvm-svn: 325232
Analysis of fails in the case of out of memory errors can be tricky on
Windows. Such error emerges at the point where memory allocation function
fails, but manifests itself when null pointer is used. These two points
may be distant from each other. Besides, next runs may not exhibit
allocation error.
Usual programming practice does not require checking result of 'operator
new' because it throws 'std::bad_alloc' in the case of allocation error.
However, LLVM is usually built with exceptions turned off, so 'new' can
return null pointer. This change installs custom new handler, which causes
fatal error in the case of out of memory. The handler is installed
automatically prior to call to 'main' during construction of a static
object defined in 'lib/Support/ErrorHandling.cpp'. If the application does
not use this file, the handler may be installed manually by a call to
'llvm::install_out_of_memory_new_handler', declared in
'include/llvm/Support/ErrorHandling.h".
There are calls to C allocation functions, malloc, calloc and realloc.
They are used for interoperability with C code, when allocated object has
variable size and when it is necessary to avoid call of constructors. In
many calls the result is not checked against null pointer. To simplify
checks, new functions are defined in the namespace 'llvm' with the
same names as these C function. These functions produce fatal error if
allocation fails. User should use 'llvm::malloc' instead of 'std::malloc'
in order to use the safe variant. This change replaces 'std::malloc'
in the cases when the result of allocation function is not checked against
null pointer.
Finally, there are plain C code, that uses malloc and similar functions. If
the result is not checked, assert statements are added.
Differential Revision: https://reviews.llvm.org/D43010
llvm-svn: 325224
There is a more powerful but still simple function `isKnownViaSimpleReasoning ` that
does constant range check and few more additional checks. We use it some places (e.g.
when proving implications) and in some other places we only check constant ranges.
Currently, indvar simplifier fails to remove the check in following loop:
int inc = ...;
for (int i = inc, j = inc - 1; i < 200; ++i, ++j)
if (i > j) { ... }
This patch replaces all usages of `isKnownPredicateViaConstantRanges` with
`isKnownViaSimpleReasoning` to have smarter proofs. In particular, it fixes the
case above.
Reviewed-By: sanjoy
Differential Revision: https://reviews.llvm.org/D43175
llvm-svn: 325214
Some ELF files produced by lld may have zero-size segment placeholders as shown
below. Since GNU_STACK Offset is 0, the current code makes it the lowest used
offset, and relocates all the segments over the ELF header. The resulting
binary is total garbage.
This change fixes how llvm-objcopy handles PT_PHDR properlly by treating ELF
headers and the program header table as segments to allow the layout algorithm
decide where those should go.
Author: vit9696
Differential Revision: https://reviews.llvm.org/D42872
llvm-svn: 325189
Summary:
TypeID summaries are used by CFI and need to be serialized by ThinLTO
indexing for later use by LTO Backend.
Reviewers: tejohnson, pcc
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D42611
llvm-svn: 325182
The bound instruction does not have reversed operands in gas.
Fixes PR27653.
Patch by Maya Madhavan.
Differential Revision: https://reviews.llvm.org/D43243
llvm-svn: 325178
While there, change a bunch of helper functions to take references to
avoid adding calls to get().
This should conclude the bugpoint yak shaving.
llvm-svn: 325177
Summary:
This patch adds templated functions to MachineIRBuilder for some opcodes
and adds pattern matcher support for G_AND and G_OR.
Reviewers: aditya_nandakumar
Reviewed By: aditya_nandakumar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D43309
llvm-svn: 325162
Previously we only invalidated the pressure set limit cached when the TargetRegisterInfo pointer changes. But as reserved regs and callee saved regs are used as part of calculating the limits we should invalidate when those change too.
I encountered this when reverting a patch from the 6.0 branch. One of the x86 test files had a function that used rbp as a frame pointer, making it reserved. It was followed by another function which didn't use rbp but had the same TRI so the pressure set limit cache was not invalidated. If i removed the function that used rbp as a frame pointer from the file, the remaining function then got a different register pressure limit for the GR16 pressure set. This caused the machine scheduler to change the scheduling for the function. This was an unexpected change from just deleting a function.
I don't have a test case for trunk because the particular x86 test case is different enough from the 6.0 branch to not be affected now.
Differential Revision: https://reviews.llvm.org/D43274
llvm-svn: 325153
Move computeLoopSafetyInfo, defined in Transforms/Utils/LoopUtils.h,
into the corresponding LoopUtils.cpp, as opposed to LICM where it resides
at the moment. This will allow other functions from Transforms/Utils
to reference it.
llvm-svn: 325151
Try to keep PACK*SDW/PACK*SWB as wide as possible, this helps ComputeNumSignBits as it can only peek through bitcasts to wider types, pre-AVX2 codegen was already doing this as it could peek through bitcasts/subvectors more easily than AVX2 could through shuffles.
This shouldn't affect existing results as calls to truncateVectorWithPACK ensure we have enough sign bits to pack to the same value, but it should make it possible to use truncateVectorWithPACK chains to perform saturation in combineTruncateWithSat with a future patch.
llvm-svn: 325149
The select may have been preventing a division by zero or INT_MIN/-1 so removing it might not be safe.
Fixes PR36362.
Differential Revision: https://reviews.llvm.org/D43276
llvm-svn: 325148
Kernel arguments likely read by all workitems and should not bypass
cache. Fixes performance hit in sub-dword argument loads.
Differential Revision: https://reviews.llvm.org/D43249
llvm-svn: 325146
The prologue-end line record must be emitted after the last
instruction that is part of the function frame setup code and before
the instruction that marks the beginning of the function body.
Patch by Carlos Alberto Enciso!
Differential Revision: https://reviews.llvm.org/D41762
llvm-svn: 325143
This keeps with our current usage of 'match' and is easier to see that
the optional NSW only applies in the non-constant operand case.
llvm-svn: 325140
So that macros defined in inline assembly blocks are available to the
whole file.
This provides a consistent behavior with other assembly directives,
since equations for example are already preserved between inline
assembly blocks.
PR: 36110
Patch by Roger!
llvm-svn: 325139
When creating high MachineMemOperand for MSTORE/MLOAD we supply
it with the original PointerInfo, while the pointer itself had been incremented.
The patch adds the proper offset to the PointerInfo.
llvm-svn: 325135
Summary:
Reversed loads are handled as gathering. But we can just reshuffle
these values. Patch adds support for vectorization of reversed loads.
Reviewers: RKSimon, spatel, mkuper, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43022
llvm-svn: 325134
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Change-Id: Ic41aa9ade6512e0478db66e07e2fde41b4fb35f9
llvm-svn: 325128
While the AVX512 VTRUNCS/VTRUNCUS instructions require legal types, truncateVectorWithPACK handles cases with multiples of legal types through splitting/concatenation. So we just need to ensure that the src/dst scalar types are correct and leave truncateVectorWithPACK to handle the rest of it.
llvm-svn: 325127
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325126
We can use incremental dominator tree updates to avoid re-calculating
the dominator tree after interchanging 2 loops.
Reviewers: dmgreen, kuhar
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D43176
llvm-svn: 325122
Commit https://reviews.llvm.org/rL324489 added
EXPECT_EQ(false, N->isUnsigned());
which older GCC versions dislike for some reason. Anyway, it looks like the
proper GTest way is to use EXPECT_FALSE, etc.
Differential Revision: https://reviews.llvm.org/D43233
llvm-svn: 325121
Preserve debug info from a dead 'and' instruction with a constant.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D43163
llvm-svn: 325119
Summary:
This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").
To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.
Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.
Reviewers: JDevlieghere, aprantl, probinson, dblaikie
Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits
Differential Revision: https://reviews.llvm.org/D42740
llvm-svn: 325107
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
The change implements constructor of DisassembleInfo to avoid duplication
of initialization code and gets rid of malloc/free where possible.
Differential Revision: https://reviews.llvm.org/D43003
llvm-svn: 325098
Summary:
It's just cleanup after r323818 to avoid irrelevant error message inside the test.
Existing version of test passed but generated unrelated error report about
symbol redefinition.
llvm-svn: 325080
* Document most API's
* Delete a useless function call
* Fix a discrepancy between the single and multi-opcode variants of
getActionDefinitions().
The multi-opcode variant now requires that more than one opcode is requested.
Previously it acted much like the single-opcode form but unnecessarily
enforced the requirements of the multi-opcode form.
llvm-svn: 325067
This preserves an additional 581 unique source variables in a stage2
build of clang (according to `llvm-dwarfdump --statistics`). It
increases the size of the .debug_loc section by 0.1% (or 87139 bytes).
Differential Revision: https://reviews.llvm.org/D43255
llvm-svn: 325063
This replaces the bit-tracking based fold that did the same thing,
but it only worked for scalars and not directly.
There is no evidence in existing regression tests that the greater
power of bit-tracking was needed here, but we should be aware of
this potential loss of optimization.
llvm-svn: 325062
The scalar folds are done indirectly and use potentially
expensive value tracking calls. That can be improved
along with the enhancement to support vector types.
llvm-svn: 325051
Summary:
Instead of solving the hard problem of how to pass the callee to the indirect
jump thunk without a register, just use a CSR. At a call boundary, there's
nothing stopping us from using a CSR to hold the callee as long as we save and
restore it in the prologue.
Also, add tests for this mregparm=3 case. I wrote execution tests for
__llvm_retpoline_push, but they never got committed as lit tests, either
because I never rewrote them or because they got lost in merge conflicts.
Reviewers: chandlerc, dwmw2
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43214
llvm-svn: 325049
This is both a functional improvement for vectors and an
efficiency improvement for scalars. The existing code below
the new folds does the same thing for scalars, but in an
indirect and expensive way.
llvm-svn: 325048
Summary:
This protects calls to longjmp from transferring control to arbitrary
program points. Instead, longjmp calls are limited to the set of
registered setjmp return addresses.
This also implements /guard:nolongjmp to allow users to link in object
files that call setjmp that weren't compiled with /guard:cf. In this
case, the linker will approximate the set of address taken functions,
but it will leave longjmp unprotected.
I used the following program to test, compiling it with different -guard
flags:
$ cl -c t.c -guard:cf
$ lld-link t.obj -guard:cf
#include <setjmp.h>
#include <stdio.h>
jmp_buf buf;
void g() {
printf("before longjmp\n");
fflush(stdout);
longjmp(buf, 1);
}
void f() {
if (setjmp(buf)) {
printf("setjmp returned non-zero\n");
return;
}
g();
}
int main() {
f();
printf("hello world\n");
}
In particular, the program aborts when the code is compiled *without*
-guard:cf and linked with -guard:cf. That indicates that longjmps are
protected.
Reviewers: ruiu, inglorion, amccarth
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43217
llvm-svn: 325047
Also make a drive-by-fix of a bug in the subregister scan code that
only triggers with an incomplete or otherwise very irregular machine
description.
rdar://problem/37404493
This re-applies r324972 with an early exit in the case of a complete
failure to make this commit NFC again as intended.
llvm-svn: 325041
The InstCombine integer mul test file had tests that belong in InstSimplify
(including fmul tests). Move things to where they belong and auto-generate
complete checks for everything.
llvm-svn: 325037
If a function doesn't have an exact definition, don't apply debugify
metadata as it triggers a DIVerifier failure.
The issue is that it's invalid to have DILocations inside a DISubprogram
which isn't a definition ("scope points into the type hierarchy!").
llvm-svn: 325036
According to `llvm-dwarfdump --statistics` this salvages 43 additional
unique source variables in a stage2 build of clang. It increases the
size of the .debug_loc section by 0.002% (or 2864 bytes).
Differential Revision: https://reviews.llvm.org/D43220
llvm-svn: 325035
It caused "Cannot select: t33: f64 = AArch64ISD::FMOV Constant:i32<0>"
in Chromium builds. See PR36369.
> Get rid of icky goto loops and make the code easier to maintain (NFC).
>
> Differential revision: https://reviews.llvm.org/D42723
llvm-svn: 325034
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.
While there add a one use check to the transform above that looked similar.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43225
llvm-svn: 325019
Change ARMConstantIslandPass to:
- accept f16 literals as litpool entries,
- if the litpool needs to be inserted in the middle of a big block, then we
need to 4-byte align the next instruction in ARM mode.
Differential Revision: https://reviews.llvm.org/D42784
llvm-svn: 325012
The bug has been lying dormant, but apparently was never exposed, until
after rL324941 because we didn't return the correct result
for shifts with undef operands.
llvm-svn: 325010
Due to a typo in D41565, mutable TBAA tags created with
createMutableTBAAAccessTag() lose their base types. This patch
fixes that typo and updates tests respectively.
Differential Revision: https://reviews.llvm.org/D42364
llvm-svn: 325008
For basic blocks with instructions between the beginning of the block
and a call we have to duplicate the instructions before the call in all
split blocks and add PHI nodes for uses of the duplicated instructions
after the call.
Currently, the threshold for the number of instructions before a call
is quite low, to keep the impact on binary size low.
Reviewers: junbuml, mcrosier, davidxl, davide
Reviewed By: junbuml
Differential Revision: https://reviews.llvm.org/D41860
llvm-svn: 325001
In cases where the OuterMostLoopLatchBI only has a single successor,
accessing the second successor will fail.
This fixes a failure when building the test-suite with loop-interchange
enabled.
Reviewers: mcrosier, karthikthecool, davide
Reviewed by: karthikthecool
Differential Revision: https://reviews.llvm.org/D42906
llvm-svn: 324994
We already try to salvage debug values from no-op bitcasts and inttoptr
instructions: we should handle ptrtoint instructions as well.
This saves an additional 24,444 debug values in a stage2 build of clang,
and (according to llvm-dwarfdump --statistics) provides an additional
289 unique source variables.
llvm-svn: 324982
Seeing some inlining missing in internal uses of symbolizer. I'll work
on a reproduction, tests, improvements & recommit as soon as possible.
(Chandler would like it to be known that this improvement did make
check-llvm 4x faster... - so there's certainly some fairly good
motivation to push on fixing/figuring this out & getting it back in)
This reverts commit r321345.
llvm-svn: 324981
Here are the number of additional debug values salvaged in a stage2
build of clang:
63 SALVAGE: MUL
1250 SALVAGE: SDIV
(No values were salvaged from `srem` instructions in this experiment,
but it's a simple case to handle so we might as well.)
llvm-svn: 324976
Here are the number of additional debug values salvaged in a stage2
build of clang:
1912 SALVAGE: ASHR
405 SALVAGE: LSHR
249 SALVAGE: SHL
llvm-svn: 324975
Also make a drive-by-fix of a bug in the subregister scan code that
only triggers with an incomplete or otherwise very irregular machine
description.
rdar://problem/37404493
llvm-svn: 324972
These intrinsic folds were added with D41381, but only allowed with isFast().
That's more than necessary because FMF has 'reassoc' to apply to these
kinds of folds after D39304, and that's all we need in these cases.
Differential Revision: https://reviews.llvm.org/D43160
llvm-svn: 324967
The diff to use 'reassoc' is part of D43160; it should not have
been made with rL324961. Reverting that part here, so we'll
see the intended diff with the code change.
llvm-svn: 324963
Summary:
This patch makes postdominators always recalculate the tree when an update causes to change the tree roots.
As @dmgreen noticed in [[ https://reviews.llvm.org/D41298 | D41298 ]], the previous implementation was not conservative enough and it was possible to end up with a PostDomTree that was different than a freshly computed one.
The patch also compares postdominators with a freshly computed tree at the end of full verification to make sure we don't hit similar issues in the future.
This should (ideally) be also backported to 6.0 before the release, although I don't have any reports of this causing an observable error. It should be safe to do it even if it's late in the release, as the change only makes the current behavior more conservative.
Reviewers: dmgreen, dberlin, davide, brzycki, grosser
Reviewed By: brzycki, grosser
Subscribers: llvm-commits, dmgreen
Differential Revision: https://reviews.llvm.org/D43140
llvm-svn: 324962
Some tests didn't add much value because we already show stronger
constraints for the folds in other tests, so the weaker versions
were deleted.
Moved the remaining tests into 1 file because the folds are
very similar and handled from 1 place in the code.
llvm-svn: 324961
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
InstCombine pass to cease using the deprecated MemoryIntrinsic::getAlignment() method, and
instead we use the separate getSourceAlignment and getDestAlignment APIs to simplify
the source and destination alignment attributes separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: majnemer, bollu, efriedma
Reviewed By: efriedma
Subscribers: efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D42871
llvm-svn: 324960
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
creation of memcpys in the SafeStack pass to set the alignment of the destination object to
its stack alignment while separately setting the source byval arguments alignment to its
alignment.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. (rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: eugenis, bollu
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42710
llvm-svn: 324955
Summary:
wasm32-unknown-unknown-elf has MCSymbols that are not MCSymbolWasms, so
we need a non-asserting cast here.
Reviewers: dschuff, sunfish
Subscribers: jfb, sbc100, aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D43205
llvm-svn: 324942
This started by noticing that scalar and vector types were producing different results with div ops in PR36305:
https://bugs.llvm.org/show_bug.cgi?id=36305
...but the problem is bigger. I couldn't keep it straight without a table, so I'm attaching that as a PDF to
the review. The x86 tests in undef-ops.ll correspond to that table.
Green means that instsimplify and the DAG agree on the result for all types.
Red means the DAG was returning undef when IR was not.
Yellow means the DAG was returning a non-undef result when IR returned undef.
This patch assumes that we're currently doing the right thing in IR.
Note: I couldn't find any problems with lowering vector constants as the code comments were warning,
but those comments were written long ago in rL36413 .
Differential Revision: https://reviews.llvm.org/D43141
llvm-svn: 324941
If merging them, the dllexport attribute needs to be brought along
to the new GlobalAlias.
Differential Revision: https://reviews.llvm.org/D43192
llvm-svn: 324937
It caused assertion failure
Assertion failed: (!DD.IsLambda && !MergeDD.IsLambda && "faked up lambda definition?"), function MergeDefinitionData, file /Users/buildslave/jenkins/workspace/clang-stage1-configure-RA/llvm/tools/clang/lib/Serialization/ASTReaderDecl.cpp, line 1675.
on the second stage build bots.
llvm-svn: 324932
Rather than encode the absence of a checksum with a Kind variant, instead put
both the kind and value in a struct and wrap it in an Optional.
Differential Revision: http://reviews.llvm.org/D43043
llvm-svn: 324928
This is similar to the instsimplify fold added with D42385
( rL323716 )
...but this can't be in instsimplify because we're creating/morphing
a different instruction.
llvm-svn: 324927
Update BlockColors after splitting predecessors. Do not allow splitting
EHPad for sinking when the BlockColors is not empty, so we can
simply assign predecessor's color to the new block.
Fixes PR36184
llvm-svn: 324916
Expand existing SchedRW to encompass these like it did for the other memory offset movs - added comments to closing braces to keep track of def scopes.
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
llvm-svn: 324910
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.
This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.
To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.
I've left the old tablegen patterns in because they are still needed for
global isel.
Differential revision: https://reviews.llvm.org/D42478
llvm-svn: 324908
Tag AVX512 variants to match SSE/AVX originals.
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
llvm-svn: 324901
We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639).
AMD targets can perform these a lot quicker than WriteMicrocoded so will need an override in the models.
llvm-svn: 324897
Summary:
For better vectorization result we should take into consideration the
cost of the user insertelement instructions when we try to
vectorize sequences that build the whole vector. I.e. if we have the
following scalar code:
```
<Scalar code>
insertelement <ScalarCode>, ...
```
we should consider the cost of the last `insertelement ` instructions as
the cost of the scalar code.
Reviewers: RKSimon, spatel, hfinkel, mkuper
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D42657
llvm-svn: 324893
Armv8.1-A added an atomic load-add instruction, but not a load-subtract
instruction. Our current code-generation for atomic load-subtract always
inserts a NEG instruction to negate it's argument, even if it could be
folded into a constant or another instruction.
This adds lowering early in selection DAG to convert a load-subtract
operation into a subtract and a load-add, allowing the normal DAG
optimisations to work on it.
I've left the old tablegen patterns in because they are still needed for
global isel.
Some of the tests in this patch are copied from D35375 by Chad Rosier (which
was abandoned).
Differential revision: https://reviews.llvm.org/D42477
llvm-svn: 324892
It asserts building Chromium; see PR36346.
(This also reverts the follow-up r324836.)
> If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
> A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
> The estimated penalty for a store forward block is ~13 cycles.
>
> This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
> of a load and a store.
>
> The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
> breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
llvm-svn: 324887
In case of correct using of the 'l' constraint llvm now generates valid
code; otherwise it shows an error message. Initially these triggers an
assertion.
This commit is the same as r324869 with fixed the test's file name.
llvm-svn: 324885
Add a common -trap-unreachable option, similar to the target
specific hexagon equivalent, which has been replaced. This
turns unreachable instructions into traps, which is useful for
debugging.
Differential Revision: https://reviews.llvm.org/D42965
llvm-svn: 324880
Summary:
These are functions like operator<<(raw_ostream&, Foo).
Previously these were only supported for messages. In the assertion
EXPECT_EQ(A, B) << C;
the local modifications would explicitly try to use raw_ostream printing for C.
However A and B would look for a std::ostream printing function, and often fall
back to gtest's default "168 byte object <00 01 FE 42 ...>".
This patch pulls out the raw_ostream support into a new header under `custom/`.
I changed the mechanism: instead of a convertible stream, we wrap the printed
value in a proxy object to allow it to be sent to a std::ostream.
I think the new way is clearer.
I also changed the policy: we prefer raw_ostream printers over std::ostream
ones. This is because the fallback printers are defined using std::ostream,
while all the raw_ostream printers should be "good".
Reviewers: ilya-biryukov, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43091
llvm-svn: 324876
In case of correct using of the 'l' constraint llvm now generates valid
code; otherwise it shows an error message. Initially these triggers an
assertion.
llvm-svn: 324869
The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrencies
and expects that it always returns it a recurrency. But this is not guaranteed to happen if we
have reached max recursion depth or refused to make SCEV simplification for other reasons.
This patch changes its implementation so that now it always returns SCEVAddRec without
relying on `getAddExpr`.
Differential Revision: https://reviews.llvm.org/D42953
llvm-svn: 324866