These two functions are hard to reason about. This commit makes the code
more comprehensible:
- Use four distinct variables (OldIdxIn, OldIdxOut, NewIdxIn, NewIdxOut)
with a fixed value instead of a changing iterator I that points to
different things during the function.
- Remove the early explanation before the function in favor of more
detailed comments inside the function. Should have more/clearer comments now
stating which conditions are tested and which invariants hold at
different points in the functions.
The behaviour of the code was not changed.
I hope that this will make it easier to review the changes in
http://reviews.llvm.org/D9067 which I will adapt next.
Differential Revision: http://reviews.llvm.org/D16379
llvm-svn: 258756
For historic reasons, the behavior of .align differs between targets.
Fortunately, there are alternatives, .p2align and .balign, which make the
interpretation of the parameter explicit, and which behave consistently across
targets.
This patch teaches MC to use .p2align instead of .align, so that people reading
code for multiple architectures don't have to remember which way each platform
does its .align directive.
Differential Revision: http://reviews.llvm.org/D16549
llvm-svn: 258750
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
present in the module (generated in the frontend).
llvm-svn: 258746
We had the same code duplicated for each type of Def. We also have the entire block duplicated between the local and non-local case, but let's start with local cleanup.
llvm-svn: 258740
There's a special case in EmitLoweredSelect() that produces an improved
lowering for cmov(cmov) patterns. However this special lowering is
currently broken if the inner cmov has multiple users so this patch
stops using it in this case.
If you wonder why this wasn't fixed by continuing to use the special
lowering and inserting a 2nd PHI for the inner cmov: I believe this
would incur additional copies/register pressure so the special lowering
does not improve upon the normal one anymore in this case.
This fixes http://llvm.org/PR26256 (= rdar://24329747)
llvm-svn: 258729
For metadata postpass linking, after importing all functions, we need
to recursively walk through any nodes reached via imported functions to
locate needed subprogram metadata. Some might only be reached indirectly
via the variable list for an inlined function.
llvm-svn: 258728
We were hitting an assertion because we were computing smaller type sizes for
instructions that cannot be demoted. The fix first determines the instructions
that will be demoted, and then applies the smaller type size to only those
instructions.
This should fix PR26239.
llvm-svn: 258705
Instructions can be DCE'd after the RegStackify pass. If the instruction which
would be the pop for what would be a push is removed, don't use a push.
llvm-svn: 258694
When generating calls to memcpy, memmove, and memset, use void* as the return
type rather than void, to match the standard signatures for these functions.
This has no practical effect for most targets, since the return values of
these calls aren't being used anyway, and most calling conventions tolerate
this kind of mismatch. However, this change will help support future
optimizations to utilize the return value to avoid holding the argument
value live across a call.
llvm-svn: 258691
The computation of ICmp demanded bits is independent of the individual operand being evaluated. We simply return a mask consisting of the minimum leading zeroes of both operands.
We were incorrectly passing "I" to ComputeKnownBits - this should be "UserI->getOperand(0)". In cases where we were evaluating the 1th operand, we were taking the minimum leading zeroes of it and itself.
This should fix PR26266.
llvm-svn: 258690
Enable more strict standards conformance in MSVC for rvalue casting and string literal type conversion to non-const types. Also enables generation of intrinsics for more functions.
Patch by Alexander Riccio
llvm-svn: 258687
This patch was originally committed as r257885, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258683
This patch was originally committed as r257884, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258682
This patch was originally committed as r257883, but was reverted due to windows
failures. The cause of these failures has been fixed under r258677, hence
re-committing the original patch.
llvm-svn: 258681
VPMADD52LUQ - Packed Multiply of Unsigned 52-bit Integers and Add the Low 52-bit Products to Qword Accumulators
VPMADD52HUQ - Packed Multiply of Unsigned 52-bit Unsigned Integers and Add High 52-bit Products to 64-bit Accumulators
Differential Revision: http://reviews.llvm.org/D16407
llvm-svn: 258680
This was originally committed as r255762, but reverted as it broke windows
bots. Re-commitiing the exact same patch, as the underlying cause was fixed by
r258677.
ARMv8.2-A adds 16-bit floating point versions of all existing VFP
floating-point instructions. This is an optional extension, so all of
these instructions require the FeatureFullFP16 subtarget feature.
The assembly for these instructions uses S registers (AArch32 does not
have H registers), but the instructions have ".f16" type specifiers
rather than ".f32" or ".f64". The top 16 bits of each source register
are ignored, and the top 16 bits of the destination register are set to
zero.
These instructions are mostly the same as the 32- and 64-bit versions,
but they use coprocessor 9 rather than 10 and 11.
Two new instructions, VMOVX and VINS, have been added to allow packing
and extracting two 16-bit floats stored in the top and bottom halves of
an S register.
New fixup kinds have been added for the PC-relative load and store
instructions, but no ELF relocations have been added as they have a
range of 512 bytes.
Differential Revision: http://reviews.llvm.org/D15038
llvm-svn: 258678
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=22796.
The previous implementation of ClassInfo::operator< allowed cycles of classes
such that x < y < z < x, meaning that a list of them cannot be correctly
sorted, and the sort order could differ with different standard libraries.
The original implementation sorted classes by ValueName if they were otherwise
equal. This isn't strictly necessary, but some backends seem to accidentally
rely on it. If I reverse this comparison I get 8 test failures spread across
the AArch64, Mips and X86 backends, so I have left it in until those backends
can be fixed.
There was one case in the X86 backend where the observable behaviour of the
assembler is changed by this patch. This was because some of the memory asm
operands were not marked as children of X86MemAsmOperand.
Differential Revision: http://reviews.llvm.org/D16141
llvm-svn: 258677
Use AVX1 FP instructions (vmaskmovps/pd) in place of the AVX2 int instructions (vpmaskmovd/q).
Differential Revision: http://reviews.llvm.org/D16528
llvm-svn: 258675
The ORC ObjectLinkingLayer uses this flag during symbol lookup. Failure to set
it causes all symbols to behave as if they were non-exported, which has caused
failures in the kaleidoscope tutorials on Windows. Raising the flag should
un-break the tutorials.
No test case yet - none of the existing command line tools for printing symbol
tables (llvm-nm, llvm-objdump) show the status of this flag, and I don't want to
change the format from these tools without consulting their owners. I'll send an
email to the dev-list to figure out the right way forward.
llvm-svn: 258665
Consolidate the code which handles string table offsets less than 999999
with the code for offsets less than 9999999. While we are here,
simplify the code by not using sprintf to generate the string.
llvm-svn: 258664
Use existing functionality provided in changeToUnreachable instead of
reinventing it in LoopSimplify.
No functionality change is intended.
llvm-svn: 258663
Changes in X86.td:
I set features of Intel processors in incremental form: IVB = SNB + X HSW = IVB + X ..
I added Skylake client processor and defined it's features
FeatureADX was missing on KNL
Added some new features to appropriate processors SMAP, IFMA, PREFETCHWT1, VMFUNC and others
Differential Revision: http://reviews.llvm.org/D16357
llvm-svn: 258659
SCCP has code identical to changeToUnreachable's behavior, switch it
over to just call changeToUnreachable.
No functionality change intended.
llvm-svn: 258654
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so. Share the code between the two.
No functionality change is intended.
llvm-svn: 258653
A cleanup can have paths which unwind or end up in unreachable.
If there is an unreachable path *and* a path which unwinds to caller,
we would mistakenly inject an unwind path to a catchswitch on the
unreachable path. This results in a verifier assertion firing because
the cleanup unwinds to two different places: to the caller and to the
catchswitch.
This occured because we used getCleanupRetUnwindDest to determine if the
cleanuppad had no cleanuprets.
This is incorrect, getCleanupRetUnwindDest returns null for cleanuprets
which unwind to caller.
llvm-svn: 258651
Generalised mask generation / subvector extraction to use the input/output types directly instead of an if/else through all the currently accepted types.
llvm-svn: 258645
Summary: Helper so we don't have to enumerate nvptx && nvptx64 everywhere.
Reviewers: echristo
Subscribers: llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16494
llvm-svn: 258639
Summary:
Previously, we would just output "foo = bar" in the assembly, and then
ptxas would choke. Now we die before emitting any invalid code.
Reviewers: echristo
Subscribers: jholewinski, llvm-commits, jhen, tra
Differential Revision: http://reviews.llvm.org/D16490
llvm-svn: 258638
Summary:
Update ObjectTransformLayer::addObjectSet to take the object set by
value rather than reference and pass it to the base layer with move
semantics rather than copy, to match r258185's changes to
ObjectLinkingLayer.
Update the unit test to verify that ObjectTransformLayer's signature stays
in sync with ObjectLinkingLayer's.
Reviewers: lhames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16414
llvm-svn: 258630
For the moment, this file takes way too long to run (see inline comments), but
that should be a temporary problem. The fact that the compile time is so slow
for a target that doesn't support maskmov may be a bug worth investigating too.
llvm-svn: 258629
target is macho.
It looks like the check for macho was accidentally dropped in r132959.
I don't have a test case, but I'll add one if anyone knows how this can
be tested.
llvm-svn: 258627
If the INSERTPS zeroes out all the referenced elements from either of the 2 input vectors (and the input is not already UNDEF), then set that input to UNDEF to reduce dependencies.
llvm-svn: 258622
Some of the conditions necessary to produce ccmp sequences were only
checked in recursive calls to emitConjunctionDisjunctionTree() after
some of the earlier expressions were already built. Move all checks over
to isConjunctionDisjunctionTree() so they are all checked before we
start emitting instructions.
Also rename some variable to better reflect their usage.
llvm-svn: 258605
Previously it failed to add NumArgRegs to the offset and so clobbered an
already-used register. Now just start the numbering after the arg regs
and don't duplicate the add. Test coverage for this coming shortly with
the implementation of byval.
llvm-svn: 258597
Cleanups in C++ are a little weird. They are only guaranteed to be
reliably executed if, and only if, there is a viable catch handler which
can handle the exception.
This means that reachability of a cleanup is lexically determined by it
being nested with a try-block which unwinds to a catch. It is *cannot*
be reasoned about by examining the control flow edges leaving a cleanup.
Usually this is not a problem. It becomes a problem when there are *no*
edges out of a cleanup because we believed that code post-dominated by
the cleanup is dead. In LLVM's case, this code is what informs the
personality routine about the presence of a suitable catch handler.
However, the lack of edges to that catch handler makes the handler
become unreachable which causes us to remove it. By removing the
handler, the cleanup becomes unreachable.
Instead, inject a catch-all handler with every cleanup that has no
unwind edges. This will allow us to properly unwind the stack.
This fixes PR25997.
llvm-svn: 258580
in MachOObjectFile::getSymbolByIndex() when a Mach-O file has
a symbol table load command but the number of symbols are zero.
The code in MachOObjectFile::symbol_begin_impl() should not be
assuming there is a symbol at index 0, in cases there is no symbol
table load command or the count of symbol is zero. So I also fixed
that. And needed to fix MachOObjectFile::symbol_end_impl() to
also do the same thing for no symbol table or one with zero entries.
The code in MachOObjectFile::getSymbolByIndex() should trigger
the report_fatal_error() for programmatic errors for any index when
there is no symbol table load command and not return the end iterator.
So also fixed that. Note there is no test case as this is a programmatic
error.
The test case using the file macho-invalid-bad-symbol-index has
a symbol table load command with its number of symbols (nsyms)
is zero. Which was incorrectly testing the bad triggering of the
report_fatal_error() in in MachOObjectFile::getSymbolByIndex().
This test case is an invalid Mach-O file but not for that reason.
It appears this Mach-O file use to have an nsyms value of 11,
and what makes this Mach-O file invalid is the counts and
indexes into the symbol table of the dynamic load command
are now invalid because the number of symbol table entries
(nsyms) is now zero. Which can be seen with the existing
llvm-obdump:
% llvm-objdump -private-headers macho-invalid-bad-symbol-index
…
Load command 4
cmd LC_SYMTAB
cmdsize 24
symoff 4216
nsyms 0
stroff 4392
strsize 144
Load command 5
cmd LC_DYSYMTAB
cmdsize 80
ilocalsym 0
nlocalsym 8 (past the end of the symbol table)
iextdefsym 8 (greater than the number of symbols)
nextdefsym 2 (past the end of the symbol table)
iundefsym 10 (greater than the number of symbols)
nundefsym 1 (past the end of the symbol table)
...
And the native darwin tools generates an error for this file:
% nm macho-invalid-bad-symbol-index
nm: object: macho-invalid-bad-symbol-index truncated or malformed object (ilocalsym plus nlocalsym in LC_DYSYMTAB load command extends past the end of the symbol table)
I added new checks for the indexes and sizes for these in the
constructor of MachOObjectFile. And added comments for what
would be a proper diagnostic messages.
And changed the test case using macho-invalid-bad-symbol-index
to test for the new error now produced.
Also added a test with a valid Mach-O file with a symbol table
load command where the number of symbols is zero that shows
the report_fatal_error() is not called.
llvm-svn: 258576
Summary:
Fix the issue with the most recently discovered unit receiving much less attention.
Note: this is the second attempt (prev: r258473). Now, libc++ build is fixed.
Reviewers: aizatsky, kcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16487
llvm-svn: 258571
Summary:
The testing for returnBB was flipped which may cause ARM ld/st opt pass uses callee saved regs in returnBB when shrink-wrap is used.
Reviewers: t.p.northover, apazos, MatzeB
Subscribers: mcrosier, zzheng, aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D16434
llvm-svn: 258569
When we build LLVM with externalized debug info, all debugging and
symbolication related data is extracted into dSYM files prior to
stripping. As such, there is no need to preserve local symbols in LLVM
binaries after dSYM creation.
This shrinks libLLVM.dylib from 58MB to 55MB on my system.
llvm-svn: 258566
If the intrinsic is overloaded and works on multiple types,
it cannot resolve to a single corresponding builtin and requires
handling in clang. This just causes crashes now.
llvm-svn: 258559
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
llvm-svn: 258557
Summary:
Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object.
A good example of improper behavior in the current implementation is section information associated with the GlobalObject. If a section was set for it, and GlobalOpt is creating/modifying a new object based on this one (often copying the original name), without this change new object will be placed in a default section, resulting in inappropriate properties of the new variable.
The argument here is that if customer specified a section for a variable, any changes to it that compiler does should not cause it to change that section allocation.
Moreover, any other properties worth representation in copyAttributesFrom() should also be propagated.
Reviewers: jmolloy, joker-eph, joker.eph
Subscribers: slarin, joker.eph, rafael, tobiasvk, llvm-commits
Differential Revision: http://reviews.llvm.org/D16074
llvm-svn: 258556
\src\llvm-rw\include\llvm/Support/AlignOf.h(254) :
error C2872: 'detail' : ambiguous symbol
could be 'llvm::detail'
or 'llvm::support::detail'
llvm-svn: 258553
Summary:
This change adds a `-spp-no-statepoints` flag to PlaceSafepoints that
bypasses the code that wraps newly introduced polls and existing calls
in gc.statepoint. With `-spp-no-statepoints` enabled, PlaceSafepoints
effectively becomes a safpeoint **poll** insertion pass.
The eventual goal is to "constant fold" this option, along with
`-rs4gc-use-deopt-bundles` to `true`, once clients using gc.statepoint
are okay doing so.
Reviewers: pgavlin, reames, JosephTremoulet
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16439
llvm-svn: 258551
Make the variable a member of the writer trait object owned
now by the writer. Also use a different generator interface
to pass the infoObject from the writer.
llvm-svn: 258544
The promote alloca pass didn't handle these intrinsics and crashed.
These intrinsics should accept any address space, but for now just
erase them to avoid breaking.
llvm-svn: 258537
Using an array instead of ArrayRef would allow type inference, but
(short of using C99) one would still need to write
typedef uint16_t VT[];
LE.write(VT{0x1234, 0x5678});
llvm-svn: 258535
The current behavior is incorrect, as the two CCs returned by
changeFPCCToAArch64CC, intended to be OR'ed, are instead used
in an AND ccmp chain.
Consider:
define i32 @t(float %a, float %b, float %c, float %d, i32 %e, i32 %f) {
%cc1 = fcmp one float %a, %b
%cc2 = fcmp olt float %c, %d
%and = and i1 %cc1, %cc2
%r = select i1 %and, i32 %e, i32 %f
ret i32 %r
}
Assuming (%a < %b) and (%c < %d); we used to do:
fcmp s0, s1 # nzcv <- 1000
orr w8, wzr, #0x1 # w8 <- 1
csel w9, w8, wzr, mi # w9 <- 1
csel w8, w8, w9, gt # w8 <- 1
fcmp s2, s3 # nzcv <- 1000
cset w9, mi # w9 <- 1
tst w8, w9 # (w8 & w9) == 1, so: nzcv <- 0000
csel w0, w0, w1, ne # w0 <- w0
We now do:
fcmp s2, s3 # nzcv <- 1000
fccmp s0, s1, #0, mi # mi, so: nzcv <- 1000
fccmp s0, s1, #8, le # !le, so: nzcv <- 1000
csel w0, w0, w1, pl # !pl, so: w0 <- w1
In other words, we transformed:
(c < d) && ((a < b) || (a > b))
into:
(c < d) && (a u>= b) && (a u<= b)
whereas, per De Morgan's, we wanted:
(c < d) && !((a u>= b) && (a u<= b))
Note that this problem doesn't occur in the test-suite.
changeFPCCToAArch64CC produces disjunct CCs; here, one -> mi/gt.
We can't represent that in the fccmp chain; it can't express
arbitrary OR sequences, as one comment explains:
In general we can create code for arbitrary "... (and (and A B) C)"
sequences. We can also implement some "or" expressions, because
"(or A B)" is equivalent to "not (and (not A) (not B))" and we can
implement some negation operations. [...] However there is no way
to negate the result of a partial sequence.
Instead, introduce changeFPCCToANDAArch64CC, which produces the
conjunct cond codes:
- (a one b)
== ((a olt b) || (a ogt b))
== ((a ord b) && (a une b))
- (a ueq b)
== ((a uno b) || (a oeq b))
== ((a ule b) && (a uge b))
Note that, at first, one might think that, when PushNegate is true,
we should use the disjunct CCs, in effect doing:
(a || b)
= !(!a && !(b))
= !(!a && !(b1 || b2)) <- changeFPCCToAArch64CC(b, b1, b2)
= !(!a && !b1 && !b2)
However, we can take advantage of the fact that the CC is already
negated, which lets us avoid special-casing PushNegate and doing
the simpler to reason about:
(a || b)
= !(!a && (!b))
= !(!a && (b1 && b2)) <- changeFPCCToANDAArch64CC(!b, b1, b2)
= !(!a && b1 && b2)
This makes both emitConditionalCompare cases behave identically,
and produces correct ccmp sequences for the 2-CC fcmps.
llvm-svn: 258533
We verify that the op tree is eligible for CCMP emission in
isConjunctionDisjunctionTree, but it's also possible that
emitConjunctionDisjunctionTree fails later.
The initial check is useful, as it avoids building nodes
that will get discarded.
Still, make sure that inconsistencies don't happen with
an assert.
llvm-svn: 258532
but to return object_error::parse_failed. Then made the code in llvm-nm
do for Mach-O files what is done in the darwin native tools which is to
print "bad string index" for bad string indexes. Updated the error message
in the llvm-objdump test, and added tests to show llvm-nm prints
"bad string index" and a test to print the actual bad string index value
which in this case is 0xfe000002 when printing the fields as raw hex.
llvm-svn: 258520
This reapplies r258296 and r258366, and also fixes an existing bug in
SelectionDAG.cpp's isMemSrcFromString, neglecting to account for the
offset in a GlobalAddressSDNode, which is uncovered by those patches.
llvm-svn: 258482
Summary:
Fix the issue with the most recently discovered unit receiving much less attention.
Note: I had to change the seed for one test to make it pass. Alternatively,
the number of runs could be increased. I believe that the average time of
'foo' discovery is not increased, just seed=1 was particularly convenient
for the previous PRNG scheme used.
Reviewers: aizatsky, kcc
Subscribers: llvm-commits, kcc
Differential Revision: http://reviews.llvm.org/D16419
llvm-svn: 258473
Summary:
SETCC with f16 vectors has OperationAction set to Expand but still gets
lowered to FCM* intrinsics based on its result type. This patch skips
lowering of VSETCC if the operand is an f16 vector.
v4 and v8 tests included.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15361
llvm-svn: 258471
This reverts r258296 and the follow up r258366. With this change, we
miscompiled the following program on Windows:
#include <string>
#include <iostream>
static const char kData[] = "asdf jkl;";
int main() {
std::string s(kData + 3, sizeof(kData) - 3);
std::cout << s << '\n';
}
llvm-svn: 258465
Summary:
Since we are currently not doing incremental importing there is
no need to link metadata as a postpass. The module linker will
only link in the imported subroutines due to the functionality
added by r256003.
(Note that the metadata postpass linking functionalitiy is still
used by llvm-link, and may be needed here in the future if a more
incremental strategy is adopted.)
Reviewers: joker.eph
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16424
llvm-svn: 258458