We're erasing MI here, but then immediately using it again inside the
`if`. This moves the erase after we're done using it.
Doing that reveals a second problem though - this case is missing a
break, so we fall through to the default and dereference MI again.
This is obviously a bug, though I don't know how to write a test that
triggers it - all we do in the error case is print some extra debug
output.
Both of these issue crash on lots of tests under ASAN with the
recycling allocator changes from PR26808 applied.
llvm-svn: 264442
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with 8-bit immediate is only three bytes.
Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
Differential Revision: http://reviews.llvm.org/D18374
llvm-svn: 264440
Now register parameters that aren't saved to the stack or CSRs are
considered dead after the first call. Previously the debugger would show
whatever was in the register.
Fixes PR26589
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D17211
llvm-svn: 264429
This keeps the naming consistent with Chapters 6-8, where Error was renamed to
LogError in r264426 to avoid clashes with the new Error class in libSupport.
llvm-svn: 264427
Summary:
Add erase() which returns an iterator pointing to the next element after the
erased one. This makes it possible to erase selected elements while iterating
over the SetVector :
while (I != E)
if (test(*I))
I = SetVector.erase(I);
else
++I;
Reviewers: qcolombet, mcrosier, MatzeB, dblaikie
Subscribers: dberlin, dblaikie, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18281
llvm-svn: 264414
Optimize output of MDStrings in bitcode. This emits them in big blocks
(currently 1024) in a pair of records:
- BULK_STRING_SIZES: the sizes of the strings in the block, and
- BULK_STRING_DATA: a single blob, which is the concatenation of all
the strings.
Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this
should (a) slightly reduce bitcode size, since there is less record
overhead, and (b) greatly improve reading speed, since blobs are super
cheap to deserialize.
I needed to add support for blobs to streaming input to get the test
suite passing.
- StreamingMemoryObject::getPointer reads ahead and returns the
address of the blob.
- To avoid a possible reallocation of StreamingMemoryObject::Bytes,
BitstreamCursor::readRecord needs to move the call to JumpToEnd
forward so that getPointer is the last bitstream operation.
llvm-svn: 264409
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.
llvm-svn: 264403
Differential Revision: http://reviews.llvm.org/D18456
This is a re-commit of r264387 and r264388 after fixing a typo.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264392
This reverts commit r264387.
Bots are broken in various ways, I need to take one commit at a time...
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264390
This reverts commit r264388.
Bots are broken in various ways, I need to take one commit at a time...
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264389
Summary:
Loading IR with debug info improves MDString::get() from 19ms to 10ms.
This is a rework of D16597 with adding an "emplace" method on the StringMap
to avoid requiring the MDString move ctor to be public.
Reviewers: dexonsmith
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17920
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264386
Summary:
StringMap ctor accepts an initialize size, but expect it to be
rounded to the next power of 2. The ctor can handle that directly
instead of expecting clients to round it. Also, since the map will
resize itself when 75% full, take this into account an initialize
a larger initial size to avoid any growth.
Reviewers: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18344
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264385
Summary:
Just running the loop in the unittests for a few more iterations
(till 48) exhibit that the condition on the limit was not handled
properly in r263522.
Rewrite the test to use a class to count move/copies that happens
when inserting into the map.
Also take the opportunity to refactor the logic to compute the
number of buckets required for a given number of entries in the map.
Use this when constructing a DenseMap with a desired size given to
the constructor (and add a tests for this).
Reviewers: dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18345
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264384
Summary:
These are just helpers calling their static counter part to
simplify client code.
Reviewers: tejohnson
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18339
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 264382
The motivation for MODULE_CODE_METADATA_VALUES was to enable an
-flto=thin scheme where:
1. First, one function is cherry-picked from a bitcode file.
2. Later, another function is cherry-picked.
3. Later, ...
4. Finally, the metadata needed by all the previous functions is
loaded.
This was abandoned in favour of:
1. Calculate the superset of functions needed from a Module.
2. Link all functions at once.
Delayed metadata reading no longer serves a purpose. It also adds
a few complication, since we can't count on metadata being properly
parsed when exiting the BitcodeReader. After discussing with Teresa, we
agreed to remove it.
The code that depended on this was removed/updated in r264326.
llvm-svn: 264378
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).
Differential Revision: http://reviews.llvm.org/D18246
llvm-svn: 264375
Remove logic to upgrade !llvm.loop by changing the MDString tag
directly. This old logic would check (and change) arbitrary strings
that had nothing to do with loop metadata. Instead, check !llvm.loop
attachments directly, and change which strings get attached.
Rather than updating the assembly-based upgrade, drop it entirely. It
has been quite a while since we supported upgrading textual IR.
llvm-svn: 264373
This reserves an MDKind for !llvm.loop, which allows callers to avoid a
string-based lookup. I'm not sure why it was missing.
There should be no functionality change here, just a small compile-time
speedup.
llvm-svn: 264371
We did not have an explicit branch to the continuation BB. When the check was
hoisted, this could permit control follow to fall through into the division
trap. Add the explicit branch to the continuation basic block to ensure that
code execution is correct.
llvm-svn: 264370
It is incorrect to get the corresponding MBB for a ReturnInst before
SelectAllBasicBlocks since SelectAllBasicBlocks can change the
correspondence between a ReturnInst and the MBB it is in.
PR27062
llvm-svn: 264358
This is an enhancement of the existing update_llc_test_checks.py script.
It adds some of the functionality from the script used in D17999 to make
the IR checking more flexible.
The bad news:
This actually is 'My First Python Program'. Thus, it's likely that I have
violated all best practices of Python programming if I've made a functional
change from the original program. If you see anything that's obviously
wrong, please let me know or feel free to fix it. I didn't even read any
documentation...
The good news:
I tested this on ~10 existing opt/llc regression tests, and it does what
I hoped for. It produces exact checking for IR regression tests and doesn't
signficantly change the existing llc-with-x86-target asm checking. The opt
tests that were modified in r263667, r263668, r263674, and r263679 are
examples of the expected results, except that this version of the script
puts the check lines ahead of the IR to follow the existing llc/asm
behavior.
If there are no complaints/fallout, we should be able to remove the
original script. Extending this script to be used for non-x86 and clang
regression tests would be the expected follow-up steps.
llvm-svn: 264357
Earlier we were ignoring varargs in LowerCallSiteWithDeoptBundle because
populateCallLoweringInfo does not set CallLoweringInfo::IsVarArg.
llvm-svn: 264354
Going to be reading the DW_AT_GNU_dwo_name shortly as well, and there
was already enough duplication here that it was worth refactoring
rather than adding even more.
llvm-svn: 264350
We try to hoist the insertion point as high as possible to encourage
sharing. However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.
llvm-svn: 264344
Summary:
Using leading zeroes allows you to search for "000%" to find non-covered
items.
Differential Revision: http://reviews.llvm.org/D18420
llvm-svn: 264336
Summary:
Apparently, when compiling with gcc 5.3.2 for powerpc64, the order of
headers is such that it gets an error about std::atomic<> use in
ThreadPool.h, since this header is not included explicitly. See also:
https://llvm.org/bugs/show_bug.cgi?id=27058
Fix this by including <atomic>. Patch by Bryan Drewery.
Reviewers: chandlerc, joker.eph
Subscribers: bdrewery, llvm-commits
Differential Revision: http://reviews.llvm.org/D18460
llvm-svn: 264335
Summary:
Only adds support for "naked" calls to llvm.experimental.deoptimize.
Support for round-tripping through RewriteStatepointsForGC will come
as a separate patch (should be simpler than this one).
Reviewers: reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D18429
llvm-svn: 264329
Summary:
Use bulk importing so we can avoid the use of post-pass metadata
linking. Cloned the ModuleLazyLoaderCache from the FunctionImport pass
to facilitate this.
Reviewers: joker.eph
Subscribers: dexonsmith, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D18455
llvm-svn: 264326
In PIC mode, the registers R14, R15 and R28 are reserved for use by
the PLT handling code. This causes all functions to clobber these
registers. While this is not new for regular function calls, it does
also apply to save/restore functions, which do not follow the standard
ABI conventions with respect to the volatile/non-volatile registers.
Patch by Jyotsna Verma.
llvm-svn: 264324
Given that StatepointLowering now uniques derived pointers before
putting them in the per-statepoint spill map, we may end up with missing
entries for derived pointers when we visit a gc.relocate on a pointer
that was de-duplicated away.
Fix this by keeping two maps, one mapping gc pointers to their
de-duplicated values, and one mapping a de-duplicated value to the slot
it is spilled in.
llvm-svn: 264320
When multiple DWP files are merged together and duplicate DWO IDs are
found it's currently difficult to give an actionable error message - the
DW_AT_name of the CU could be provided, but might be identical (if the
same source file is built into two different configurations), which
doesn't help the user identify the problem.
When no intermediate DWP files are generated, the path to the two DWO
files could be provided - but is lost once the DWOs are merged into a
DWP.
So, include the name of the DWO (dwo_name) in the split file so that
collissions involving a source CU from a DWP can be better diagnosed.
(improvements to llvm-dwp using this to come shortly)
llvm-svn: 264316
isDependenceDistanceOfOne asserts that the store and the load access
through the same type. This function is also used by
removeDependencesFromMultipleStores so we need to make sure we filter
out mismatching types before reaching this point.
Now we do this when the initial candidates are gathered.
This is a refinement of the fix made in r262267.
Fixes PR27048.
llvm-svn: 264313
We lose the 'utils' directory name in our advertising line with
this change. We could retain that, but I don't see the point.
This removes a dependency for making the script apply to more than
'llc'. Ie, we'll want to change the script name if it works with
opt/clang too.
llvm-svn: 264310
The `MipsMCInstrAnalysis` class overrides the `evaluateBranch` method
and calculates target addresses for branch and calls instructions.
That allows llvm-objdump to print functions' names in branch instructions
in the disassemble mode.
Differential Revision: http://reviews.llvm.org/D18209
llvm-svn: 264309
The goal is to enhance this script to be used with opt and clang:
Break 'main' into functions and change variable names to be more
generic because we want to handle more than x86 asm output.
llvm-svn: 264307
Simplify ValueEnumerator and WriteModuleMetadata by shifting the logic
for the METADATA_GENERIC_DEBUG abbreviation into WriteGenericDINode.
(This is just like r264302, but for GenericDINode.)
The only change is that the abbreviation is emitted later in the
bitcode, just before the first `GenericDINode` record. This shouldn't
be observable though.
llvm-svn: 264303
Simplify ValueEnumerator and WriteModuleMetadata by shifting the logic
for the METADATA_LOCATION abbreviation into WriteDILocation.
The only change is that the abbreviation is emitted later in the
bitcode, just before the first `DILocation` record. This shouldn't be
observable though.
llvm-svn: 264302
Split writeNamedMetadata out of WriteModuleMetadata to write named
metadata, and createNamedMetadataAbbrev for the abbreviation.
There should be no effective functionality change, although the layout
of the bitcode will change. Previously, the abbreviation was emitted at
the top of the block, but now it is delayed until immediately before the
named metadata records are emitted.
llvm-svn: 264301
The patch supports common STV_xxx visibility flags and MIPS specific
STO_MIPS_xxx flags.
Differential Revision: http://reviews.llvm.org/D18447
llvm-svn: 264300
KTEST instruction may be used instead of TEST in this case:
%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, zeroinitializer
br i1 %res, label %L2, label %L1
Differential Revision: http://reviews.llvm.org/D18444
llvm-svn: 264298
If the operation's type has been promoted during type legalization, we
need to account for the fact that the high bits of the comparison
operand are likely unspecified.
The LHS is usually zero-extended, but MIPS sign extends it, so we have
to be slightly careful.
Patch by Simon Dardis.
llvm-svn: 264296
After comdat processing, the symbols still go through regular symbol
resolution.
We were not doing it for linkonce symbols since they are lazy linked.
This fixes pr27044.
llvm-svn: 264288
Summary:
Some target lowerings of FP_TO_FP16, for instance ARM's vcvtb.f16.f32
instruction, do not guarantee that the top 16 bits are zeroed out.
Remove the unsafe AssertZext and add tests to exercise this.
Reviewers: jmolloy, sbaranga, kristof.beyls, aadg
Subscribers: llvm-commits, srhines, aemerson
Differential Revision: http://reviews.llvm.org/D18426
llvm-svn: 264285
This patch corresponds to review:
http://reviews.llvm.org/D17711
It disables direct moves on these operations in 32-bit mode since the patterns
assume 64-bit registers. The final patch is slightly different from the
Phabricator review as the bitcast operations needed to be disabled in 32-bit
mode as well. This fixes PR26617.
llvm-svn: 264282
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.
Differential Revision: http://reviews.llvm.org/D18189
llvm-svn: 264260
Summary:
In particular, make the cnMIPS predicates much more obvious and prefer
def ... : ... {
let Foo = bar;
}
over:
let Foo = bar in
def ... : ...;
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18354
llvm-svn: 264258
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds. However this is also trivially possible
for loop-invariant addresses (scUnknown) since then the bounds are the
address itself.
Interestingly, we used allow this for the special case when the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).
There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more because for example a
loop-invariant load is vectorized into a splat vector with several
vector-inserts. This is likely to make the vectorization unprofitable.
I.e. we don't notice that a later LICM will move all of this out of the
loop so the cost estimate should really be 0.
llvm-svn: 264243
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.
Differential Revision: http://reviews.llvm.org/D18414
llvm-svn: 264224
If not for lazy linking of linkonce GVs, comdats are just a
preprocessing before symbol resolution.
Lazy linking complicates it since when we pick a visible member of
comdat, we have to make sure the rest of it passes symbol resolution
too.
llvm-svn: 264223
This is a temporary crutch to enable code that currently uses std::error_code
to be incrementally moved over to Error. Requiring all Error instances be
convertible enables clients to call errorToErrorCode on any error (not just
ECErrors created by conversion *from* an error_code).
This patch also moves code for Error from ErrorHandling.cpp into a new
Error.cpp file.
llvm-svn: 264221
The BumpPtrAllocator currently doesn't handle zero length allocations well.
The discussion for how to fix that is ongoing. However, there's no need
for StringRef::copy to actually allocate anything here anyway, so just
return StringRef() when we get a zero length copy.
Reviewed by David Blaikie
llvm-svn: 264201
Strengthen tests of storing frame indices.
Right now this just creates irrelevant scheduling changes.
We don't want to have multiple frame index operands
on an instruction. There seem to be various assumptions
that at least the same frame index will not appear twice
in the LocalStackSlotAllocation pass.
There's no reason to have this happen, and it just
makes it easy to introduce bugs where the immediate
offset is appplied to the storing instruction when it should
really be applied to the value being stored as a separate
add.
This might not be sufficient. It might still be problematic
to have an add fi, fi situation, but that's even less unlikely
to happen in real code.
llvm-svn: 264200
Currently, AnalyzeBranch() fails non-equality comparison between floating points
on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing
unconditional jump if there is a proper fall-through. However, in the case of
non-equality comparison between floating points, this can turn the branch
"unanalyzable". Consider the following case:
jne.BB1
jp.BB1
jmp.BB2
.BB1:
...
.BB2:
...
AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:
jne.BB1
jnp.BB2
.BB1:
...
.BB2:
...
However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.
Actually this optimization is also done in block-placement pass, which means we
can remove this optimization from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined
negation conditions for them.
In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.
Differential Revision: http://reviews.llvm.org/D11393
llvm-svn: 264199
The goal is to enhance this script to be used with opt and clang:
Group all of the regexes together, so it's easier to see what's going on.
This will make it easier to break main() up into pieces too.
Also, note that some of the regexes are for x86-specific asm.
llvm-svn: 264197
If a comdat is dropped, all symbols in it are dropped.
If a comdat is kept, the symbols survive to pass regular symbol
resolution.
With this patch we do that for all global symbols.
The added test is a copy of test/tools/gold/X86/comdat.ll that we now
pass.
llvm-svn: 264192
in the test suite. While this is not really an interesting tool and option to run
on a Mach-O file to show the symbol table in a generic libObject format
it shouldn’t crash.
The reason for the crash was in MachOObjectFile::getSymbolType() when it was
calling MachOObjectFile::getSymbolSection() without checking its return value
for the error case.
What makes this fix require a fair bit of diffs is that the method getSymbolType() is
in the class ObjectFile defined without an ErrorOr<> so I needed to add that all
the sub classes. And all of the uses needed to be updated and the return value
needed to be checked for the error case.
The MachOObjectFile version of getSymbolType() “can” get an error in trying to
come up with the libObject’s internal SymbolRef::Type when the Mach-O symbol
symbol type is an N_SECT type because the code is trying to select from the
SymbolRef::ST_Data or SymbolRef::ST_Function values for the SymbolRef::Type.
And it needs the Mach-O section to use isData() and isBSS to determine if
it will return SymbolRef::ST_Data.
One other possible fix I considered is to simply return SymbolRef::ST_Other
when MachOObjectFile::getSymbolSection() returned an error. But since in
the past when I did such changes that “ate an error in the libObject code” I
was asked instead to push the error out of the libObject code I chose not
to implement the fix this way.
As currently written both the COFF and ELF versions of getSymbolType()
can’t get an error. But if isReservedSectionNumber() wanted to check for
the two known negative values rather than allowing all negative values or
the code wanted to add the same check as in getSymbolAddress() to use
getSection() and check for the error then these versions of getSymbolType()
could return errors.
At the end of the day the error printed now is the generic “Invalid data was
encountered while parsing the file” for object_error::parse_failed. In the
future when we thread Lang’s new TypedError for recoverable error handling
though libObject this will improve. And where the added // Diagnostic(…
comment is, it would be changed to produce and error message
like “bad section index (42) for symbol at index 8” for this case.
llvm-svn: 264187
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.
It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.
llvm-svn: 264186
There are a few bugs in the walker that this patch addresses.
Primarily:
- Caching can break when we have multiple BBs without phis
- We weren't optimizing some phis properly
- Because of how the DFS iterator works, there were times where we
wouldn't cache any results of our DFS
I left the test cases with FIXMEs in, because I'm not sure how much
effort it will take to get those to work (read: We'll probably
ultimately have to end up redoing the walker, or we'll have to come up
with some creative caching tricks), and more test coverage = better.
Differential Revision: http://reviews.llvm.org/D18065
llvm-svn: 264180
Summary:
This changes the conversion functions from SCEV * to SCEVAddRecExpr from
ScalarEvolution and PredicatedScalarEvolution to return a SCEVAddRecExpr*
instead of a SCEV* (which removes the need of most clients to do a
dyn_cast right after calling these functions).
We also don't add new predicates if the transformation was not successful.
This is not entirely a NFC (as it can theoretically remove some predicates
from LAA when we have an unknown dependece), but I couldn't find an obvious
regression test for it.
Reviewers: sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18368
llvm-svn: 264161
If we can't handle a relocation type, report it as an error in the source,
rather than asserting. I've added a more descriptive message and a test for the
only cases of this that I've been able to trigger.
Differential Revision: http://reviews.llvm.org/D18388
llvm-svn: 264156
This is all horribly outdated, and is mostly about the autoconf build
system that doesn't even exist anymore. These questions aren't
frequent, and these answers aren't useful.
llvm-svn: 264141
Now that StatepointLoweringInfo represents base pointers, derived
pointers and gc relocates as SmallVectors and not ArrayRefs, we no
longer need to allocate "backing storage" on stack in LowerStatepoint.
So elide the backing storage, and inline the trivial body of
getIncomingStatepointGCValues.
llvm-svn: 264128
We can statically decide whether or not a register pressure set is for
SGPRs or VGPRs, so we don't need to re-compute this information in
SIRegisterInfo::getRegPressureSetLimit().
Differential Revision: http://reviews.llvm.org/D14805
llvm-svn: 264126
In retrospect, it seems "obvious" that the sense of the return code is
the same as if it crashed on "interesting" inputs. But that didn't stop
me from spending more time than I care to admit verifying this.
llvm-svn: 264119
MCContext shouldn't be accessing the filesystem - that's a gross
layering violation and makes it awkward to use as a library or in a
daemon where it may not even be allowed filesystem access.
The CWD lookup here is normally redundant anyway, since the calling
context either also looks up the CWD or sets this to something more
specific. Here, we fix up the one caller that doesn't already set up a
debug compilation dir and make it clear that the responsibility for
such set up is in the users of MCContext.
llvm-svn: 264109
Summary:
I've completed my audit of all the code that looks at noduplicate and
added handling of convergent where appropriate, so we no longer need
noduplicate on these intrinsics.
Reviewers: jholewinski
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18168
llvm-svn: 264107
A really unfortunate design of llvm-link and related libraries is that
they operate one module at a time.
This means they can copy a GV to the destination module that should not
be there in the final result because a later bitcode file takes
precedence.
We already handled cases like a strong GV replacing a weak for example.
One case that is not currently handled is a comdat replacing another.
This doesn't happen in ELF, but with COFF largest selection kind it is
possible.
In "llvm-link a.ll b.ll" if the selected comdat was from a.ll,
everything will work and we will not copy the comdat from b.ll.
But if we run "llvm-link b.ll a.ll", we fail to delete the already
copied comdat from b.ll. This patch fixes that.
llvm-svn: 264103
CGP modifies the domtree in some cases, so saying that it preserves the
domtree is a lie. We'll be able to selectively preserve it with the new
pass manager.
Differential Revision: http://reviews.llvm.org/D16893
llvm-svn: 264099
We were just completely ignoring the types when determining whether we could
safely emit a libcall as a tail call. This is clearly wrong.
Theoretically, we could dig deeper looking for incidental matches (much like
the generic code in Analysis.cpp does), but it's probably not worth it for the
few libcalls that exist.
llvm-svn: 264084
When you have multiple LCSSA (single-operand) PHIs that are converted
into two-operand PHIs due to versioning, only assert that the PHI
currently being converted has a single operand. I.e. we don't want to
check PHIs that were converted earlier in the loop.
Fixes PR27023.
Thanks to Karl-Johan Karlsson for the minimized testcase!
llvm-svn: 264081
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.
We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT.
Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718).
Reapplied with a fix for PR26953 (missing vector widening legalization).
Differential Revision: http://reviews.llvm.org/D17932
llvm-svn: 264062
Summary:
Also renamed li_simm7 to li16_imm since it's not a simm7 and has an unusual
encoding (it's a uimm7 except that 0x7f represents -1).
Reviewers: vkalintiris
Subscribers: dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D18145
llvm-svn: 264056
Summary:
We can't check the error message for this one because there's another lw/sw
available that covers a larger range. We therefore check the transition
between the two sizes.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D18144
llvm-svn: 264054
It's a bug fix.
For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes.
My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested.
I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev.
Differential Revision: http://reviews.llvm.org/D18316
llvm-svn: 264051
Adding support for section names with special characters in them (e.g. "/").
GCC successfully compiles such section names.
This also fixes PR24520.
Differential Revision: http://reviews.llvm.org/D15678
llvm-svn: 264038
Summary:
After this change, deopt operand bundles can be lowered directly by
SelectionDAG into STATEPOINT instructions (which are then lowered to a
call or sequence of nop, with an associated __llvm_stackmaps entry0.
This obviates the need to round-trip deoptimization state through
gc.statepoint via RewriteStatepointsForGC.
Reviewers: reames, atrick, majnemer, JosephTremoulet, pgavlin
Subscribers: sanjoy, mcrosier, majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D18257
llvm-svn: 264015
Summary:
Without tree pruning clang has 2,667,552 points.
Wiht only dominators pruning: 1,515,586.
With both dominators & predominators pruning: 1,340,534.
Resubmit of r262103.
Differential Revision: http://reviews.llvm.org/D18341
llvm-svn: 264003
As noted in PR18355, this patch makes it clear that all cases with undef operands have been handled before further constant folding is attempted.
Differential Revision: http://reviews.llvm.org/D18305
llvm-svn: 263994
If we have a BB with only MemoryDefs, live-in calculations will ignore
it. This means we get results like this:
define void @foo(i8* %p) {
; 1 = MemoryDef(liveOnEntry)
store i8 0, i8* %p
br i1 undef, label %if.then, label %if.end
if.then:
; 2 = MemoryDef(1)
store i8 1, i8* %p
br label %if.end
if.end:
; 3 = MemoryDef(1)
store i8 2, i8* %p
ret void
}
...When there should be a MemoryPhi in the `if.end` BB.
This patch fixes that behavior.
llvm-svn: 263991
Summary:
Whole quad mode is already enabled for pixel shaders that compute
derivatives, but it must be suspended for instructions that cause a
shader to have side effects (i.e. stores and atomics).
This pass addresses the issue by storing the real (initial) live mask
in a register, masking EXEC before instructions that require exact
execution and (re-)enabling WQM where required.
This pass is run before register coalescing so that we can use
machine SSA for analysis.
The changes in this patch expose a problem with the second machine
scheduling pass: target independent instructions like COPY implicitly
use EXEC when they operate on VGPRs, but this fact is not encoded in
the MIR. This can lead to miscompilation because instructions are
moved past changes to EXEC.
This patch fixes the problem by adding use-implicit operands to
target independent instructions. Some general codegen passes are
relaxed to work with such implicit use operands.
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: MatzeB, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18162
llvm-svn: 263982
In executable and shared object ELF files, relocations in the file contain the final virtual address rather than section offset so this is adjusted to display section offset.
Differential revision: http://reviews.llvm.org/D15965
llvm-svn: 263971
Summary:
When control flow is implemented using the exec mask, the compiler will
insert branch instructions to skip over the masked section when exec is
zero if the section contains more than a certain number of instructions.
The previous code would only count instructions in successor blocks,
and this patch modifies the code to start counting instructions in all
blocks between the start and end of the branch.
Reviewers: nhaehnle, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18282
llvm-svn: 263969
This introduces a custom lowering for ISD::SETCCE (introduced in r253572)
that allows us to emit a short code sequence for 64-bit compares.
Before:
push {r7, lr}
cmp r0, r2
mov.w r0, #0
mov.w r12, #0
it hs
movhs r0, #1
cmp r1, r3
it ge
movge.w r12, #1
it eq
moveq r12, r0
cmp.w r12, #0
bne .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
After:
push {r7, lr}
subs r0, r0, r2
sbcs.w r0, r1, r3
bge .LBB1_2
@ BB#1: @ %bb1
bl f
pop {r7, pc}
.LBB1_2: @ %bb2
bl g
pop {r7, pc}
Saves around 80KB in Chromium's libchrome.so.
Some notes on this patch:
- I don't much like the ARMISD::BRCOND and ARMISD::CMOV combines I
introduced (nothing else needs them). However, they are necessary in
order to avoid poor codegen, and they seem similar to existing combines
in other backends (e.g. X86 combines (brcond (cmp (setcc Compare))) to
(brcond Compare)).
- No support for Thumb-1. This is in principle possible, but we'd need
to implement ARMISD::SUBE for Thumb-1.
Differential Revision: http://reviews.llvm.org/D15256
llvm-svn: 263962
Summary:
replaceCongruentIVs can break LCSSA when trying to replace IV increments
since it tries to replace all uses of a phi node with another phi node
while both of the phi nodes are not necessarily in the processed loop.
This will cause an assert in IndVars.
To fix this, we add a check to make sure that the replacement maintains
LCSSA.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18266
llvm-svn: 263941
Summary:
extract_vector_elt can cause an implicit any_ext if the types don't
match. When processing the following pattern:
(and (extract_vector_elt (load ([non_ext|any_ext|zero_ext] V))), c)
DAGCombine was ignoring the possible extend, and sometimes removing
the AND even though it was required to maintain some of the bits
in the result to 0, resulting in a miscompile.
This change fixes the issue by limiting the transformation only to
cases where the extract_vector_elt doesn't perform the implicit
extend.
Reviewers: t.p.northover, jmolloy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18247
llvm-svn: 263935
Summary:
The old address space inference pass (NVPTXFavorNonGenericAddrSpaces) is unable
to convert the address space of a pointer induction variable. This patch adds a
new pass called NVPTXInferAddressSpaces that overcomes that limitation using a
fixed-point data-flow analysis (see the file header comments for details).
The new pass is experimental and not enabled by default. Users can turn
it on by setting the -nvptx-use-infer-addrspace flag of llc.
Reviewers: jholewinski, tra, jlebar
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D17965
llvm-svn: 263916
Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask.
Differential Revision: http://reviews.llvm.org/D14261
llvm-svn: 263906
Summary: Also expose getters and setters in the C API, so that the change can be tested.
Reviewers: nhaehnle, axw, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D18260
From: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
llvm-svn: 263886
The sinpi/cospi can be replaced with sincospi to remove unnecessary
computations. However, we need to make sure that the calls are within
the same function!
This fixes PR26993.
llvm-svn: 263875