Summary:
Initially, these intrinsics seemed like part of a family of "frame"
related intrinsics, but now I think that's more confusing than helpful.
Initially, the LangRef specified that this would create a new kind of
allocation that would be allocated at a fixed offset from the frame
pointer (EBP/RBP). We ended up dropping that design, and leaving the
stack frame layout alone.
These intrinsics are really about sharing local stack allocations, not
frame pointers. I intend to go further and add an `llvm.localaddress()`
intrinsic that returns whatever register (EBP, ESI, ESP, RBX) is being
used to address locals, which should not be confused with the frame
pointer.
Naming suggestions at this point are welcome, I'm happy to re-run sed.
Reviewers: majnemer, nicholas
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11011
llvm-svn: 241633
Summary:
SelectionDAG itself is not invoking directly the DataLayout in the
TargetMachine, but the "TargetLowering" class is still using it. I'll
address it in a following commit.
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11000
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241618
This reverts commit r241602. We had a latent bug in SCCP where we would
make a basic block empty and then proceed to ask questions about it's
terminator.
llvm-svn: 241616
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10987
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241615
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10986
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241614
Summary:
This change is part of a series of commits dedicated to have a single
DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10985
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241613
Summary:
This change is part of a series of commits dedicated to have a
single DataLayout during compilation by using always the one owned by the
module.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10984
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 241610
This commit modifies the interface for the machine instruction parsing
functions by wrapping the parameter 'MBBSlots' in a new structure called
'PerFunctionMIParsingState'. This change is useful as in the future I will be
able to pass new parameters to the machine instruction parser just by modifying
the 'PerFunctionMIParsingState' structure instead of adding a new parameter to
each function.
llvm-svn: 241607
getSymbolValue now returns a value that in convenient for most callers:
* 0 for undefined
* symbol size for common symbols
* offset/address for symbols the rest
Code that needs something more specific can check getSymbolFlags.
llvm-svn: 241605
This type of prologue isn't supported yet. Implementing it should be a
matter of copying the adjusted incoming EBP into ESI (the base pointer)
instead of EBP. The original EBP can be saved and restored from other
memory afterwards.
llvm-svn: 241597
This includes code that is intended to be target-independent as well
as the Hexagon-specific details. This is just the framework without
any users.
llvm-svn: 241595
At least not in the interface exposed by ObjectFile. This matches what ELF and
COFF implement.
Adjust existing code that was expecting them to have values. No overall
functionality change intended.
Another option would be to change the interface and the ELF and COFF
implementations to say that the value of a common symbol is its size.
llvm-svn: 241593
They are implemented like that in some object formats, but for the interface
provided by lib/Object, SF_Undefined and SF_Common are different things.
This matches the ELF and COFF implementation and fixes llvm-nm for MachO.
llvm-svn: 241587
In these two contexts we really just want the raw n_value. No need to use
getSymbolValue which checks for special cases where, semantically, the symbol
has no value.
llvm-svn: 241584
getFirstNonPHI's documentation states that it returns null if there is
no non-PHI instruction. However, it instead returns a pointer to the
end iterator. The implementation of getFirstNonPHI claims that
dereferencing the iterator will result in an assertion failure but this
doesn't occur. Instead, machinery like getFirstInsertionPt will attempt
to isa<> this invalid memory which results in unpredictable behavior.
Instead, make getFirst* return null if no such instruction exists.
llvm-svn: 241570
be emitted.
This is needed to enable ARM long calls for LTO and enable and disable it on a
per-function basis.
Out-of-tree projects currently using EnableARMLongCalls to emit long calls
should start passing "+long-calls" to the feature string (see the changes made
to clang in r241565).
rdar://problem/21529937
Differential Revision: http://reviews.llvm.org/D9364
llvm-svn: 241566
This commit verifies that the parsed machine instructions contain the implicit
register operands as specified by the MCInstrDesc. Variadic and call
instructions aren't verified.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10781
llvm-svn: 241537
Calling into the base class' getAnalysisUsage method after we did our pass
specific modifications. This shouldn't really matter since this is the last
pass in the pipeline anyways.
llvm-svn: 241536
This commit serializes the implicit flag for the register machine operands. It
introduces two new keywords into the machine instruction syntax: 'implicit' and
'implicit-def'. The 'implicit' keyword is used for the implicit register
operands, and the 'implicit-def' keyword is used for the register operands that
have both the implicit and the define flags set.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10709
llvm-svn: 241519
The vperm2f128/vperm2i128 shuffle mask decoding was not attempting to deal with shuffles that give zero lanes. This patch fixes this so that the assembly printer can provide shuffle comments.
As this decoder is also used in X86ISelLowering for shuffle combining, I've added an early-out to match existing behaviour. The hope is that we can add zero support in the future, this would allow other ops' decodes (e.g. insertps) to be combined as well.
Differential Revision: http://reviews.llvm.org/D10593
llvm-svn: 241516
Extend the reassociation optimization of http://reviews.llvm.org/rL240361 (D10460)
to SSE scalar FP SP adds in addition to AVX scalar FP SP adds.
With the 'switch' in place, we can trivially add other opcodes and test cases in
future patches.
Differential Revision: http://reviews.llvm.org/D10975
llvm-svn: 241515
This patch adds vectorization support for uniform constant i64 arithmetic shift right operators.
Differential Revision: http://reviews.llvm.org/D9645
llvm-svn: 241514
The previous code put the load after the terminator, leading to invalid
IR and downstream crashes. This caused http://crbug.com/506446.
llvm-svn: 241509
This patch adds support for v8i16 and v16i8 shuffle lowering using the immediate versions of the SSE4A EXTRQ and INSERTQ instructions. Although rather limited (they can only act on the lower 64-bits of the source vectors, leave the upper 64-bits of the result vector undefined and don't have VEX encoded variants), the instructions are still useful for the zero extension of any lane (EXTRQ) or inserting a lane into another vector (INSERTQ). Testing demonstrated that it wasn't typically worth it to use these instructions for v2i64 or v4i32 vector shuffles although they are capable of it.
As well as adding specific pattern matching for the shuffles, the patch uses EXTRQ for zero extension cases where SSE41 isn't available and its more efficient than the SSE2 'unpack' default approach. It also adds shuffle decode support for the EXTRQ / INSERTQ cases when the instructions are handling full byte-sized extractions / insertions.
From this foundation, future patches will be able to make use of the instructions for situations that use their ability to extract/insert at the bit level.
Differential Revision: http://reviews.llvm.org/D10146
llvm-svn: 241508
With the completion of D9746 there is now a common implementation of integer signed/unsigned min/max nodes, removing the need for the equivalent X86 specific implementations.
This patch removes the old X86ISD nodes, legalizes the relevant SSE2/SSE41/AVX2/AVX512 instructions for the ISD versions and converts the small amount of existing X86 code.
Differential Revision: http://reviews.llvm.org/D10947
llvm-svn: 241506
This commit adds a 'run-pass' option to llc, which instructs the compiler to run
one specific code generation pass only.
Llc already has the 'start-after' and the 'stop-after' options, and this new
option complements the other two by making it easier to write tests that want
to invoke a single pass only.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10776
llvm-svn: 241476
Running this after the scheduler enables scheduling
waits later so other ALU instructions can run while
this would be waiting.
When combined with enabling the post-RA scheduler, this
gives about a ~20% improvement on sgemm.
llvm-svn: 241473
Summary:
This concludes the patch series to eliminate StringRef forms of GNU triples
from the internals of LLVM that began in r239036.
At this point, the StringRef-form of GNU Triples should only be used in the
public API (including IR serialization) and a couple objects that directly
interact with the API (most notably the Module class). The next step is to
replace these Triple objects with the TargetTuple object that will represent
our authoratative/unambiguous internal equivalent to GNU Triples.
Reviewers: rengolin
Subscribers: llvm-commits, jholewinski, ted, rengolin
Differential Revision: http://reviews.llvm.org/D10962
llvm-svn: 241472
This change includes a fix for https://code.google.com/p/chromium/issues/detail?id=499508#c3,
which required updating the visibility for symbols with eliminated definitions.
--Original Commit Message--
Add new EliminateAvailableExternally module pass, which is performed in
O2 compiles just before GlobalDCE, unless we are preparing for LTO.
This pass eliminates available externally globals (turning them into
declarations), regardless of whether they are dead/unreferenced, since
we are guaranteed to have a copy available elsewhere at link time.
This enables additional opportunities for GlobalDCE.
If we are preparing for LTO (e.g. a -flto -c compile), the pass is not
included as we want to preserve available externally functions for possible
link time inlining. The FE indicates whether we are doing an -flto compile
via the new PrepareForLTO flag on the PassManagerBuilder.
llvm-svn: 241466
We don't have a good way to detect most situations where
DS offsets are usable on SI, so add an option to force using
them even if unsafe for debugging performance problems.
llvm-svn: 241462
These are mostly from the chart in the SparcV8 spec, section "A.3
Synthetic Instructions".
Differential Revision: http://reviews.llvm.org/D9834
llvm-svn: 241461
Originally added in r139314.
Back then it didn't actually get the address, it got whatever value the
relocation used: address or offset.
The values in different object formats are:
* MachO: Always an offset.
* COFF: Always an address, but when talking about the virtual address of
sections it says: "for simplicity, compilers should set this to zero".
* ELF: An offset for .o files and and address for .so files. In the case of the
.so, the relocation in not linked to any section (sh_info is 0). We can't
really compute an offset.
Some API mappings would be:
* Use getAddress for everything. It would be quite cumbersome. To compute the
address elf has to follow sh_info, which can be corrupted and therefore the
method has to return an ErrorOr. The address of the section is also the same
for every relocation in a section, so we shouldn't have to check the error
and fetch the value for every relocation.
* Use a getValue and make it up to the user to know what it is getting.
* Use a getOffset and:
* Assert for dynamic ELF objects. That is a very peculiar case and it is
probably fair to ask any tool that wants to support it to use ELF.h. The
only tool we have that reads those (llvm-readobj) already does that. The
only other use case I can think of is a dynamic linker.
* Check that COFF .obj files have sections with zero virtual address spaces. If
it turns out that some assembler/compiler produces these, we can change
COFFObjectFile::getRelocationOffset to subtract it. Given COFF format,
this can be done without the need for ErrorOr.
The getRelocationAddress method was never implemented for COFF. It also
had exactly one use in a very peculiar case: a shortcut for adding the
section value to a pcrel reloc on MachO.
Given that, I don't expect that there is any use out there of the C API. If
that is not the case, let me know and I will add it back with the implementation
inlined and do a proper deprecation.
llvm-svn: 241450
The code in AArch64A57FPLoadBalancing::scavengeRegister() to handle dead defs
was not correctly handling aliased registers. E.g. if the dead def was of D2,
then S2 was not being marked as unavailable, so it could potentially be used
across a live-range in which it would be clobbered.
Patch by Geoff Berry <gberry@codeaurora.org>!
Phabricator: http://reviews.llvm.org/D10900
llvm-svn: 241449
When talking about the virtual address of sections the coff spec says:
... for simplicity, compilers should set this to zero. Otherwise, it is an
arbitrary value that is subtracted from offsets during relocation.
We don't currently subtract it, so check that it is zero.
If some producer does create such files, we can change getRelocationOffset
instead.
llvm-svn: 241447
Add support for resolving MIPS32r6 relocations in MCJIT.
Patch by Vladimir Radosavljevic.
Differential Revision: http://reviews.llvm.org/D10687
llvm-svn: 241442
From the linker's perspective, an available_externally global is equivalent
to an external declaration (per isDeclarationForLinker()), so it is incorrect
to consider it to be a weak definition.
Also clean up some logic in the dead argument elimination pass and clarify
its comments to better explain how its behavior depends on linkage,
introduce GlobalValue::isStrongDefinitionForLinker() and start using
it throughout the optimizers and backend.
Differential Revision: http://reviews.llvm.org/D10941
llvm-svn: 241413
There is some functional change here because it changes target code from
atoi(3) to StringRef::getAsInteger which has error checking. For valid
constraints there should be no difference.
llvm-svn: 241411
Correctly support assembling "pushw $imm8" on x86-64 targets.
Also some cleanup of the PUSH instructions (PUSH64i16 and PUSHi16 actually
represent the same instruction)
This fixes PR23996
Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10878
llvm-svn: 241404
Followup to D10433 and D10589 that fixes i8/i16 uint2fp vector conversions by zero extending to i32 and using the sint2fp path (unless the target does actually support uint2fp).
llvm-svn: 241394
Although this does cut the number of traces recomputed by ~10% for the
test case mentioned in http://reviews.llvm.org/D10460, it doesn't
make a dent in the overall performance. That example needs to be more
selective when invalidating traces.
llvm-svn: 241393
This is needed for COFF linkers to distinguish between weak external aliases
and regular symbols with LLVM weak linkage, which are represented as strong
symbols in COFF.
llvm-svn: 241389
Requested by Eugene Rozenfeld of the LLILC team, this feature allows JIT
clients to skip relocations for selected external symbols by returning ~0ULL
from their symbol resolver. If this value is returned for a given symbol,
RuntimeDyld will skip all relocations for that symbol. The client will be
responsible for applying the skipped relocations manually before the code
is executed.
llvm-svn: 241383
SHT_NOBITS sections do not have content in an object file. Now the yaml2obj
tool does not accept `Content` field for such sections, and the obj2yaml
tool does not attempt to read the section content from a file.
Restore r241350 and r241352.
llvm-svn: 241377
Summary:
Looking at r241279, I noticed that UpgradedIntrinsics only gets written
to in the following code:
if (UpgradeIntrinsicFunction(&F, NewFn))
UpgradedIntrinsics[&F] = NewFn;
Looking through UpgradeIntrinsicFunction, we always return false OR
NewFn will be set to a different function from our source.
This patch pulls the F != NewFn into UpgradeIntrinsicFunction as an
assert, and removes the check from callers of UpgradeIntrinsicFunction.
Reviewers: rafael, chandlerc
Subscribers: llvm-commits-list
Differential Revision: http://reviews.llvm.org/D10915
llvm-svn: 241369
r241350 broke lld tests.
r241352 depends on r241350.
Original messages:
"[ELFYAML] Fix handling SHT_NOBITS sections by obj2yaml/yaml2obj tools"
"[ELFYAML] Make the Size field for .bss section optional"
llvm-svn: 241354
SHT_NOBITS sections do not have content in an object file. Now yaml2obj
tool does not accept `Content` field for such sections, and obj2yaml
tool does not attempt to read the section content from a file.
llvm-svn: 241350
Add support for v2i8/v2i16 to v2f64 by using a sign extension to v2i32 before conversion to v2f64.
Differential Revision: http://reviews.llvm.org/D10589
llvm-svn: 241325
This patch adds support for sign extension for sub 128-bit vectors, such as to v2i32. It concatenates with UNDEF subvectors up to 128-bits, performs the sign extension (i.e. as v4i32) and then extracts the target subvector.
Patch 1/2 of D10589 - the second patch covers the conversion of v2i8/v2i16 to v2f64.
llvm-svn: 241323
The assertion in getCopyFromPartsVector assumed that the vector 'part' must
match the type of argument (arguments are potentially split into multiple
parts). However, in some cases the targets return a 'part' of the right size
but with a different type. We already handle this case correctly later on
and generate a bitcast. This commit just makes sure that we are actually
checking the property that we care about.
llvm-svn: 241312
This commit changes normal isel and fast isel to read the user-defined trap
function name from function attribute "trap-func-name" attached to llvm.trap or
llvm.debugtrap instead of from TargetOptions::TrapFuncName. This is needed to
use clang's command line option "-ftrap-function" for LTO and enable changing
the trap function name on a per-call-site basis.
Out-of-tree projects currently using TargetOptions::TrapFuncName to specify the
trap function name should attach attribute "trap-func-name" to the call sites
of llvm.trap and llvm.debugtrap instead.
rdar://problem/21225723
Differential Revision: http://reviews.llvm.org/D10832
llvm-svn: 241305
This function can really fail since the string table offset can be out of
bounds.
Using ErrorOr makes sure the error is checked.
Hopefully a lot of the boilerplate code in tools/* can go away once we have
a diagnostic manager in Object.
llvm-svn: 241297
In r241285, I removed the SUBREG_TO_REG restriction from VSX swap
removal, determining that this was overly conservative. We have
another form of the same restriction in that we check for the presence
of implicit subregs in vector operations. As with SUBREG_TO_REG for
partial register conversions, an implicit subreg is safe in and of
itself, provided no other operation makes a lane-sensitive assumption
about the result. This patch removes that restriction, by removing
the HasImplicitSubreg flag and all code that relies on it.
I've added a test case that fails to optimize before this patch is
applied, and optimizes properly with the patch. Test based on a
report from Anton Blanchard.
llvm-svn: 241290
With a previous patch, the VSX swap optimization is able to recognize
the doubleword load-splat idiom that can be implemented using lxvdsx.
However, that does not cover a doubleword splat where the source is a
register. We can implement this using xxspltd (a special form of
xxpermdi). This patch teaches the swap optimization pass about this
idiom.
As a prerequisite, it also permits swap optimization to succeed for
all forms of SUBREG_TO_REG. Previously we were conservative and only
allowed SUBREG_TO_REG when it copied a full register. However, on
reflection any form of SUBREG_TO_REG is safe in and of itself, so long
as an unsafe operation is not performed on its result. In particular,
a widening SUBREG_TO_REG often occurs as an input to a doubleword
splat idiom, particularly in auto-vectorized code.
The doubleword splat idiom is an XXPERMDI operation where both source
registers are identical, and the selection mask is either 0 (splat the
first element) or 3 (splat the second element). To determine whether
the registers are identical, we use the existing mechanism for looking
through "copy-like" operations. That mechanism has a side effect of
marking the XXPERMDI operation as using a physical register, which
would invalidate its presence in a swap-optimized region. This is
correct for the form of XXPERMDI that performs a swap and hence would
be removed, but is not what we want for a doubleword-splat variety of
XXPERMDI. Therefore we reset the physical-register flag on the
XXPERMDI when it represents a splat.
A simple test case is added to verify that we generate the splat and
that we also remove the xxswapd instructions that would otherwise be
associated with the load and store of another operand.
llvm-svn: 241285
When trying to upgrade @llvm.x86.sse2.psrl.dq while parsing a module,
BitcodeReader adds the function to its worklist twice, resulting in a
crash when accessing it the second time.
This patch replaces the worklist vector by a map.
Patch by Philip Pfaffe.
llvm-svn: 241281
This patch changes linkage with dbghlp.dll for clang from static (at load time)
to on demand (at the first use of required functions). Clang uses dbghlp.dll
only in minor use-cases. First of all in case of crash and in case of plugin load.
The dbghlp.dll library can be absent on system. In this case clang will fail
to load. With lazy load of dbghlp.dll clang can work even if dbghlp.dll
is not available.
Differential Revision: http://reviews.llvm.org/D10737
llvm-svn: 241271
The code responsible for shl folding in the DAGCombiner was assuming incorrectly that all constants are less than 64 bits. This patch simply changes the way values are compared.
It has been reverted previously because of some problems with comparing APInt with raw uint64_t. That has been fixed/changed with r241204.
llvm-svn: 241254
By default, the GraphWriter code assumes that the generic file open
program (`open` on Apple, `xdg-open` on other systems) can wait on the
forked proces to complete. When the fork ends, the code would delete
the temporary dot files created, and return.
On GNU/Linux, the xdg-open program does not have a "wait for your fork
to complete before dying" option. So the behaviour was that xdg-open
would launch a process, quickly die itself, and then the GraphWriter
code would think its OK to quickly delete all the temporary files.
Once the temporary files were deleted, the dot viewers would get very
upset, and often give you weird errors.
This change only waits on the generic open program on Apple platforms.
Elsewhere, we don't wait on the process, and hence we don't try and
clean up the temporary files.
llvm-svn: 241250
Summary: Rename some methods to make Statepoint look more like CallSite.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10756
llvm-svn: 241235
This checks subtarget feature compatibility for inlining by verifying
that the callee is a strict subset of the caller's features. This includes
the cpu as part of the subtarget we can get via the incoming functions as
the backend takes CPUs as feature sets.
This allows us to inline things like:
int foo() { return baz(); }
int __attribute__((target("sse4.2"))) bar() {
return foo();
}
so that generic code can be inlined into specialized functions.
llvm-svn: 241221
Summary:
* Add 64-bit address space feature.
* Rename SIMD feature to SIMD128.
* Handle single-thread model with an IR pass (same way ARM does).
* Rename generic processor to MVP, to follow design's lead.
* Add bleeding-edge processors, with all features included.
* Fix a few DEBUG_TYPE to match other backends.
Test Plan: ninja check
Reviewers: sunfish
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10880
llvm-svn: 241211
TwoAddressInstructionPass stops after a successful commuting but 3 Addr
conversion might be good for some cases.
Consider:
int foo(int a, int b) {
return a + b;
}
Before this commit, we emit:
addl %esi, %edi
movl %edi, %eax
ret
After this commit, we try 3 Addr conversion:
leal (%rsi,%rdi), %eax
ret
Patch by Volkan Keles <vkeles@apple.com>!
Differential Revision: http://reviews.llvm.org/D10851
llvm-svn: 241206
This is mostly an NFC, which increases code readability (instead of
saving old terminator, generating new one in front of old, and deleting
old, we just call a function). However, it would additionaly copy
the debug location from old instruction to replacement, which
would help PR23837.
llvm-svn: 241197
All file formats only needed 16-bits right now which is enough to fit
in to the padding with other fields.
This reduces the size of MCSymbol to 24-bytes on a 64-bit system. The
layout is now
0 | class llvm::MCSymbol
0 | class llvm::PointerIntPair SectionOrFragmentAndHasName
0 | intptr_t Value
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
8 | unsigned int IsTemporary
8 | unsigned int IsRedefinable
8 | unsigned int IsUsed
8 | _Bool IsRegistered
8 | unsigned int IsExternal
8 | unsigned int IsPrivateExtern
8 | unsigned int Kind
9 | unsigned int IsUsedInReloc
9 | unsigned int SymbolContents
9 | unsigned int CommonAlignLog2
10 | uint32_t Flags
12 | uint32_t Index
16 | union
16 | uint64_t Offset
16 | uint64_t CommonSize
16 | const class llvm::MCExpr * Value
| [sizeof=8, dsize=8, align=8
| nvsize=8, nvalign=8]
| [sizeof=24, dsize=24, align=8
| nvsize=24, nvalign=8]
llvm-svn: 241196
Summary:
According to PTX ISA:
For convenience, ld, st, and cvt instructions permit source and destination data operands to be wider than the instruction-type size, so that narrow values may be loaded, stored, and converted using regular-width registers. For example, 8-bit or 16-bit values may be held directly in 32-bit or 64-bit registers when being loaded, stored, or converted to other types and sizes. The operand type checking rules are relaxed for bit-size and integer (signed and unsigned) instruction types; floating-point instruction types still require that the operand type-size matches exactly, unless the operand is of bit-size type.
So, the ISA does not support load with extending/store with truncatation for floating numbers. This is reflected in setting the loadext/truncstore actions to expand in the code for floating numbers, but vectors of floating numbers are not taken care of.
As a result, loading a vector of floats followed by a fp_extend may be combined by DAGCombiner to a extload, and the extload may be lowered to NVPTXISD::LoadV2 with extending information. However, NVPTXISD::LoadV2 does not perform extending, and no extending instructions are inserted. Finally, PTX instructions with mismatched types are generated, like
ld.v2.f32 {%fd3, %fd4}, [%rd2]
This patch adds the correct actions for vectors of floats, so DAGCombiner would not create loads with extending, and correct code is generated.
Patched by Gang Hu.
Test Plan: Test case attached.
Reviewers: jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D10876
llvm-svn: 241191
Given that alignments are always powers of 2, just encode it this way.
This matches how we encode alignment on IR GlobalValue's for example.
This compresses the CommonAlign member down to 5 bits which allows it
to pack better with the surrounding fields.
Reviewed by Duncan Exon Smith.
llvm-svn: 241189
Don't pattern match for frontend outlined finally calls on non-x64
platforms. The 32-bit runtime uses a different funclet prototype. Now,
the frontend is pre-outlining the finally bodies so that it ends up
doing most of the heavy lifting for variable capturing. We're just
outlining the callsite, and adapting the frameaddress(0) call to line up
the frame pointer recovery.
llvm-svn: 241186
Summary:
Offset of frame index is calculated by NVPTXPrologEpilogPass. Before
that the correct offset of stack objects cannot be obtained, which
leads to wrong offset if there are more than 2 frame objects. This patch
move NVPTXPeephole after NVPTXPrologEpilogPass. Because the frame index
is already replaced by %VRFrame in NVPTXPrologEpilogPass, we check
VRFrame register instead, and try to remove the VRFrame if there
is no usage after NVPTXPeephole pass.
Patched by Xuetian Weng.
Test Plan:
Strengthened test/CodeGen/NVPTX/local-stack-frame.ll to check the
offset calculation based on SP and SPL.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10853
llvm-svn: 241185
When adding little-endian vector support for PowerPC last year, I
inadvertently disabled an optimization that recognizes a load-splat
idiom and generates the lxvdsx instruction. This patch moves the
offending logic so lxvdsx is once again generated.
This pattern is frequently generated by the vectorizer for scalar
loads of an effective constant. Previously the lxvdsx instruction was
wrongly listed as lane-sensitive for the VSX swap optimization (since
both doublewords are identical, swaps are safe). This patch fixes
this as well, so that vectorized code using lxvdsx can now have swaps
removed from the computation.
There is an existing test (@test50) in test/CodeGen/PowerPC/vsx.ll
that checks for the missing optimization. However, vsx.ll was only
being tested for POWER7 with big-endian code generation. I've added
a little-endian RUN statement and expected LE code generation for all
the tests in vsx.ll to give us a bit better VSX coverage, including
what's needed for this patch.
llvm-svn: 241183
This patch is not intended to change existing codegen behavior for any target.
It just exposes the JumpIsExpensive setting on the command-line to allow for
easier testing and emergency overrides.
Also, change the existing regression test to use FileCheck, explicitly specify
the jump-is-expensive option, and use more precise checks.
Differential Revision: http://reviews.llvm.org/D10846
llvm-svn: 241179
The EH code might have been deleted as unreachable and the personality
pruned while the filter is still present. Currently I'm hitting this at
-O0 due to the clang bug PR24009.
llvm-svn: 241170
This patch teaches the AsmParser to accept add/adds/sub/subs/cmp/cmn
with a negative immediate operand and convert them as shown:
add Rd, Rn, -imm -> sub Rd, Rn, imm
sub Rd, Rn, -imm -> add Rd, Rn, imm
adds Rd, Rn, -imm -> subs Rd, Rn, imm
subs Rd, Rn, -imm -> adds Rd, Rn, imm
cmp Rn, -imm -> cmn Rn, imm
cmn Rn, -imm -> cmp Rn, imm
Those instructions are an alternate syntax available to assembly coders,
and are needed in order to support code already compiling with some other
assemblers (gas). They are documented in the "ARMv8 Instruction Set
Overview", in the "Arithmetic (immediate)" section. This makes llvm-mc
a programmer-friendly assembler !
This also fixes PR20978: "Assembly handling of adding negative numbers
not as smart as gas".
llvm-svn: 241166
Move some instructions into order of sections in the spec, as the rest
already were.
Differential Revision: http://reviews.llvm.org/D9102
llvm-svn: 241163
This also improves the logic of what is an error:
* getSection(uint_32): only return an error if the index is out of bounds. The
index 0 corresponds to a perfectly valid entry.
* getSection(Elf_Sym): Returns null for symbols that normally don't have
sections and error for out of bound indexes.
In many places this just moves the report_fatal_error up the stack, but those
can then be fixed in smaller patches.
llvm-svn: 241156
Function static variables, typedefs and records (class, struct or union) declared inside
a lexical scope were associated with the function as their parent scope, rather than the
lexical scope they are defined or declared in.
This fixes PR19238
Patch by: amjad.aboud@intel.com
Differential Revision: http://reviews.llvm.org/D9758
llvm-svn: 241153
Only consider an instruction a candidate for relaxation if the last operand of the
instruction is an expression. We previously checked whether any operand is an expression,
which is useless, since for all instructions concerned, the only operand that may be
affected by relaxation is the last one.
In addition, this removes the check for having RIP as an argument, since it was
plain wrong - even when one of the arguments is RIP, relaxation may still be needed.
This fixes PR9807.
Patch by: david.l.kreitzer@intel.com
Differential Revision: http://reviews.llvm.org/D10766
llvm-svn: 241152
The AArch32 assembler parses the '@' as a comment symbol, so the error message shouldn't suggest
that '@<type>' is a valid replacement when assembling for AArch32 target.
Differential Revision: http://reviews.llvm.org/D10651
llvm-svn: 241149
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
llvm-svn: 241143
Surprisingly, this is a correctness issue: the mmx type exists for
calling convention purposes, LLVM doesn't have a zero representation for
them.
This partially fixes PR23999.
llvm-svn: 241142
Summary:
nsw are flaky and can often be removed by optimizations. This patch enhances
nsw by leveraging @llvm.assume in the IR. Specifically, NaryReassociate now
understands that
assume(a + b >= 0) && assume(a >= 0) ==> a +nsw b
As a result, it can split more sext(a + b) into sext(a) + sext(b) for CSE.
Test Plan: nary-gep.ll
Reviewers: broune, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10822
llvm-svn: 241139
The incoming EBP value established by the runtime is actually a pointer
to the end of the EH registration object, and not the true parent
function frame pointer. Clang doesn't need llvm.x86.seh.exceptioninfo
anymore because we know that the exception info pointer is at a fixed
offset from this incoming EBP.
The llvm.x86.seh.recoverfp intrinsic takes an EBP value provided by the
EH runtime and returns a pointer that is usable with llvm.framerecover.
The llvm.x86.seh.restoreframe intrinsic is inserted by the 32-bit
specific preparation pass in blocks targetted by the EH runtime. It
re-establishes any physical registers used by the parent function to
address the stack, such as the frame, base, and stack pointers.
Neither of these intrinsics correctly handle stack realignment prologues
yet, but it's possible to add that later.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D10848
llvm-svn: 241125
Summary:
This change introduces a !make.implicit metadata that allows the
frontend to pre-select the set of explicit null checks that will be
considered for transformation into implicit null checks.
The reason for not using profiling data instead of !make.implicit is
explained in the change to `FaultMaps.rst`.
Reviewers: atrick, reames, pgavlin, JosephTremoulet
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10824
llvm-svn: 241116
This is part of an effort to pack the average MCSymbol down to 24 bytes.
The HasName bit was pushing the size of the bitfield over to another word,
so this change uses a PointerIntPair to fit in it to unused bits of a
PointerUnion.
Reviewed by Rafael Espíndola
llvm-svn: 241115
It is mandatory to specify a comdat in order to receive comdat semantics
for a symbol. We were previously getting this wrong in -function-sections
mode; linker-weak symbols were being emitted in a selectany comdat. This
change causes such symbols to use a noduplicates comdat instead, fixing
the inconsistency.
Also correct an inaccuracy in the docs.
Differential Revision: http://reviews.llvm.org/D10828
llvm-svn: 241103
Summary:
Really check if %SP is not used in other places, instead of checking only exact
one non-dbg use.
Patched by Xuetian Weng.
Test Plan:
@foo4 in test/CodeGen/NVPTX/local-stack-frame.ll, create a case that
SP will appear twice.
Reviewers: jholewinski, jingyue
Reviewed By: jingyue
Subscribers: llvm-commits, sfantao, jholewinski
Differential Revision: http://reviews.llvm.org/D10844
llvm-svn: 241099
This commit implements serialization of the machine basic block successors. It
uses a YAML flow sequence that contains strings that have the MBB references.
The MBB references in those strings use the same syntax as the MBB machine
operands in the machine instruction strings.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10699
llvm-svn: 241093
This commit extracts the code that reports an error that's produced by the
machine instruction parser into a new method that can be reused in other places.
llvm-svn: 241086
This commit refactors the interface for machine instruction parser. It adopts
the pattern of returning a bool and passing in the result in the first argument
that is used by the other parsing methods for the the method 'parse' and the
function 'parseMachineInstr'.
llvm-svn: 241085
This commit refactors the machine instruction lexer so that the lexing
functions use the 'maybeLex...' pattern, where they determine if they
can lex the current token by themselves.
Reviewers: Sean Silva
Differential Revision: http://reviews.llvm.org/D10817
llvm-svn: 241078
Duplicating an FP register "as itself" is a bad idea, since it violates the
invariant that every FP register is mapped to at most one FPU stack slot.
Use the scratch FP register instead.
This fixes PR23957.
llvm-svn: 241069
These directives are used to set the default value of the SoftFloat feature.
They have the same effect as setting -m{soft, hard}-float from the command line.
Differential Revision: http://reviews.llvm.org/D9073
llvm-svn: 241066
represented by uint64_t, this patch replaces these
usages with the FeatureBitset (std::bitset) type.
Differential Revision: http://reviews.llvm.org/D10542
llvm-svn: 241058
This unbreaks TripleTest.Normalization. We'll have to come up with a new
plan for the OS component of the target triple for WebAssembly.
llvm-svn: 241041
Realistically, this will be returning ErrorOr for some time as refactoring the
user code to check once per section will take some time.
Given that, use it for checking if a relocation has addend or not.
While at it, add ELFRelocationRef to simplify the users.
llvm-svn: 241028
A call to removeEmptySubranges() is necessary after every operation that
potentially removes all segments from a subregister range; this case in
the register coalescer was missing.
llvm-svn: 241027
If you only need Name and Value fields in the COFF symbol,
you don't need to distinguish 32 bit and 64 bit COFF symbols.
These fields start at the same offsets and have the same size.
This data strucutre is one pointer smaller than COFFSymbolRef
thus slightly efficient. I'll use this class in LLD as we create
millions of LLD symbol objects that currently contain COFFSymbolRef.
Shaving off 8 byte (or 4 byte on 32 bit) from that class actually
matters becasue of the number of objects we create in LLD.
llvm-svn: 241024
It is meant to be used to record modules @imported by the current
compile unit, so a debugger an import the same modules to replicate this
environment before dropping into the expression evaluator.
DIModule is a sibling to DINamespace and behaves quite similarly.
In addition to the name of the module it also records the module
configuration details that are necessary to uniquely identify the module.
This includes the configuration macros (e.g., -DNDEBUG), the include path
where the module.map file is to be found, and the isysroot.
The idea is that the backend will turn this into a DW_TAG_module.
http://reviews.llvm.org/D9614
rdar://problem/20965932
llvm-svn: 241017
Reapplies r241005 after fixing the build on non-Mac platforms. Original
commit message below.
The hostname can be very unstable when there are many machines on the
network competing for the same name. Using the hardware UUID makes it
less likely to have collisions or to consider files written by the
current host to be owned by a different one at a later time.
rdar://problem/21512307
llvm-svn: 241012
This change unifies how LTOModule and the backend obtain linker flags
for globals: via a new TargetLoweringObjectFile member function named
emitLinkerFlagsForGlobal. A new function LTOModule::getLinkerOpts() returns
the list of linker flags as a single concatenated string.
This change affects the C libLTO API: the function lto_module_get_*deplibs now
exposes an empty list, and lto_module_get_*linkeropts exposes a single element
which combines the contents of all observed flags. libLTO should never have
tried to parse the linker flags; it is the linker's job to do so. Because
linkers will need to be able to parse flags in regular object files, it
makes little sense for libLTO to have a redundant mechanism for doing so.
The new API is compatible with the old one. It is valid for a user to specify
multiple linker flags in a single pragma directive like this:
#pragma comment(linker, "/defaultlib:foo /defaultlib:bar")
The previous implementation would not have exposed
either flag via lto_module_get_*deplibs (as the test in
TargetLoweringObjectFileCOFF::getDepLibFromLinkerOpt was case sensitive)
and would have exposed "/defaultlib:foo /defaultlib:bar" as a single flag via
lto_module_get_*linkeropts. This may have been a bug in the implementation,
but it does give us a chance to fix the interface.
Differential Revision: http://reviews.llvm.org/D10548
llvm-svn: 241010
The hostname can be very unstable when there are many machines on the
network competing for the same name. Using the hardware UUID makes it
less likely to have collisions or to consider files written by the
current host to be owned by a different one at a later time.
rdar://problem/21512307
llvm-svn: 241005
When the store sequence being combined actually stores the base register, we
should not mark it as killed until the end.
rdar://21504262
llvm-svn: 241003
This is a new version of http://reviews.llvm.org/D10260.
It turned out that when you specify an integer register in inline asm on
x86 you get the register of the required type size back. That means that
X86TargetLowering::getRegForInlineAsmConstraint() has to accept any of
the integer registers and adapt its size to the given target size which
may be any 8/16/32/64 bit sized type. Surprisingly that means given a
constraint of "{ax}" and a type of MVT::F32 we need to return X86::EAX.
This change makes this face explicit, the previous code seemed like
working by accident because there it never returned an error once a
register was found. On the other hand this rewrite allows to actually
return errors for invalid situations like requesting an integer register
for an i128 type.
Related to rdar://21042280
Differential Revision: http://reviews.llvm.org/D10813
llvm-svn: 241002
Set debug location for terminator instruction in loop backedge block
(which is an unconditional jump to loop header). We can't copy debug
location from original backedges, as there can be several of them,
with different debug info locations. So, we follow the approach of
SplitBlockPredecessors, and copy the debug info from first non-PHI
instruction in the header (i.e. destination block).
This is yet another change for PR23837.
llvm-svn: 240999
Summary: This patch fixes the cases of sext/zext constant folding in DAG combiner where constans do not fit 64 bits. The fix simply removes un$
Test Plan: New regression test included.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: RKSimon, llvm-commits
Differential Revision: http://reviews.llvm.org/D10607
llvm-svn: 240991
Make sure to remove the unique lock file, which is what the .lock
symlink points to, if there is a signal while the lock is held. This
will release the lock, since the symlink will point to nothing (already
tested in unit tests). For good measure, also clean up the unique lock
file if there is an error or signal before the lock is acquired.
I will add a clang test.
rdar://problem/21512307
llvm-svn: 240967
This commit implements serialization of the register mask machine
operands. This commit serializes only the call preserved register
masks that are defined by a target, it doesn't serialize arbitrary
register masks.
This commit also extends the TargetRegisterInfo class and TableGen so that
the users of TRI can get the list of all the call preserved register masks and
their names.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10673
llvm-svn: 240966
The expressions we delinearize do not necessarily have to have a SCEVAddRecExpr
at the outermost level. At this moment, the additional flexibility is not
exploited in LLVM itself, but in Polly we will soon soonish use this
functionality. For LLVM, this change should not affect existing functionality
(which is covered by test/Analysis/Delinearization/)
llvm-svn: 240952
This moves the error checking for string tables to getStringTable which returns
an ErrorOr<StringRef>.
This improves error checking, makes it uniform across all string tables and
makes it possible to check them once instead of once per name.
llvm-svn: 240950
removing default label in switch as it results.
This is part of earlier commit http://reviews.llvm.org/D1064
Subscribers: llvm-commits
llvm-svn: 240932
Some of the the permissible ARM -mfpu options, which are supported in GCC,
are currently not present in llvm/clang.This patch adds the options:
'neon-fp16', 'vfpv3-fp16', 'vfpv3-d16-fp16', 'vfpv3xd' and 'vfpv3xd-fp16.
These are related to half-precision floating-point and single precision.
Reviewers: rengolin, ranjeet.singh
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10645
llvm-svn: 240930
We had a hack in SDAGBuilder in place to work around this but now we
can avoid that. Call BuildExactSDIV from BuildSDIV so DAGCombiner can
perform this trick automatically.
The added check in DAGCombiner is necessary to prevent exact sdiv by pow2
from regressing as the target-specific pow2 lowering is not aware of
exact bits yet.
This is mostly covered by existing tests. One side effect is that we
get the better lowering for exact vector sdivs now too :)
llvm-svn: 240891
the DW_AT_bit_offset computation, the byte offset is in fact also
endian-dependent as it needs to point to the storage unit containing the
most-significant bit of the the bitfield.
I'm so looking forward to emitting the endian-agnostic DWARF 3 version
instead.
llvm-svn: 240890
Summary:
Previously it (incorrectly) used GPR's.
Patch by Simon Dardis. A couple small corrections by myself.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10567
llvm-svn: 240883
If we are dealing with a pointer induction variable, isInductionPHI
gives back a step value of Stride / size of pointer. However, we might
be indexing with a legal type wider than the pointer width.
Handle this by inserting casts where appropriate instead of crashing.
This fixes PR23954.
llvm-svn: 240877
The PruneEH pass tries to annotate functions as 'noreturn' if it doesn't
see a ReturnInst. However, a naked function containing inline assembly
can contain control flow leaving the function.
This fixes PR23971.
llvm-svn: 240876
Summary:
The current implementation doesn't always flush all pending labels
beforeemitting data which can result in an incorrectly placed labels in
case when when instruction bundling is enabled and -mc-relax-all flag is
being used. To address this issue, we always flush pending labels before
emitting data.
The change was tested by running PNaCl toolchain trybots with
-mc-relax-all flag set.
Fixes https://code.google.com/p/nativeclient/issues/detail?id=4063
Test Plan: Regression test attached
Reviewers: mseaborn
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10325
llvm-svn: 240870
Summary:
Ensure that fragments are bundle aligned when instruction bundling
is enabled and the -mc-relax-all flag is set. This is implicitly
assumed by the bundle padding implementation but this assumption
does not hold when custom alignment is being used.
The change was tested by running PNaCl toolchain trybots with
-mc-relax-all flag set.
Fixes https://code.google.com/p/nativeclient/issues/detail?id=4063
Test Plan: Regression test attached
Reviewers: mseaborn
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D10044
llvm-svn: 240869
Allow callers of `Value::print()` and `Metadata::print()` to pass in a
`ModuleSlotTracker`. This allows them to pay only once for calculating
module-level slots (such as Metadata).
This is related to PR23865, where there was a huge cost for
`MachineFunction::print()`. Although I don't have a *particular* user
in mind for this new code, I have hit big slowdowns before when running
`opt -debug`, and I think this will be useful. Going forward, if
someone hits a big slowdown with `print()` statements, they can create a
`ModuleSlotTracker` and send it through. Similarly, adding support to
`Value::dump()` and `Metadata::dump()` should be trivial.
I added unit tests to be sure the `print()` functions actually behave
the same way with and without the slot tracker.
llvm-svn: 240867
It is possible for a global to be substituted with another global of a
different type or a different kind (i.e. an alias) at IR link time. One
example of this scenario is when a Microsoft ABI vtable is substituted with
an alias referring to a larger vtable containing an RTTI reference.
This will cause the global to be RAUW'd with a possibly bitcasted reference
to the other global. This will of course also affect any references to the
global in bitset metadata.
The right way to handle such metadata is simply to ignore it. This is sound
because the linked module should contain another copy of the bitset entries as
applied to the new global.
llvm-svn: 240866
Another follow-up related to r240848: try a little harder to share slot
tracking calculations within a single `MachineInstr` dump. This is
unrelated to `MachineFunction::print()`, since that should be passing
through the function's `ModuleSlotTracker` by now, but could affect the
speed of dumping from a debugger if there is more than one IR-level
operand.
llvm-svn: 240852
This commit serializes the global address machine operands.
This commit doesn't serialize the operand's offset and target
flags, it serializes only the global value reference.
Reviewers: Duncan P. N. Exon Smith
Differential Revision: http://reviews.llvm.org/D10671
llvm-svn: 240851
This change extends the detection of base pointers for vector constructs to handle arbitrary phi and select nodes. The existing non-vector code already handles those, so this is basically just extending the vector special case to be less special cased. It still isn't generalized vector handling since we can't handle arbitrary vector instructions (e.g. shufflevectors), but it's a lot closer.
The general structure of the change is as follows:
* Extend the base defining value relation over a subset of vector instructions and vector typed phi & select instructions.
* Move scalarization from before base pointer rewriting to after base pointer rewriting. The extension of the BDV relation is sufficient to find vector base phis for vector inputs.
* Preserve the existing special case logic for when the base of a vector element is locally obvious. This general idea could be extended to the scalar case as well.
Differential Revision: http://reviews.llvm.org/D10461#inline-84275
llvm-svn: 240850
Summary:
Some front ends make kernel pointers global already. In that case,
handlePointerParams does nothing.
Test Plan: more tests in lower-kernel-ptr-arg.ll
Reviewers: grosser
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D10779
llvm-svn: 240849
For another 1% speedup on the testcase in PR23865, push the
`ModuleSlotTracker` through to metadata-related printing in
`MachineBasicBlock::print()`.
llvm-svn: 240848
Push `ModuleSlotTracker` through `MachineOperand`s, dropping the time
for `llc -print-machineinstrs` on the testcase in PR23865 from ~13
seconds to ~9 seconds. Now `SlotTracker::processFunctionMetadata()`
accounts for only 8% of the runtime, which seems reasonable.
llvm-svn: 240845
Expose enough of the IR-level `SlotTracker` so that
`MachineFunction::print()` can use a single one for printing
`BasicBlock`s. Next step would be to lift this through a few more APIs
so that we can make other print methods faster.
Fixes PR23865, changing the runtime of `llc -print-machineinstrs` from
many minutes (killed after 3 minutes, but it wasn't very close) to
13 seconds for a 502185 line dump.
llvm-svn: 240842
Summary: We need to set MTYPE = 2 for VI shaders when targeting the HSA runtime.
Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D10777
llvm-svn: 240841
We support invoking a subset of llvm's intrinsics, but the verifier didn't account for this. We had previously added a special case to verify invokes of statepoints. By generalizing the code in terms of CallSite, we can verify invokes of other intrinsics as well. Interestingly, this found one test case which was invalid.
Note: I'm deliberately leaving the naming change from CI to CS to a follow up change. That will happen shortly, I just wanted to reduce the diff to make it clear what was happening with this one.
Differential Revision: http://reviews.llvm.org/D10118
llvm-svn: 240836
Summary:
This way the function symbol points to the start of amd_kernel_code_t
rather than the start of the function.
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10705
llvm-svn: 240829
If we have a caller that knows a particular argument can never be null, we can exploit this fact while simplifying values in the inline cost analysis. This has the effect of reducing the cost for inlining when a null check is present in the callee, but the value is known non null in the caller. In particular, any dependent control flow can be discounted from the cost estimate.
Note that we use the parameter attributes at the call site to memoize the analysis within the caller's code. The setting of this attribute is done in InstCombine, the inline cost analysis just consumes it. This is intentional and important because we want the inline cost analysis results to be easily cachable themselves. We're not currently doing so, but initial results on LTO indicate this will quickly become important.
Differential Revision: http://reviews.llvm.org/D9129
llvm-svn: 240828
If pseudoToMCOpcode failed, we would return the original opcode, so operands
would be swapped, but the instruction would remain the same.
It resulted in LSHLREV a, b ---> LSHLREV b, a.
This fixes Glamor text rendering and
piglit/arb_sample_shading-builtin-gl-sample-mask on VI.
This is a candidate for stable branches.
v2: the test was simplified by Tom Stellard
llvm-svn: 240824
This patch corresponds to review:
http://reviews.llvm.org/D10638
This is the back end portion of patch
http://reviews.llvm.org/D10637
It just adds the code gen and intrinsic functions necessary to support that patch to the back end.
llvm-svn: 240820
The body of the loops here only contained asserts. This triggered an unused variable
warning on release builds and -Werror on the bots.
llvm-svn: 240819
SDNode already had ops() which would iterate over the operands and return
SDUse*. This version instead gets the SDValue's out of the SDUse's so that
we can use foreach in more places.
Reviewed by David Blaikie.
llvm-svn: 240805