The previous patch had effect, but missed this one. It seems MSVC
gets ADL-confused by the calls where the first argument is a function call?
llvm-svn: 222968
It was failing with this kind of error:
C:\b\build\slave\CrWinClang\build\src\third_party\llvm\lib\TableGen\TGParser.cpp(1243) : error C2668: 'llvm::make_unique' : ambiguous call to overloaded function
C:\b\build\slave\CrWinClang\build\src\third_party\llvm\include\llvm/ADT/STLExtras.h(408): could be 'std::unique_ptr<llvm::Record,std::default_delete<_Ty>> llvm::make_unique<llvm::Record,std::string,llvm::SMLoc&,llvm::RecordKeeper&,bool>(std::string &&,llvm::SMLoc &,llvm::RecordKeeper &,bool &&)'
with
[
_Ty=llvm::Record
]
C:\b\depot_tools\win_toolchain\vs2013_files\win8sdk\bin\..\..\VC\include\memory(1637): or 'std::unique_ptr<llvm::Record,std::default_delete<_Ty>> std::make_unique<llvm::Record,std::string,llvm::SMLoc&,llvm::RecordKeeper&,bool>(std::string &&,llvm::SMLoc &,llvm::RecordKeeper &,bool &&)' [found using argument-dependent lookup]
with
[
_Ty=llvm::Record
]
while trying to match the argument list '(std::string, llvm::SMLoc, llvm::RecordKeeper, bool)'
llvm-svn: 222967
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot. I'll respond to the commit on the
list with a reproduction of one of the failures.
Conflicts:
lib/Target/X86/X86TargetTransformInfo.cpp
llvm-svn: 222936
We may be in a situation where the icmps might not be near each other in
a tree of or instructions. Try to dig out related compare instructions
and see if they combine.
N.B. This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR. We may have to resort
to something more sophisticated if this is a real problem.
llvm-svn: 222928
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.
Differential Revision: http://reviews.llvm.org/D6414
rdar://problem/18943047
llvm-svn: 222927
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB, as to match the number of source
branches.
Differential Revision: http://reviews.llvm.org/D6415
rdar://problem/19040894
llvm-svn: 222926
Allow unaligned 16-byte memop codegen for btver2. No functional changes for any other subtargets.
Replace the existing supposed small memcpy test with an actual test of a small memcpy.
The previous test wasn't using FileCheck either.
This patch should allow us to close PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6360
llvm-svn: 222925
The original patch would fail when:
* A dst opaque type (%A) is matched with a src type (%A).
* A src opaque (%E) type is then speculatively matched with %A and the
speculation fails afterward.
* When rolling back the speculation we would cancel the source %A to dest
%A mapping.
The fix is to keep an explicit list of which resolutions are speculative.
Original message:
Fix overly aggressive type merging.
If we find out that two types are *not* isomorphic, we learn nothing about
opaque sub types in both the source and destination.
llvm-svn: 222923
MSan does not assign origin for instrumentation temps (i.e. the ones that do
not come from the application code), but "select" instrumentation erroneously
tried to use one of those.
https://code.google.com/p/memory-sanitizer/issues/detail?id=78
llvm-svn: 222918
The AAPCS treats small structs and homogeneous floating (or vector) aggregates
specially, and guarantees they either get passed as a contiguous block of
registers, or prevent any future use of those registers and get passed on the
stack.
This concept can fit quite neatly into LLVM's own type system, mapping an HFA
to [N x float] and so on, and small structs to [N x i64]. Doing so allows
front-ends to emit AAPCS compliant code without having to duplicate the
register counting logic.
llvm-svn: 222903
I also added a test.
Original message:
Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222897
This reverts commit r222760.
It changed our behaviour on PIC so we don't match gas anymore. It also
included lots of unnecessary changes to tests.
If those changes are desirable, there should be an independent discussion
as they are out of scope for that patch.
I will recommit the other bits.
llvm-svn: 222896
This reverts commit r222727, which causes LTO bootstrap failures.
Last passing @ r222698:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/532/
First failing @ r222843:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/
Internal bootstraps pointed at a much narrower range: r222725 is
passing, and r222731 is failing.
LTO crashes while handling libclang.dylib:
http://lab.llvm.org:8080/green/job/clang-Rlto_master_build/533/consoleFull#-158682280549ba4694-19c4-4d7e-bec5-911270d8a58c
GEP is not of right type for indices!
%InfoObj.i.i = getelementptr inbounds %"class.llvm::OnDiskIterableChainedHashTable"* %.lcssa, i64 0, i32 0, i32 4, !dbg !123627
%"class.clang::serialization::reader::ASTIdentifierLookupTrait" = type { %"class.clang::ASTReader.31859"*, %"class.clang::serialization::ModuleFile.31870"*, %"class.clang::IdentifierInfo"* }LLVM ERROR: Broken function found, compilation aborted!
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Looks like the new algorithm doesn't merge types aggressively enough.
llvm-svn: 222895
Fixed missing dominance check.
Original commit message:
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
Jump threading will then eliminate the second if(cond).
llvm-svn: 222891
The string data for string-valued build attributes were being unconditionally
uppercased. There is no mention in the ARM ABI addenda about case conventions,
so it's technically implementation defined as to whether the data are
capitialised in some way or not. However, there are good reasons not to
captialise the data.
* It's less work.
* Some vendors may legitimately have case-sensitive checks for these
attributes which would fail on LLVM generated object files.
* There could be locale issues with uppercasing.
The original reasons for uppercasing appear to have stemmed from an
old codesourcery toolchain behaviour, see
http://comments.gmane.org/gmane.comp.compilers.llvm.cvs/87133
This patch makes the object file emitted no longer captialise string
data, it encodes as seen in the assembly source.
Change-Id: Ibe20dd6e60d2773d57ff72a78470839033aa5538
llvm-svn: 222882
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
if (idx < tablesize)
r = table[idx]; // table does not contain default_value
else
r = default_value;
if (r != default_value)
...
Is optimized to:
cond = idx < tablesize;
if (cond)
r = table[idx];
else
r = default_value;
if (cond)
...
\endcode
Jump threading will then eliminate the second if(cond).
llvm-svn: 222872
This restores our ability to optimize:
(X & C) ? X & ~C : X into X & ~C
(X & C) ? X : X & ~C into X
(X & C) ? X | C : X into X
(X & C) ? X : X | C into X | C
llvm-svn: 222868
All symbols have to be stored in the global symbol to enable
cross-rtdyld-instance linking, so the local symbol table content is
redundant.
llvm-svn: 222867
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.
A test has been added to ensure that we don't regress again.
I'll work on a rewrite of what the optimization was trying to do later.
llvm-svn: 222856
llvm-objdump printed out an error message for this off-by-one error,
but because it always exits with 0 whether or not it found an error,
the test (llvm-objdump/coff-many-relocs.test) succeeded.
I made llvm-objdump exit with EXIT_FAILURE when an error is found.
llvm-svn: 222852
This sort of doesn't matter since the setcc type is i1, but
this previously was using the default UndefinedBooleanContent. This
makes it more consistent with R600. This enables more optimizations
which typically give up on UndefinedBooleanContent. For example,
there is already a special case target DAG combine for
setcc + sext which can be eliminated in favor of what the generic
DAG combiner can do if it assumes boolean values are sign extended.
Since -1 is an inline immediate, using it is basically free and the
backend already uses it when a boolean value is needed in a wider type.
llvm-svn: 222850
This fixes moving boolean constants into registers before operating
on them. They get permuted and shrunk down to e32 anyway later. This
is a temporary fix until the patch that removes these pseudos is
committed.
llvm-svn: 222844
This mostly entails adding relocations, however there are a couple of
changes to existing relocations:
1. R_AARCH64_NONE is defined to be zero rather than 256
R_AARCH64_NONE has been defined to be zero for a long time elsewhere
e.g. binutils and glibc since the submission of the AArch64 port in
2012 so this is required for compatibility.
2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21
I don't think there is any way for relocation names to leak out of LLVM
so this should not break anything.
Tested with check-all with no regressions.
llvm-svn: 222821
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.
The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.
http://reviews.llvm.org/D6378
llvm-svn: 222820
that we actually have an object to register first.
For MachO objects, RuntimeDyld::LoadedObjectInfo::getObjectForDebug returns an
empty OwningBinary<ObjectFile> which was causing crashes in the GDB registration
code.
llvm-svn: 222812
The RuntimeDyld cleanup patch r222810 turned on GDB registration for MachO
objects. I expected this to be harmless, but it seems to have broken on
MacsOS. Temporarily disabling debugger registration while I dig in to what's
gone wrong.
llvm-svn: 222811
Previously, when loading an object file, RuntimeDyld (1) took ownership of the
ObjectFile instance (and associated MemoryBuffer), (2) potentially modified the
object in-place, and (3) returned an ObjectImage that managed ownership of the
now-modified object and provided some convenience methods. This scheme accreted
over several years as features were tacked on to RuntimeDyld, and was both
unintuitive and unsafe (See e.g. http://llvm.org/PR20722).
This patch fixes the issue by removing all ownership and in-place modification
of object files from RuntimeDyld. Existing behavior, including debugger
registration, is preserved.
Noteworthy changes include:
(1) ObjectFile instances are now passed to RuntimeDyld by const-ref.
(2) The ObjectImage and ObjectBuffer classes have been removed entirely, they
existed to model ownership within RuntimeDyld, and so are no longer needed.
(3) RuntimeDyld::loadObject now returns an instance of a new class,
RuntimeDyld::LoadedObjectInfo, which can be used to construct a modified
object suitable for registration with the debugger, following the existing
debugger registration scheme.
(4) The JITRegistrar class has been removed, and the GDBRegistrar class has been
re-written as a JITEventListener.
This should fix http://llvm.org/PR20722 .
llvm-svn: 222810
Since (v)pslldq / (v)psrldq instructions resolve to a single input argument it is useful to match it much earlier than we currently do - this prevents more complicated shuffles (notably insertion into a zero vector) matching before it.
Differential Revision: http://reviews.llvm.org/D6409
llvm-svn: 222796
If solveBlockValue() needs results from predecessors that are not already
computed, it returns false with the intention of resuming when the dependencies
have been resolved. However, the computation would never be resumed since an
'overdefined' result had been placed in the cache, preventing any further
computation.
The point of placing the 'overdefined' result in the cache seems to have been
to break cycles, but we can check for that when inserting work items in the
BlockValue stack instead. This makes the "stop and resume" mechanism of
solveBlockValue() work as intended, unlocking more analysis.
Using this patch shaves 120 KB off a 64-bit Chromium build on Linux.
I benchmarked compiling bzip2.c at -O2 but couldn't measure any difference in
compile time.
Tests by Jiangning Liu from r215343 / PR21238, Pete Cooper, and me.
Differential Revision: http://reviews.llvm.org/D6397
llvm-svn: 222768
This changes the order in which different types are passed to get, but
one order is not inherently better than the other.
The main motivation is that this simplifies linkDefinedTypeBodies now that
it is only linking "real" opaque types. It is also means that we only have to
call it once and that we don't need getImpl.
A small change in behavior is that we don't copy type names when resolving
opaque types. This is an improvement IMHO, but it can be added back if
desired. A test is included with the new behavior.
llvm-svn: 222764
Mark destination buffer in zlib::compress and zlib::decompress as fully
initialized.
When building LLVM with system zlib and MemorySanitizer instrumentation,
MSan does not observe memory writes in zlib code and erroneously considers
zlib output buffers as uninitialized, resulting in false use-of-uninitialized
memory reports. This change helps MSan understand the state of that memory
and prevents such reports.
llvm-svn: 222763
and PIC:
Allow FDE references outside the +/-2GB range supported by PC relative
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222760
Exactly the same checks are present in areTypesIsomorphic.
This might have been a premature performance optimization. I cannot reproduce
any slowdown with this patch.
llvm-svn: 222758
stored rather than the pointer type.
This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.
With this change, loads and stores should no longer be made with
nonsensical types for the values that tehy load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.
I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.
llvm-svn: 222748
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.
Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.
To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.
These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.
I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.
llvm-svn: 222739
Only the super register flat_scr was marked as reserved,
so in some cases with high register usage it would still
try to allocate the subregisters.
llvm-svn: 222737
The pattern matching failed to recognize all instances of "-1", because when
comparing against "-1" we didn't use an APInt of the same bitwidth.
This commit fixes this and also adds inverse versions of the conditon to catch
more cases.
llvm-svn: 222722
This handles cases where we are comparing a masked value against itself.
The analysis could be further improved by making it recursive but such
expense is not currently justified.
llvm-svn: 222716
The attn instruction is not part of the Power ISA, but is documented in the A2
user manual, and is accepted by the GNU assembler for the A2 and the POWER4+.
Reported as part of PR21650.
llvm-svn: 222712
This does not matter on newer cores (where we can use reciprocal estimates in
fast-math mode anyway), but for older cores this allows us to generate better
fast-math code where we have multiple FDIVs with a common divisor.
llvm-svn: 222710
We were matching against the assume intrinsic in every check. Since we know that it must be an assume, this is just wasted work. Somewhat surprisingly, matching an intrinsic id is actually relatively expensive. It devolves to a string construction and comparison in Function::isIntrinsic.
I originally spotted this because it showed up in a performance profile of my compiler. I've since discovered a separate issue which seems to be the actual root cause, but this is minor perf goodness regardless.
I'm likely to follow up with another change to factor out the comparison matching. There's no need to match the compare instruction in every single one of the tests.
Differential Revision: http://reviews.llvm.org/D6312
llvm-svn: 222709
When processing an assignment in the integrated assembler that sets
a symbol to the value of another symbol, we need to copy the st_other
bits that encode the local entry point offset.
Modeled after MipsTargetELFStreamer::emitAssignment handling of the
ELF::STO_MIPS_MICROMIPS flag.
llvm-svn: 222672
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.
This fixes PR21653.
llvm-svn: 222659
Fix JRADDIUSP instruction, remove delay slot flag because this instruction
doesn't have delay slot.
Differential Revision: http://reviews.llvm.org/D6365
llvm-svn: 222658
With the help of new method readInstruction16() two bytes are read and
decodeInstruction() is called with DecoderTableMicroMips16, if this fails
four bytes are read and decodeInstruction() is called with
DecoderTableMicroMips32.
Differential Revision: http://reviews.llvm.org/D6149
llvm-svn: 222648
This patch teaches function 'transformVSELECTtoBlendVECTOR_SHUFFLE' how to
convert VSELECT dag nodes to shuffles on targets that do not have SSE4.1.
On pre-SSE4.1 targets, we can still perform blend operations using movss/movsd.
Also, removed a target specific combine that performed a premature lowering of
VSELECT nodes to target specific MOVSS/MOVSD nodes.
llvm-svn: 222647
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout. This resulted in opt crashing.
This fixes PR21651.
llvm-svn: 222645
r222375 made some improvements to build_vector lowering of v4x32 and v4xf32 into an insertps, but it missed a case where:
1. A single extracted element is used twice.
2. The lower of the two non-zero indexes should be preserved, and the higher should be used for the dest mask.
This caused a crash, since the source value for the insertps ends-up uninitialized.
Differential Revision: http://reviews.llvm.org/D6377
llvm-svn: 222635
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)
Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.
http://reviews.llvm.org/D6191
llvm-svn: 222632
No functionality changed yet, but this will prevent subsequent patches
from having to handle permutations of various interleaved shuffle
patterns.
llvm-svn: 222614
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).
llvm-svn: 222590
This s_mov_b32 will write to a virtual register from the M0Reg
class and all the ds instructions now take an extra M0Reg explicit
argument.
This change is necessary to prevent issues with the scheduler
mixing together instructions that expect different values in the m0
registers.
llvm-svn: 222583
filler such as if delay slot filler have to put NOP instruction into the
delay slot of microMIPS BEQ or BNE instruction which uses the register $0,
then instead of emitting NOP this instruction is replaced by the corresponding
microMIPS compact branch instruction, i.e. BEQZC or BNEZC.
Differential Revision: http://reviews.llvm.org/D3566
llvm-svn: 222580
We can now use the ELF relocation .def files to create the mapping
of relocation numbers to names and avoid having to duplicate the
list of relocations.
Patch by Will Newton.
llvm-svn: 222567
We can now use the ELF relocation .def files to create the mapping
of relocation numbers to names and avoid having to duplicate the
list of relocations.
Patch by Will Newton.
llvm-svn: 222566
This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
llvm-svn: 222539
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222538
a bunch more improvements.
Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
llvm-svn: 222537
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
llvm-svn: 222536
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
llvm-svn: 222533
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
llvm-svn: 222522
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
These recently all grew a unique_ptr<TargetLoweringObjectFile> member in
r221878. When anyone calls a virtual method of a class, clang-cl
requires all virtual methods to be semantically valid. This includes the
implicit virtual destructor, which triggers instantiation of the
unique_ptr destructor, which fails because the type being deleted is
incomplete.
This is just part of the ongoing saga of PR20337, which is affecting
Blink as well. Because the MSVC ABI doesn't have key functions, we end
up referencing the vtable and implicit destructor on any virtual call
through a class. We don't actually end up emitting the dtor, so it'd be
good if we could avoid this unneeded type completion work.
llvm-svn: 222480
Code seems cleaner and easier to understand this way
This is basically r222416, after fixes for MSVC lack of standard
support, and a few cleaning (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.
llvm-svn: 222472
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.
However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.
llvm-svn: 222451
Windows itanium targets the MSVCRT, and the stack probe symbol is provided by
MSVCRT. This corrects the emission of stack probes on i686-windows-itanium.
llvm-svn: 222439
As dump() methods should be. To allow that, do not store the DWARFFormValue
objects used for the dump in the header data.
Per Alexey's suggestion!
llvm-svn: 222436
These fields would need to be explicitly deleted before we RAUW the temporary
node anyway (this was done in cfe commit r222373). Instead, do not create
these useless nodes in the first place.
llvm-svn: 222434
"global-init", "global-init-src" and "global-init-type" were originally
used to blacklist entities in ASan init-order checker. However, they
were never documented, and later were replaced by "=init" category.
Old blacklist entries should be converted as follows:
* global-init:foo -> global:foo=init
* global-init-src:bar -> src:bar=init
* global-init-type:baz -> type:baz=init
llvm-svn: 222401
This reverts commit r222142. This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.
Conflicts:
test/Transforms/Reassociate/optional-flags.ll
llvm-svn: 222398
- Show "Considering..." message after flipping so you actually see the final
destination vreg as destination.
- Add a message on final join, so you can grep for "Success" messages to obtain
a list of which register got merged with which.
llvm-svn: 222382
This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes
that are known to have at least two non-zero elements.
With this patch, a build_vector that performs a blend with zero is
converted into a shuffle. This is done to let the shuffle legalizer expand
the dag node in a optimal way. For example, if we know that a build_vector
performs a blend with zero, we can try to lower it as a movq/blend instead of
always selecting an insertps.
This patch also improves the logic that lowers a build_vector into a insertps
with zero masking. See for example the extra test cases added to test sse41.ll.
Differential Revision: http://reviews.llvm.org/D6311
llvm-svn: 222375
As detailed at http://llvm.org/PR20728, due to an internal overflow in
APFloat::multiplySignificand the APFloat::fusedMultiplyAdd method can return
incorrect results for x87DoubleExtended (x86_fp80) values. This commonly
manifests as incorrect constant folding of libm fmal calls on x86. E.g.
fmal(1.0L, 1.0L, 3.0L) == 0.0L (should be 4.0L)
This patch fixes PR20728 by adding an extra bit to the significand for
intermediate results of APFloat::multiplySignificand, avoiding the overflow.
llvm-svn: 222374
A register operand that has a common sub-class with its instruction's
defined register class is not always legal. For example,
SReg_32 and M0Reg both have a common sub-class, but we can't
use an SReg_32 in instructions that expect a M0Reg.
This prevents the llvm.SI.sendmsg.ll test from failing when the fold
operand pass is added.
llvm-svn: 222368
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.
The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.
This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.
llvm-svn: 222354
This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain.
I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr.
Differential Revision: http://reviews.llvm.org/D5699
llvm-svn: 222340
AliasSetTracker::addUnknown may create an AliasSet devoid of pointers
just to contain an instruction if no suitable AliasSet already exists.
It will then AliasSet::addUnknownInst and we will be done.
However, it's possible for addUnknown to choose an existing AliasSet to
addUnknownInst.
If this were to occur, we are in a bit of a pickle: removing pointers
from the AliasSet can cause the entire AliasSet to become destroyed,
taking our unknown instructions out with them.
Instead, keep track whether or not our AliasSet has any unknown
instructions.
This fixes PR21582.
llvm-svn: 222338
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.
This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...
llvm-svn: 222334
Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find
enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved
and benefical for ooo cores, we can enable it again.
llvm-svn: 222333
SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE
and LICM. By enabling these passes we can have better address calculations and generate a better addressing
mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57.
Reviewed in http://reviews.llvm.org/D5864.
llvm-svn: 222331
If LowerGEP is enabled, it can lower a GEP with multiple indices into GEPs with a single index
or arithmetic operations. Lowering GEPs can always extract structure indices. Lowering GEPs can
also give use more optimization opportunities. It can benefit passes like CSE, LICM and CGP.
Reviewed in http://reviews.llvm.org/D5864
llvm-svn: 222328
Having two ways to do this doesn't seem terribly helpful and
consistently using the insert version (which we already has) seems like
it'll make the code easier to understand to anyone working with standard
data structures. (I also updated many references to the Entry's
key and value to use first() and second instead of getKey{Data,Length,}
and get/setValue - for similar consistency)
Also removes the GetOrCreateValue functions so there's less surface area
to StringMap to fix/improve/change/accommodate move semantics, etc.
llvm-svn: 222319
It printed out base relocation table header as table entry.
This patch also makes llvm-readobj to not skip ABSOLUTE entries
becuase it was confusing.
llvm-svn: 222299
Summary:
move the code from BreakCriticalEdges::runOnFunction()
into a separate utility function llvm::SplitAllCriticalEdges()
so that it can be used independently.
No functionality change intended.
Test Plan: check-llvm
Reviewers: nlewycky
Reviewed By: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6313
llvm-svn: 222288
This partially makes up for not having address spaces
used for alias analysis in some simple cases.
This is not yet enabled by default so shouldn't change anything yet.
llvm-svn: 222286
Assuming unmodeled side effects interferes with some scheduling
opportunities.
Don't put it in the base class of DS instructions since there
are a few weird effecting, non load/store instructions there.
llvm-svn: 222285
Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills.
This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills.
This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll
Differential Revision: http://reviews.llvm.org/D6252
llvm-svn: 222281
shift-right for booleans (i1).
Arithmetic shift-right immediate with sign-/zero-extensions also works for
boolean values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222272
shift-right for booleans (i1).
Logical shift-right immediate with sign-/zero-extensions also works for boolean
values. Update the assert and the test cases to reflect that fact.
llvm-svn: 222270
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.
This fixes the second part of PR21576.
llvm-svn: 222265
Shifts also perform sign-/zero-extends to larger types, which requires us to emit
an integer extend instead of a simple COPY.
Related to PR21594.
llvm-svn: 222257
This should expose more of the actually used VALU
instructions to the machine optimization passes.
This also should help getting i1 handling into a better state.
For not entirly understood reasons, this fixes the split-scalar-i64-add.ll
test where a 64-bit add would only partially be moved to the VALU
resulting in use of undefined VCC.
llvm-svn: 222256
"optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add
instructions when the flags are not used anymore. This conversion is valid for
most instructions, but not all. Some instructions that don't set the flags
(e.g. sub with immediate) can set the SP, whereas the flag setting version uses
the same encoding for the "zero" register.
Update the code to also check for the return register before performing the
optimization to make sure that a cmp doesn't suddenly turn into a sub that sets
the stack pointer.
I don't have a test case for this, because it isn't easy to trigger.
llvm-svn: 222255
This change emits a COPY for a shift-immediate with a "zero" shift value.
This fixes PR21594 where we emitted a shift instruction with an incorrect
immediate operand.
llvm-svn: 222247
EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!).
Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301
llvm-svn: 222241
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true.
While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.
llvm-svn: 222230
Usually global variables are in a retain list and instanciated before
any call to constructImportedEntityDIE is made. This isn't true for
forward declarations though.
The testcase for this change is generated by a clang patched to emit
such forward declarations (patch at http://reviews.llvm.org/D6173
which will land soon). The updated testcase tests more than just
global variables, it now tests every type of 'using' clause we
support.
llvm-svn: 222217
I added a pessimization in r217102 to prevent miscompiles when the
incremented induction variable was used in a comparison; it would be
poison.
Try to use the incremented induction variable more often when we can be
sure that the increment won't end in poison.
Differential Revision: http://reviews.llvm.org/D6222
llvm-svn: 222213
Having the operands at the back prevents subclasses from safely adding
fields. Move them to the front.
Instead of replicating the custom `malloc()`, `free()` and `DestroyFlag`
logic that was there before, overload `new` and `delete`.
I added calls to a new `GenericMDNode::dropAllReferences()` in
`LLVMContextImpl::~LLVMContextImpl()`. There's a maze of callbacks
happening during teardown, and this resolves them before we enter
the destructors.
Part of PR21532.
llvm-svn: 222211
Split `MDNode` into two classes:
- `GenericMDNode`, which is uniquable (and for now, always starts
uniqued). Once `Metadata` is split from the `Value` hierarchy, this
class will lose the ability to RAUW itself.
- `MDNodeFwdDecl`, which is used for the "temporary" interface, is
never uniqued, and isn't managed by `LLVMContext` at all.
I've left most of the guts in `MDNode` for now, but I'll incrementally
move things to the right places (or delete the functionality, as
appropriate).
Part of PR21532.
llvm-svn: 222205
use DIScopeRef.
A paired commit at clang will follow to show cases where we will use an
identifer for the context of a global variable.
rdar://18958417
llvm-svn: 222195
Change uniquing from a `FoldingSet` to a `DenseSet` with custom
`DenseMapInfo`. Unfortunately, this doesn't save any memory, since
`DenseSet<T>` is a simple wrapper for `DenseMap<T, char>`, but I'll come
back to fix that later.
I used the name `GenericDenseMapInfo` to the custom `DenseMapInfo` since
I'll be splitting `MDNode` into two classes soon: `MDNodeFwdDecl` for
temporaries, and `GenericMDNode` for everything else.
I also added a non-debug-info reduced version of a type-uniquing test
that started failing on an earlier draft of this patch.
Part of PR21532.
llvm-svn: 222191
This was resulting in use of a register after a kill.
For some reason this showed up as a problem in many tests
when moving the SIFixSGPRCopies pass closer to instruction
selection.
llvm-svn: 222175
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.
The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.
To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.
This fixes rdar://problem/18984639.
llvm-svn: 222168
The triple parser should only accept existing architecture names
when the triple starts with armv, armebv, thumbv or thumbebv.
Patch by Gabor Ballabas.
llvm-svn: 222129
SCEVDivision::divide constructed an object of SCEVDivision<Derived>
instead of Derived. divide would call visit which would cast the
SCEVDivision<Derived> to type Derived. As it happens,
SCEVDivision<Derived> and Derived currently have the same layout but
this is fragile and grounds for UB.
Instead, just construct Derived. No functional change intended.
llvm-svn: 222126
This was motivated by a bug which caused code like this to be
miscompiled:
declare void @take_ptr(i8*)
define void @test() {
%addr1.32 = alloca i8
%addr2.32 = alloca i32, i32 1028
call void @take_ptr(i8* %addr1)
ret void
}
This was emitting the following assembly to get the value of %addr1:
add r0, sp, #1020
add r0, r0, #8
However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this
could not be assembled. The generated object file contained this,
resulting in r0 holding SP+8 rather tha SP+1028:
add r0, sp, #1020
add r0, sp, #8
This function looked like it could have caused miscompilations for
other combinations of registers and offsets (though I don't think it is
currently called with these), and the heuristic it used did not match
the emitted code in all cases.
llvm-svn: 222125
We were a little lax in a few areas:
- We pretended that import libraries were like any old COFF file, they
are not. In fact, they aren't really COFF files at all, we should
probably grow some specialized functionality to handle them smarter.
- Our symbol iterators were more than happy to attempt to go past the
end of the symbol table if you had a symbol with a bad list of
auxiliary symbols.
llvm-svn: 222124
Some optimisations in DAGCombiner cause miscompilations for targets that use
TargetLowering::UndefinedBooleanContent, because they assume that the results
of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and
XORed. These optimisations are only valid for targets that use
ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent.
This is a follow-up to D6210/r221693.
llvm-svn: 222123
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:
int f1(int x) {
switch (x) {
case 0: return 10;
case 1: return 11;
case 2: return 12;
case 3: return 13;
}
return 0;
}
generates:
define i32 @f1(i32 %x) #0 {
entry:
%0 = icmp ult i32 %x, 4
br i1 %0, label %switch.lookup, label %return
switch.lookup:
%switch.offset = add i32 %x, 10
ret i32 %switch.offset
return:
ret i32 0
}
llvm-svn: 222121
Indices into the table are stored in each MCRegisterClass instead of a pointer. A new method, getRegClassName, is added to MCRegisterInfo and TargetRegisterInfo to lookup the string in the table.
llvm-svn: 222118
This adds back r222061, but now calls initializePAEvalPass from the correct
library to avoid link problems.
Original message:
Don't make assumptions about the name of private global variables.
Private variables are can be renamed, so it is not reliable to make
decisions on the name.
The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).
This patch changes one case where we were looking at the variable name to use
the section instead.
Test tuning by Michael Gottesman.
llvm-svn: 222117
It turns out that not all users of SCEVDivision want the same
signedness. Let the users determine which operation they'd like by
explicitly choosing SCEVUDivision or SCEVSDivision.
findArrayDimensions and computeAccessFunctions will use SCEVSDivision
while HowFarToZero will use SCEVUDivision.
llvm-svn: 222104
Summary:
Several places in DependenceAnalysis assumes both SCEVs in a subscript pair
share the same integer type. For instance, isKnownPredicate calls
SE->getMinusSCEV(X, Y) which asserts X and Y share the same type. However,
DependenceAnalysis fails to ensure this assumption when producing a subscript
pair, causing tests such as NonCanonicalizedSubscript to crash. With this
patch, DependenceAnalysis runs unifySubscriptType before producing any
subscript pair, ensuring the assumption.
Test Plan:
Added NonCanonicalizedSubscript.ll on which DependenceAnalysis before the fix
crashed because subscripts have different types.
Reviewers: spop, sebpop, jingyue
Reviewed By: jingyue
Subscribers: eliben, meheff, llvm-commits
Differential Revision: http://reviews.llvm.org/D6289
llvm-svn: 222100
HowFarToZero was supposed to use unsigned division in order to calculate
the backedge taken count. However, SCEVDivision::divide performs signed
division. Unless I am mistaken, no users of SCEVDivision actually want
signed arithmetic: switch to udiv and urem.
This fixes PR21578.
llvm-svn: 222093
A few things:
- computeKnownBits is relatively expensive, let's delay its use as long
as we can.
- Don't create two APInt values just to run computeKnownBits on a
ConstantInt, we already know the exact value!
- Avoid creating a temporary APInt value in order to calculate unary
negation.
llvm-svn: 222092
This patch teaches the DAGCombiner how to combine shuffles according to rules:
shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2)
shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2)
llvm-svn: 222090
Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input.
As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector.
Differential Revision: http://reviews.llvm.org/D6287
llvm-svn: 222087
This gets the correct NaN behavior based on the compare type
the hardware uses. This now passes the new piglit test I have
for this on SI.
Add stricter tests for the operand order.
llvm-svn: 222079
This is so it could potentially be used by SI. However, the current
implementation does not always produce correct results, so the
IntegerDivisionPass is being used instead.
llvm-svn: 222072
Make explicit the requirement that most IR values in `DIBuilder` are
`Constant`. This requires a follow-up change in clang.
Part of PR21532.
llvm-svn: 222070
Now that `MDString` and `MDNode` have a common base class, use it. Note
that it's not useful to assume subclasses of `Metadata` must be one or
the other since we'll be adding more subclasses soon enough.
Part of PR21532.
llvm-svn: 222064
Summary:
The current "WinEH" exception handling type is more about Itanium-style
LSDA tables layered on top of the Windows native unwind info format
instead of .eh_frame tables or EHABI unwind info. Use the name
"ItaniumWinEH" to better reflect the hybrid nature of the design.
Also rename isExceptionHandlingDWARF to usesItaniumLSDAForExceptions,
since the LSDA is part of the Itanium C++ ABI document, and not the
DWARF standard.
Reviewers: echristo
Subscribers: llvm-commits, compnerd
Differential Revision: http://reviews.llvm.org/D6279
llvm-svn: 222062
Private variables are can be renamed, so it is not reliable to make
decisions on the name.
The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).
This patch changes one case where we were looking at the variable name to use
the section instead.
Test tuning by Michael Gottesman.
llvm-svn: 222061
We use to track quite a few "adjusted" offsets through the FrameLowering code
to account for changes in the prologue instructions as we went and allow the
emission of correct CFA annotations. However, we were missing a couple of cases
and the code was almost impenetrable.
It's easier to just add any stack-adjusting instruction to a list and emit them
together.
llvm-svn: 222057
When we folded the DPR alignment gap into a push, we weren't noting the extra
distance from the beginning of the push to the FP, and so FP ended up pointing
at an incorrect offset.
The .cfi_def_cfa_offset directives are still wrong in this case, but I think
that can be improved by refactoring.
llvm-svn: 222056
We would attempt to replace a fptrunc of an frem with an identical
fptrunc. This would cause the new fptrunc to be added to the worklist.
Of course, this results in an infinite loop because we will keep
visiting the newly created fptruncs.
This fixes PR21576.
llvm-svn: 222040
doing Load PRE"
This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll
The failing test is sensitive to the order in which we process loads. This
version turns on the RPO traversal instead of the while DT traversal in GVN.
The new test code is functionally same just the order of loads that are
eliminated is swapped.
This new version also fixes an issue where GVN splits a critical edge and
potentially invalidate the RPO/DT iterator.
llvm-svn: 222039
If we have spilled the value of the m0 register, then we need to restore
it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't
write to m0.
v_readlane_b32 can't write to m0, so
llvm-svn: 222036
This allows COFF targets to emit accelerator tables
when requested by -dwarf-accel-tables=Enable instead
of aborting. The test DebugInfo/cross-cu-inlining.ll
covers this on COFF platforms.
llvm-svn: 222034
ELF targets (and maybe COFF) use relocations when referring
to strings in the .debug_str section. Handle that in the
accelerator table dumper. This commit restores the
test/DebugInfo/cross-cu-inlining.ll test to its expected
platform independant form, validating that the fix works
(this test failed on linux boxes).
llvm-svn: 222029
Prior to this commit fmul and fadd binary operators were being canonicalized for
both scalar and vector versions. We now canonicalize add, mul, and, or, and xor
vector instructions.
llvm-svn: 222006
This reverts commit r221842 which was a revert of r221836 and of the
test parts of r221837.
This new version fixes an UB bug pointed out by David (along with
addressing some other review comments), makes some dumping more
resilient to broken input data and forces the accelerator tables
to be dumped in the tests where we use them (this decision is
platform specific otherwise).
llvm-svn: 222003
This patch adds builtin support for xvdivdp and xvdivsp, along with a
test case. Straightforward stuff.
There's a companion patch for Clang.
llvm-svn: 221983
In support of serializing executables, obj2yaml now records the virtual address
and size of sections. It also serializes whatever we strictly need from
the PE header, it expects that it can reconstitute everything else via
inference.
yaml2obj can reconstitute a fully linked executable.
In order to get executables correctly serialized/deserialized, other
bugs were fixed as a circumstance. We now properly respect file and
section alignments. We also avoid writing out string tables unless they
are strictly necessary.
llvm-svn: 221975
This teaches CoverageMapping::getCoveredFunctions to filter to a
particular file and uses that to replace most of the logic found in
llvm-cov report.
llvm-svn: 221962
getTargetConstant should only be used when you can guarantee the instruction
selected will be able to cope with the raw value. BUILD_VECTOR is rather too
generic for this so we should use getConstant instead. In that case, an
instruction can still consume the constant, but if it doesn't it'll be
materialised through its own round of ISel.
Should fix PR21352.
llvm-svn: 221961
Stop using `Value::getName()` to get the string behind an `MDString`.
Switch to `StringMapEntry<MDString>` so that we can find the string by
its coallocation.
This is part of PR21532.
llvm-svn: 221960
Hide the fact that `MDString`'s string is stored in `Value::Name` --
that's going to change soon. Update the only in-tree client that was
using it instead of `Value::getString()`.
Part of PR21532.
llvm-svn: 221951
Summary:
This has most of what is needed for mips fast-isel call lowering for O32.
What is missing I will add on the next patch because this patch is already too large.
It should not be doing anything wrong but it will punt on some cases that it is basically
capable of doing.
The mechanism is there for parameters to be passed on the stack but I have not enabled it because it serves as a way for now to prevent some of the strange cases of O32 register passing that I have not fully checked yet and have some issues.
The Mips O32 abi rules are very complicated as far how data is passed in floating and integer registers.
However there is a way to think about this all very simply and this implementation reflects that.
Basically, the ABI rules are written as if everything is passed on the stack and aligned as such.
Once that is conceptually done, it is nearly trivial to reassign those locations to registers and
then all the complexity disappears.
So I have told tablegen that all the data is passed on the stack and during the lowering I fix
this by assigning to registers as per the ABI doc.
This has been my approach and you can line up what I did with the ABI document and see 1 to 1 what
is going on.
Test Plan: callabi.ll
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: jholewinski, echristo, ahatanak, llvm-commits, rfuhler
Differential Revision: http://reviews.llvm.org/D5714
llvm-svn: 221948
Fix for LLI failure on Windows\X86: http://llvm.org/PR5053
LLI.exe crashes on Windows\X86 when single precession floating point
intrinsics like the following are used: acos, asin, atan, atan2, ceil,
copysign, cos, cosh, exp, floor, fmin, fmax, fmod, log, pow, sin, sinh,
sqrt, tan, tanh
The above intrinsics are defined as inline-expansions in math.h, and are
not exported by msvcr120.dll (Win32 API GetProcAddress returns null).
For an FREM instruction, the JIT compiler generates a call to a stub for
the fmodf() intrinsic, and adds a relocation to fixup at load time. The
loader searches the libraries for the function, but fails because the
symbol is not exported. So, the call target remains NULL and the
execution crashes.
Since the math functions are loaded at JIT/runtime, the JIT can patch
CALL instruction directly instead of the searching the libraries'
exported symbols. However, this fix caused build failures due to
unresolved symbols like _fmodf at link time.
Therefore, the current fix defines helper functions in the Runtime
link/load library to perform the above operations. The address of these
helper functions are used to patch up the CALL instruction at load time.
Reviewers: lhames, rnk
Reviewed By: rnk
Differential Revision: http://reviews.llvm.org/D5387
Patch by Swaroop Sridhar!
llvm-svn: 221947