If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
Before, the instruction builder incorrectly assumed that only explicit reads
could have been associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.
llvm-svn: 328972
These should exist in all toolchains LLVM supports nowadays.
Enables making DataTypes.h a regular header instead of a .h.cmake file and
allows deleting a bunch of cmake goop (which should also speed up cmake
configure time a bit).
All the code this removes is 9+ years old.
https://reviews.llvm.org/D45155
llvm-svn: 328970
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.
Differential revision: https://reviews.llvm.org/D44639
llvm-svn: 328965
Found by looking through the output of
for f in $(grep -o '\bHAVE_[A-Z0-9_]*\b' llvm/cmake/config-ix.cmake); do
echo $f $(git grep $f '*' | wc -l);
done
in the monorepo.
llvm-svn: 328957
This patch adds a set of unstable C API bindings to the DIBuilder interface for
creating structure, function, and aggregate types.
This patch also removes the existing implementations of these functions from
the Go bindings and updates the Go API to fit the new C APIs.
llvm-svn: 328953
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: JDevlieghere, zturner, echristo, dberris, friss
Reviewed By: echristo
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D45141
llvm-svn: 328943
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, mzolotukhin, lhames
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45135
llvm-svn: 328940
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
Summary:
Avoids having to list all intrinsics manually.
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44937
llvm-svn: 328938
Summary:
We will use this in the AMDGPU backend in a subsequent patch
in the stack to lookup target-specific per-intrinsic information.
The generic CodeGenIntrinsic machinery is used to ensure that,
even though we don't calculate actual enum values here, we do
get the intrinsics in the right order for the binary search
index.
Change-Id: If61cd5587963a4c5a1cc53df1e59c5e4dec1f9dc
Reviewers: arsenm, rampitec, b-sumner
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D44935
llvm-svn: 328937
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, samsonov
Reviewed By: echristo
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45134
llvm-svn: 328935
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
Reviewers: wmi
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45127
llvm-svn: 328934
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.
Differential Revision: https://reviews.llvm.org/D44199
llvm-svn: 328933
This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
llvm-svn: 328926
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: sanjoy, dexonsmith, hfinkel, RKSimon
Reviewed By: dexonsmith
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D44944
llvm-svn: 328925
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 328921
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.
Reviewers: eraman
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45062
llvm-svn: 328909
Make sure ThinLTO with caching doesn't use non-atomic writes to the cache file (to prevent data races and cache files corruption).
1. Place temp file to the same place where the caching directory is (instead of creating it the directory pointed to by TMP/TEMP variable). This will help to prevent using non-atomic rename and falling back to non-atomic "direct" write to the cache file.
2. if rename failed do not write to the cache file directly (direct write to the file is non-atomic and could cause data race conditions).
3. if cache file doesn't exist (e.g., because 'rename' failed or because some other reasons), bypass using the cache altogether.
Differential Revision: https://reviews.llvm.org/D45076
llvm-svn: 328904
Summary:
This exposes WebAssembly passes for use on the command line (as
arguments to -print-before and the like).
Reviewers: dschuff, sunfish
Subscribers: MatzeB, jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D45103
llvm-svn: 328901
Two memory instructions with a dependency only on the address register
between the two (the first one of them being post-incrememnt) can be
packetized together after the offset on the second was updated to the
incremement value. Make sure that the new offset is valid for the
instruction.
llvm-svn: 328897
This patch resolves link errors when the address of a static function is taken, and that function is uninstrumented by DFSan.
This change resolves bug 36314.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D44784
llvm-svn: 328890
Summary:
Make NVPTX require structured CFG. Added a temporary flag to
"roll back" the behavior for easy deployment.
Combined with D45008, this fixes several internal Nvidia GPU test
failures that we suspect to be ptxas miscompiles (PR27738).
Reviewers: jlebar
Subscribers: jholewinski, sanjoy, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D45070
llvm-svn: 328885
Summary:
Tail duplication easily breaks the structure of CFG, e.g. duplicating on
a region entry. If the structure is intended to be preserved, then we
may want to configure tail duplication, or disable it for structured
CFG. From our benchmark results disabling it doesn't cause performance
regression.
Notice that this currently affects AMDGPU backend. In the next patch, I
also plan to turn on requiresStructuredCFG for NVPTX.
All unit tests still pass.
Reviewers: jlebar, arsenm
Subscribers: jholewinski, sanjoy, wdng, tpr, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45008
llvm-svn: 328884
Summary:
Previous revision caused a leak in the echo test that got caught by the ASAN bots because of missing free of the handlers array and was reverted in r328759. Resubmitting the patch with that correction.
Add support for cleanupret, catchret, catchpad, cleanuppad and catchswitch and their associated accessors.
Test is modified from SimplifyCFG because it contains many diverse usages of these instructions.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits, vlad.tsyrklevich
Differential Revision: https://reviews.llvm.org/D45100
llvm-svn: 328883
This will show more detail when using `llvm-pdbutil explain` on an
offset in the DBI or PDB streams. Specifically, it will dig into
individual header fields and substreams to give a more precise
description of what the byte represents.
llvm-svn: 328878
The code has bugs dealing with -0.0.
Since D44550 introduced FABS pattern folding in InstCombine,
this patch removes the now-redundant code that causes
https://bugs.llvm.org/show_bug.cgi?id=36600.
Patch by Mikhail Dvoretckii!
Differential Revision: https://reviews.llvm.org/D44683
llvm-svn: 328872
instructions.
In the Btver2 model, there are a few InstRW overrides that don't specify a
ReadAfterLd for the register input operand.
As a result, a few AVX variants of horizontal operations and most vector logic
operations with a folded memory operand don't have a ReadAdvance info associated
to their input register operands.
llvm-svn: 328865
Verify that the ReadAfterLd is correctly applied to FMA and 4-ops variable blend
instructions.
As Craig pointed out in D44726, some Intel models still have to be fixed.
llvm-svn: 328861
This change adds a couple of tests to verify the change introduced by revision
328823 ([X86] Correct the placement of ReadAfterLd in BEXTR and BZHI).
llvm-svn: 328859
Summary:
The phase attempts to transform operations that extract a portion of a value
into an SDWA src operand in cases where that value is used only once. It
was not prepared for this use to be the preserved portion of a value for
dst:UNUSED_PRESERVE, resulting in a crash or assert.
This change either rejects the illegal SDWA attempt, or in the case where
dst:WORD_1 and the src_sel would be WORD_0, removes the unneeded
extract instruction.
Reviewers: arsenm, #amdgpu
Reviewed By: arsenm, #amdgpu
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44364
llvm-svn: 328856
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 328854
MachineCopyPropagation::CopyPropagateBlock has a bunch of special
handling for COPY instructions. This handling assumes that COPY
instructions do not modify the source of the copy; this is wrong if
the COPY destination overlaps the source.
To fix the bug, check explicitly for this situation, and fall back to
the generic instruction handling.
This bug can't happen for most register classes because they don't
have this sort of overlap, but there are a few register classes
where this is possible. The testcase uses the AArch64 QQQQ register
class.
Differential Revision: https://reviews.llvm.org/D44911
llvm-svn: 328851
Sometimes the operand comes after the memory operand so we need 5 ReadDefaults first.
I suspect we also need to do something for the mask operand for masked avx512 instructions? I'm not sure if the mask should be ReadAfterLd or not since it can mask faults. If it shouldn't be ReadAfterLd then we're probably wrong for zero masking instructions already.
Differential Revision: https://reviews.llvm.org/D44726
llvm-svn: 328834
While the stack access instructions don't care about
alignment > 4, some transformations on the pointer calculation
do make assumptions based on knowing the low bits of a pointer
are 0. If a stack object ends up being accessed through its
absolute address (relative to the kernel scratch wave offset),
the addressing expression may depend on the stack frame being
properly aligned. This was breaking in a testcase due to the
add->or combine.
I think some of the SP/FP handling logic is still backwards,
and overly simplistic to support all of the stack features.
Code which tries to modify the SP with inline asm for example
or variable sized objects will probably require redoing this.
llvm-svn: 328831
The memory form of these instructions only read an input from memory. They don't have any register operands.
Differential Revision: https://reviews.llvm.org/D44836
llvm-svn: 328828
These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd
Differential Revision: https://reviews.llvm.org/D44838
llvm-svn: 328823
As a further refinement on:
r328274 - For llvm-nm and Mach-O files also use function starts info in some cases when printing symbols
we want to special case a redacted LC_MAIN so it is easier to find.
rdar://38978929
llvm-svn: 328820
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: sdardis, RKSimon, dsanders, atanasyan
Reviewed By: atanasyan
Subscribers: atanasyan, arichardson, llvm-commits
Differential Revision: https://reviews.llvm.org/D44869
llvm-svn: 328815
There are two FPMs in an MSF file, the idea being that for
incremental updates you can write to the alternate one and then
atomically swap them on commit. LLVM defaulted to using FPM1
on the first commit, but this differs from Microsoft's behavior
which is to default to using FPM2 on the first commit. To
eliminate some byte-level file differences, this patch changes
LLVM's default to also be FPM2.
Additionally, LLVM was trying to be "smart" about marking FPM
pages allocated. In addition to marking every page belonging
to the alternate FPM as unallocated, LLVM also marked pages at
the end of the main FPM which were not needed as unallocated.
In order to match the behavior of Microsoft-generated PDBs, we
now always mark every FPM block as allocated, regardless of
whether it is in the main FPM or the alt FPM, and regardless of
whether or not it describes blocks which are actually in the file.
This has the side benefit of simplifying our code.
llvm-svn: 328812
When we determine that a field belongs to an MSF super block or
the free page map, we wouldn't print any additional information.
With this patch, we now print the value of the field (for super
block fields) or the allocation status of the specified byte (in
the case of offsets in the FPM).
llvm-svn: 328808
Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it.
The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly.
Differential Revision: https://reviews.llvm.org/D45017
llvm-svn: 328806
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files. This makes the line table more
independent of the .debug_info section.
We emit the new syntax only for DWARF v5 and later.
Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows. Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.
Differential Revision: https://reviews.llvm.org/D44054
llvm-svn: 328805
We were trying to dig into the super block fields and print a
description of the field at the specified offset, but we were
printing the wrong field due to an off-by-one-field-error.
llvm-svn: 328804
When investigating various things, we often have a file offset
and what to know what's in the PDB at that address. For example
we may be doing a binary comparison of two LLD-generated PDBs
to look for sources of non-determinism, or we may wish to compare
an LLD-generated PDB with a Microsoft generated PDB for sources
of byte-for-byte incompatibility. In these cases, we can do a
binary diff of the two files, and once we find a mismatched byte
we can use explain to figure out what that byte is, immediately
honining in on the problem.
This patch implements this by trying to narrow the meaning of
a particular file offset down as much as possible.
Differential Revision: https://reviews.llvm.org/D44959
llvm-svn: 328799
In r312664 (D36404), JumpThreading stopped threading edges into
loop headers. Unfortunately, I observed a significant performance
regression as a result of this change. Upon further investigation,
the problematic pattern looked something like this (after
many high level optimizations):
while (true) {
bool cond = ...;
if (!cond) {
<body>
}
if (cond)
break;
}
Now, naturally we want jump threading to essentially eliminate the
second if check and hook up the edges appropriately. However, the
above mentioned change, prevented it from doing this because it would
have to thread an edge into the loop header.
Upon further investigation, what is happening is that since both branches
are threadable, JumpThreading picks one of them at arbitrarily. In my
case, because of the way that the IR ended up, it tended to pick
the one to the loop header, bailing out immediately after. However,
if it had picked the one to the exit block, everything would have
worked out fine (because the only remaining branch would then be folded,
not thraded which is acceptable).
Thus, to fix this problem, we can simply eliminate loop headers from
consideration as possible threading targets earlier, to make sure that
if there are multiple eligible branches, we can still thread one of
the ones that don't target a loop header.
Patch by Keno Fischer!
Differential Revision: https://reviews.llvm.org/D42260
llvm-svn: 328798
We should align the value of the field, not the overall section offset.
This distinction matters if one of the debug_names contributions is not
of size which is a multiple of four. The dwarf producers may choose to
emit rounded contributions, but they are not required to do so. In the
latter case, without this patch we would corrupt the parsing state, as
we would adjust the offset even if subsequent contributions contained
correctly rounded augmentation strings.
llvm-svn: 328796
The tool was passing the wrong operand index to method
MCSubtargetInfo::getReadAdvanceCycles(). That method requires a "UseIdx", and
not the operand index. This was found when testing X86 code where instructions
had a memory folded operand.
This patch fixes the issue and adds test read-advance-1.s to ensure that
the ReadAfterLd (a ReadAdvance of 3cy) information is correctly used.
llvm-svn: 328790
Before this patch we were parsing the attributes as section offsets, as
that is what apple_names is doing. However, this is not correct as DWARF
v5 specifies that this attribute should use the Reference form class.
This also updates all the testcases (except the ones that deliberately
pass a different form) to use the correct form class.
llvm-svn: 328773
Fixes for "lets" references which should be "let's" in the Kaleidoscope
tutorial.
Patch by: Robin Dupret
Differential Revision: https://reviews.llvm.org/D44990
llvm-svn: 328772
We are re-adding all the bitcasts, constant masks and target shuffles to the work list for no apparent gain.
Found while investigating adding SimplifyDemandedVectorElts to target shuffles.
Differential Revision: https://reviews.llvm.org/D44942
llvm-svn: 328771
Summary:
As we are only doing X.0.Z releases (not using the minor version), there is no need to keep -X.Y in the version.
Like patch https://reviews.llvm.org/D41808, I propose that we rename libLLVM-7.0svn.so to libLLVM-7svn.so
This patch will also rename downstream libraries like liblldb-7.0 to liblldb-7
Reviewers: axw, beanz, dim, hans
Reviewed By: dim, hans
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D41869
llvm-svn: 328768
I believe the role of ehDataReg has been replaced by MipsABIInfo::GetEhDataReg, thus removing the dead code.
Patch By: Wei-Ren Chen.
Reviewers: ehostunreach, sdardis
Differential Revision: https://reviews.llvm.org/D44867
llvm-svn: 328767
The existing LoopRotation.cpp is implemented as one of loop passes instead of
being a utility. The user cannot easily perform the loop rotation selectively
(or on demand) under different optimization level. For example, the loop
rotation is needed as part of the logic to convert a loop into a loop with
bottom test for a transformation. If the loop rotation is simply added as a
loop pass before the transformation, the pass is skipped if it is compiled at
–O0 or if it is explicitly disabled by the user, causing the compiler to
generate incorrect code. Furthermore, as a loop pass it will rotate all loops
instead of just the relevant loops.
We provide a utility interface for the loop rotation so that the loop rotation
can be called on demand. The changeset is as follows:
- Create a new file lib/Transforms/Utils/LoopRotationUtils.cpp and move the main
implementation of class LoopRotate into this file.
- Create a new file llvm/include/Transform/Utils/LoopRotationUtils.h with the
interface LoopRotation(...).
- Original LoopRotation.cpp is changed to use the utility function LoopRotation
in LoopRotationUtils.cpp. This is done in the same way community did for
mem-to-reg implementation.
Patch by Jin Lin!
Differential Revision: https://reviews.llvm.org/D44595
llvm-svn: 328766
Otherwise the definitions can't see the extern C declarations and get
name mangled, making it impossible for users to call them. This breaks
the Go bindings.
llvm-svn: 328765
The IntelPrinter and the ATTPrinter produce the same strings for the same input. We already use the ATTPrinter explicitly in several other places.
llvm-svn: 328762
Summary:
Add support for cleanupret, catchret, catchpad, cleanuppad and catchswitch and their associated accessors.
Test is modified from SimplifyCFG because it contains many diverse usages of these instructions.
Reviewers: whitequark, deadalnix, echristo
Reviewed By: echristo
Subscribers: llvm-commits, harlanhaskins
Differential Revision: https://reviews.llvm.org/D44496
llvm-svn: 328759
Eli pointed out that variadic functions are totally a thing, so this
assert is incorrect.
No test-case is provided, since the only way this assert fires is if a
specific DenseMap falls back to doing `isEqual` checks, and that seems
fairly brittle (and requires a pyramid of growing
`call void (i8, ...) @varargs(i8 0)`).
llvm-svn: 328755
We use a `DenseMap<MemoryLocOrCall, MemlocStackInfo>` to keep track of
prior work when optimizing uses in MemorySSA. Because we weren't
accounting for callsite arguments in either the hash code or equality
tests for `MemoryLocOrCall`s, we optimized uses too aggressively in
some rare cases.
Fix by Daniel Berlin.
Should fix PR36883.
llvm-svn: 328748
Support includes this header for the typedefs - but logically it's part
of the MC/Disassembler library that implements the functions. Split the
header so as not to create a circular dependency.
This is another case where probably inverting the llvm-c implementation
might be best (rather than core llvm libraries implementing the parts of
llvm-c - instead llvm-c could be its own library, depending on all the
parts of LLVM's core libraries to then implement llvm-c on top of
them... if that makes sense)
llvm-svn: 328744
Most of these were optional matches at the end of the strings, but since the strings themselves are prefix matches by default you don't need to check for something optional at the end.
I've left the 'b' on memory instructions where it means 'broadcast' because I'm not sure those really have the same load latency and we may need to split them explicitly in the future.
llvm-svn: 328730
Summary: Mark CFG is preserved since this pass do not make any change in CFG.
Reviewers: sebpop, mzolotukhin, mcrosier
Reviewed By: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44845
llvm-svn: 328727
These instructions have been around for a long time, but we
haven't supported intrinsics for them. The "new" versions use
the CSx register for the start of the buffer instead of the K
field in the Mx register.
We need to use pseudo instructions for these instructions until
after register allocation. The problem is that these instructions
allocate a M0/CS0 or M1/CS1 pair. But, we can't generate code for
the CSx set-up until after register allocation when the Mx
register has been fixed for the instruction.
There is a related clang patch.
Patch by Brendon Cahoon.
llvm-svn: 328724
This commit simplifies the call outlining logic by removing references to the
Function associated with the callee. To do this, it requires that valid
callee save info is available to the outliner.
llvm-svn: 328719
This allows syntax like:
$ llvm-ar -c -r -u file.a file.o
This is in addition to the other formats that are already supported:
$ llvm-ar cru file.a file.o
$ llvm-ar -cru file.a file.o
Patch by Tom Anderson!
Differential Revision: https://reviews.llvm.org/D44452
llvm-svn: 328716
Summary:
There aren't any matchers for the three vector operations: insertelement, extractelement, and
shufflevector. This patch adds them as well as corresponding unit tests.
llvm-svn: 328709
This reverts commit 771829b640a5494ab65c810dd6b4330522bf3a33 (rr328598)
Hopefully the test will now pass on the bots.
rdar://problem/38774530
llvm-svn: 328703
The `shtest-timeout.py` test was failing intermittently. It looks like
the issue is that on a resource constrained system lit is unable to run
`quick_then_slow.py` twice and print out the messages the tests expects
within the one second timeout.
The underlying issue is that the test is dependent on the performance of
the host machine is a rather fragile way. This is due to hardcoding
timeout values and having assumptions that the host machine is able to
perform a certain amount of work within the hardcoded timeout values.
We could increase the timeout values but that doesn't really fix the
underlying issue. Instead this patch removes one of fragile assumptions
in the hope that this will be enough to fix the bots.
There are other fragile assumptions in this test (e.g. `quick.py` can be
executed in less than 1 second). If the bots continue to fail we'll have
to revisit this.
rdar://problem/38774530
llvm-svn: 328702
This reverts commit r328676.
Commit r328676 broke the -no-integrated-as flag necessary to build Linux kernel with Clang:
$ cat t.c
void foo() {}
$ clang -no-integrated-as -c t.c -g
/tmp/t-dcdec5.s: Assembler messages:
/tmp/t-dcdec5.s:8: Error: file number less than one
clang-7.0: error: assembler command failed with exit code 1 (use -v to see invocation)
llvm-svn: 328699
Similar to r328694. The number of micro opcodes should be 2 for those
instructions.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328698
This is a step towards the upcoming KMSAN implementation patch.
KMSAN is going to prepend a special basic block containing
tool-specific calls to each function. Because we still want to
instrument the original entry block, we'll need to store it in
ActualFnStart.
For MSan this will still be F.getEntryBlock(), whereas for KMSAN
it'll contain the second BB.
llvm-svn: 328697
This reverts commit 0daf86291d3aa04d3cc280cd0ef24abdb0174981.
It was causing an assert in test/CodeGen/AMDGPU/amdpal.ll only on a
release-with-asserts build. I will resubmit the change when I have fixed
that.
Change-Id: If270594eba27a7dc4076bdeab3fa8e6bfda3288a
llvm-svn: 328695
The Jaguar backend natively supports 128-bit data types. Operations on YMM
registers are split into two COPs (complex operations). Each COP consumes a slot
in the dispatch group, and in the reorder buffer.
The scheduling model for Jaguar should mark those instructions as `let
NumMicroOps = 2`.
This was found when testing AVX code for BtVer2 using llvm-mca.
llvm-svn: 328694
This is a step towards the upcoming KMSAN implementation patch.
The isStore argument is to be used by getShadowOriginPtrKernel(),
it is ignored by getShadowOriginPtrUserspace().
Depending on whether a memory access is a load or a store, KMSAN
instruments it with different functions, __msan_metadata_ptr_for_load_X()
and __msan_metadata_ptr_for_store_X().
Those functions may return different values for a single address,
which is necessary in the case the runtime library decides to ignore
particular accesses.
llvm-svn: 328692
Follow up patch of r328313 to support the UseVMOVSR constraint. Removed
some unneeded instructions from the test and removed some stray
comments.
Differential Revision: https://reviews.llvm.org/D44941
llvm-svn: 328691
Summary:
RegisterCoalescer::removePartialRedundancy tries to hoist B = A from
BB0/BB2 to BB1:
BB1:
...
BB0/BB2: ----
B = A; |
... |
A = B; |
|-------
|
It does so if a number of conditions are fulfilled. However, it failed
to check if B was used by any of the terminators in BB1. Since we must
insert B = A before the terminators (since it's not a terminator itself),
this means that we could erroneously insert a new definition of B before a
use of it.
Reviewers: wmi, qcolombet
Reviewed By: wmi
Subscribers: MatzeB, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D44918
llvm-svn: 328689
Previously this crashed because a nullptr (returned by
createLocalIndirectStubsManagerBuilder() on platforms without
indirection support) functor was unconditionally invoked.
Patch by Andres Freund. Thanks Andres!
llvm-svn: 328687
Summary:
Since wasm EH does not use landingpad instructions, these instructions
provide exception pointer and selector values until we lower them in
WasmEHPrepare.
Reviewers: jgravelle-google
Subscribers: jfb, sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D44930
llvm-svn: 328678
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files. This makes the line table more
independent of the .debug_info section.
Fixes the bug found by asan. Also XFAIL the new test for Darwin, which
is stuck on DWARF v2, and fix up other tests so they stop failing on
Windows. Last but not least, don't break "clang -g" of an assembler
file that has .file directives in it.
Differential Revision: https://reviews.llvm.org/D44054
llvm-svn: 328676
If an ADRP appears with, say, a CPI operand, we shouldn't outline it.
This moves the check for unsafe operands so that it occurs before the special-case
for ADRPs. Also add a test for outlining ADRPs.
llvm-svn: 328674
Summary:
For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of
the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders).
This commit fixes that to use offset 0x10 instead of offset 0 for a
compute shader, per the PAL ABI spec.
Reviewers: kzhuravl, nhaehnle, timcorringham
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D44468
Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f
llvm-svn: 328673
If a given split type unit does not have source locations, don't have
it refer to the split line table.
If no split type unit refers to the split line table, don't emit the
line table at all.
This will save a little space on rare occasions, but also refactors
things a bit to improve which class is responsible for what.
Responding to review comments on r326395.
Differential Revision: https://reviews.llvm.org/D44220
llvm-svn: 328670
Summary:
Rev 327580 "[CodeGen] Use MIR syntax for MachineMemOperand printing"
broke -print-machineinstrs for us on AMDGPU, because we have custom
pseudo source values, and MIR serialization does not implement that.
This commit at least restores the functionality of -print-machineinstrs,
even if it does not properly implement the missing MIR serialization
functionality.
Differential Revision: https://reviews.llvm.org/D44871
Change-Id: I44961c0b90bf6d48c01484ed7a4e466fd300db66
llvm-svn: 328668
Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer.
Differential Revision: https://reviews.llvm.org/D44924
llvm-svn: 328664
The existing YAML Output::scalarString code path includes a partial and
incorrect implementation of YAML escaping logic. In particular, the logic put
in place in rL321283 escapes non-printable bytes only if they are not part of a
multibyte UTF8 sequence; implicitly this means that all multibyte UTF8
sequences -- printable and non -- are passed through verbatim.
The simplest solution to this is to direct the Output::scalarString method to
use the standalone yaml::escape function, and this _almost_ works, except that
the existing code in that function _over_ escapes: any multibyte UTF8 sequence
is escaped, even printable ones. While this is permitted for YAML, it is also
more aggressive (and hard to read for non-English locales) than necessary,
and the entire point of rL321283 was to back off such aggressive over-escaping.
So in this change, I have both redirected Output::scalarString to use
yaml::escape _and_ modified yaml::escape to optionally restrict its escaping to
non-printables. This preserves behaviour of any existing clients while giving
them a path to more moderate escaping should they desire.
Reviewers: JDevlieghere, thegameg, MatzeB, vladimir.plyashkun
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44863
llvm-svn: 328661
Before this was not done if the function had no calls in it. This
is still a possible issue with any callable function, regardless
of calls present.
llvm-svn: 328659
This transform is valid because the ranges have been validated (LowPC <= HighPC).
Differential Revision: https://reviews.llvm.org/D44772
llvm-svn: 328655
Fixed counter/weight overflow that leads to an assertion. Also fixed the help
string for pgo-emit-branch-prob option.
Differential Revision: https://reviews.llvm.org/D44809
llvm-svn: 328653
The combine on a select of a load only triggers for
addrspace 0, and discards the MachinePointerInfo. The
conservative default needs to be used for this.
llvm-svn: 328652
In a function, s5 is used as the frame base SGPR. If a function
is calling another function, during the call sequence
it is copied to a preserved SGPR and restored.
Before it was possible for the scheduler to move stack operations
before the restore of s5, since there's nothing to associate
a frame index access with the restore.
Add an implicit use of s5 to the adjcallstack pseudo which ends
the call sequence to preven this from happening. I'm not 100%
satisfied with this solution, but I'm not sure what else would be
better.
llvm-svn: 328650
Summary: When a node is about to be erased from ReplacedValues, we should also remap its corresponding values in PromotedFloats.
Patch by Yan Luo (Yan.Luo2@synopsys.com)
Reviewers: pirama
Reviewed By: pirama
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D44872
llvm-svn: 328644
%tmp = bitcast i32* %arg to i8*
%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
- %tmp2 = load i8, i8* %tmp, align 1
+ %tmp2 = load i8, i8* %tmp1, align 1
This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended.
llvm-svn: 328642
This implements a set of TTI functions that the loop vectorizer uses.
The only purpose of this is to enable testing. Auto-vectorization is
disabled by default, enabled by -hexagon-autohvx.
llvm-svn: 328639
Summary:
This is a canonical way to teach objdump to print the target
symbols for branches when disassembling AArch64 code.
Reviewers: evandro, t.p.northover, espindola
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D44851
llvm-svn: 328638
Summary:
This is an NFC refactoring of the OptBisect class to split it into an optional pass gate interface used by LLVMContext and the Optional Pass Bisector (OptBisect) used for debugging of optional passes.
This refactoring is needed for D44464, which introduces setOptPassGate() method to allow implementations other than OptBisect.
Patch by Yevgeny Rouban.
Reviewers: andrew.w.kaylor, fedor.sergeev, vsk, dberlin, Eugene.Zelenko, reames, skatkov
Reviewed By: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D44821
llvm-svn: 328637
On Hexagon "x = y" is a syntax used in most instructions, and is not
treated as a directive.
Differential Revision: https://reviews.llvm.org/D44256
llvm-svn: 328635
We were incorrectly initializing the array of used registers in method checkRAT.
As a consequence, the number of register file stalls was misreported.
Added a test to cover this case.
llvm-svn: 328629
Summary:
I recently added a new Fixup kind to our fork of LLVM but forgot to add
it to the table in MipsAsmBackend.cpp. With this static_assert the error
would have been caught instead of zero-initializing the array entries for
the new fixups.
Reviewers: sdardis, atanasyan
Reviewed By: atanasyan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44895
llvm-svn: 328616
We check `canPeel` twice: when evaluating the number of iterations to be peeled
and within the method `peelLoop` that performs peeling. This method is only
executed if the calculated peel count is positive. Thus, the check in `peelLoop` can
never fail. This patch replaces this check with an assert.
Differential Revision: https://reviews.llvm.org/D44919
Reviewed By: fhahn
llvm-svn: 328615
As a follow-up to r328480, this updates the logic for the decreasing
safety checks in a similar manner:
- CanBeMax is replaced by CannotBeMaxInLoop which queries
isLoopEntryGuardedByCond on the maximum value.
- SumCanReachMin is replaced by isSafeDecreasingBound which includes
some logic from parseLoopStructure and, again, has been updated to
use isLoopEntryGuardedByCond on the given bounds.
Differential Revision: https://reviews.llvm.org/D44776
llvm-svn: 328613
Currently, `getExact` fails if it sees two exit counts in different blocks. There is
no solid reason to do so, given that we only calculate exact non-taken count
for exiting blocks that dominate latch. Using this fact, we can simply take min
out of all exits of all blocks to get the exact taken count.
This patch makes the calculation more optimistic with enforcing our assumption
with asserts. It allows us to calculate exact backedge taken count in trivial loops
like
for (int i = 0; i < 100; i++) {
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44676
Reviewed By: fhahn
llvm-svn: 328611
This patch teaches `computeConstantDifference` handle calculation of constant
difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`.
Differential Revision: https://reviews.llvm.org/D43759
Reviewed By: anna
llvm-svn: 328609
Summary:
This patch adds itinerary support to the schedcover.py script. I've been trying to use this script to figure out why SSE and AVX instructions are ending up in separate tablegen scheduler classes and sometimes its because we are using different itineraries.
Rather than using None to indicate the default scheduler model, I now use the string "default". I had to hack around the sorting a little to keep "default" at the beginning. But this also makes it so you can specify "default" on the command line to just get the defaults
I also fixed the regular expression code so that the no_default wasn't evaluated twice.
Reviewers: RKSimon, atrick, jmolloy, javed.absar
Reviewed By: javed.absar
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44834
llvm-svn: 328608
Summary:
This reverts commit r328596.
Checking if the arguments are strings before testing if they contain "/dev/null".
Reviewers: rnk
Reviewed By: rnk
Subscribers: delcypher, llvm-commits
Differential Revision: https://reviews.llvm.org/D44914
llvm-svn: 328603
Before this change, using dumpProperties() with PDBSymbolData
would look like this:
get_locationType: 3
1
After this change:
get_locationType: 3
get_value: 1
llvm-svn: 328590
Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis.
Differential Revision: https://reviews.llvm.org/D44647
llvm-svn: 328582
Goma[1] is a distributed build system similar to distcc and icecc
primarily used to compile Chromium. The client is open source, and
hopefully soon the server will be as well. The intended usage model is
similar to most distributed build systems: prefix gomacc onto your
compiler command line, and it transparently distributes compilation.
The go lit config wants to determine the host compiler binary, so it
needs some extra logic to avoid looking at these prefixes.
[1] https://chromium.googlesource.com/infra/goma/client/
llvm-svn: 328580
MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for
each block, it computes the previous definition for each predecessor,
then takes those definitions and combines them. But currently it doesn't
remember results which it already computed; this means it can visit the
same block multiple times, which adds up to exponential time overall.
To fix this, this patch adds a cache. If we computed the result for a
block already, we don't need to visit it again because we'll come up
with the same result. Well, unless we RAUW a MemoryPHI; in that case,
the TrackingVH will be updated automatically.
This matches the original source paper for this algorithm.
The testcase isn't really a test for the bug, but it adds coverage for
the case where tryRemoveTrivialPhi erases an existing PHI node. (It's
hard to write a good regression test for a performance issue.)
Differential Revision: https://reviews.llvm.org/D44715
llvm-svn: 328577
Summary:
Re-lands r328386 and r328443, reverting r328482.
Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small
parameters in i8 and i16 do not end up in the SysV register parameters
(EDI, ESI, etc).
I added tests for how we receive small parameters, since that is the
important part. It's always safe to store more bytes than will be read,
but the assumptions you make when loading them are what really matter.
I also tested this by self-hosting clang and it passed tests on win64.
Reviewers: mstorsjo, hans
Subscribers: hiraditya, mstorsjo, llvm-commits
Differential Revision: https://reviews.llvm.org/D44900
llvm-svn: 328570
Give the bit count instructions their own scheduler classes instead of forcing them into existing classes.
These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar).
Differential Revision: https://reviews.llvm.org/D44879
llvm-svn: 328566
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: dblaikie, RKSimon, robertlytton
Reviewed By: robertlytton
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44875
llvm-svn: 328564
This has been made obsolete by the fact that almost all of the
things it previously checked for are no longer relevant since
we can just compare bytes in a lot of places.
llvm-svn: 328562
Legalize and emit code for quad-precision floating point operation xscvdpqp
and add option to guard the quad precision operation support.
Differential Revision: https://reviews.llvm.org/D44746
llvm-svn: 328558
A new function getOpcodeForSpill should now be the only place to get
the opcode for a given spilled register.
Differential Revision: https://reviews.llvm.org/D43086
llvm-svn: 328556
First, we change the heuristic that is used to ignore the recurrent
node-sets in the node ordering. In certain cases it's not important
to focus on the recurrent node-sets. Instead, the algorithm begins
by considering all the instructions in the node ordering step.
Second, a minor change to the bottom up traversal, which needs to
consider loop carried dependences (modeled as anti dependences).
Previously, these instructions were skipped, which caused problems
because the instruction ends up having both predecessors and
sucessors in the schedule.
Third, consider anti-dependences as a tie breaker when choosing
between instructions in the node ordering. We want to make sure
that the source of the anti-dependence does not end up with both
predecesssors and sucessors in the final node ordering.
Patch by Brendon Cahoon.
llvm-svn: 328554
Summary:
llvm-objdump now disassembles unrecognised opcodes as data, using
the .long directive. We treat unrecognised opcodes as being 32 bit
values, so move along 4 bytes rather than the single byte which
previously resulted in a cascade of bogus disassembly following an
unrecognised opcode.
While no solution can always disassemble code that contains
embedded data correctly this provides a significant improvement.
The disassembler will now cope with an arbitrary length section
as it no longer truncates it to a multiple of 4 bytes, and will
use the .byte directive for trailing bytes.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D44685
llvm-svn: 328553
The pipeliner must add a loop carried dependence between two memory
operations if the base register is not an affine (linear) exression.
The current implementation doesn't check how the base register is
defined, which allows non-affine expressions, and then the pipeliner
does not add a loop carried dependence when one is needed.
This patch adds code to isLoopCarriedOrder that checks if the base
register of the memory operations is defined by a phi, and the loop
definition for the phi is a constant increment value. This is a very
simple check for a linear expression.
Patch by Brendon Cahoon.
llvm-svn: 328550
The pipeliner is not adding a dependence edge for a loop carried
dependence, and ends up scheduling a load from iteration n prior
to an aliased store in iteration n-1.
The code that adds the loop carried dependences in the pipeliner
doesn't check if the memory objects for loads and stores are
"identified" (i.e., distinct) objects. If they are not, then the
code that adds the dependences needs to be conservative. The
objects can be used to check dependences only when they are
distinct objects.
The code that checks for loop carried dependences has been updated
to classify loads and stores that are not identified as "unknown"
values. A store with an "unknown" value can potentially create
a loop carried dependence with any pending load.
Patch by Brendon Cahoon.
llvm-svn: 328547
The phi renaming code in the pipeliner uses the wrong value when
rewriting phi uses, which results in an undefined value. In this
case, the original phi is no longer needed due to the order of
instruction in the pipelined loop. The pipeliner was assuming, in
this case, the the phi loop definition should be used to
rewrite the uses. However, the pipeliner needs to check to make
sure that the loop definition has already been scheduled. If not,
then the phi initial value needs to be used instead.
Patch by Brendon Cahoon.
llvm-svn: 328545
The pipeliner was generating too many phis in the epilog blocks, which
caused incorrect code generation when rewriting an instruction that uses
the phi.
In this case, there 3 prolog and epilog stages. An existing phi was
scheduled at stage 1. When generating the code for the 2nd epilog an
extra new phi was generated.
To fix this, we need to update the code that calculates the maximum
number of phis that can be generated, which is based upon the current
prolog stage and the stage of the original phi. In this case, when the
prolog stage is 1 and the original phi stage is 1, the maximum number
of phis to generate is 2.
Patch by Brendon Cahoon.
llvm-svn: 328543
The patch contains severals changes needed to pipeline an example
that was transformed so that a Phi with a subreg is converted to
copies.
The pipeliner wasn't working for a couple of reasons.
- The RecMII was 3 instead of 2 due to the extra copies.
- Copy instructions contained a latency of 1.
- The node order algorithm was not choosing the best "bottom"
node, which caused an instruction to be scheduled that had a
predecessor and successor already scheduled.
- Updated the Hexagon Machine Scheduler to check if the node is
latency bound when adding the cost for a 0-latency dependence.
The RecMII was 3 because the computation looks at the number of
nodes in the recurrence. The extra copy is an extra node but
it shouldn't increase the latency. The new RecMII computation
looks at the latency of the instructions in the recurrence. We
changed the latency of the dependence of a copy to 0. The latency
computation for the copy also checks the use of the copy (similar
to a reg_sequence).
The node order algorithm was not choosing the last instruction
in the recurrence for a bottom up traversal. This was when the
last instruction is a copy. A check was added when choosing the
instruction to check for NodeNum if the maxASAP is the same. This
means that the scheduler will not end up with another node in
the recurrence that has both a predecessor and successor already
scheduled.
The cost computation in Hexagon Machine Scheduler adds cost when
an instruction can be packetized with a zero-latency instruction.
We should only do this if the schedule is latency bound.
Patch by Brendon Cahoon.
llvm-svn: 328542
The pipeliner is asserting because the serialization step that
occurs at the end is deleting an instruction. The assert
occurs later on because there is a use without a definition.
The problem occurs when an instruction defines a value used
by a REQ_SEQUENCE and that value is used by a COPY instruction.
The latencies between these instructions are zero, so they are
put in to the same packet. The serialization code is unable to
handle this correctly, and ends up putting the REG_SEQUENCE
before its definition.
There is special code in the serialization step that attempts
to handle zero-cost instructions (phis, copy, reg_sequence)
differently than regular instructions. Unfortunately, this means
the order does not come out correct.
This patch simplifies the code by changing the seperate steps for
handling zero-cost and regular instructions. Only phis are
handled separate now, since they should occurs first. Then, this
patch adds checks to make use the MoveUse is set to the smallest
value if there are multiple uses in a cycle.
Patch by Brendon Cahoon.
llvm-svn: 328540
This change brings performance of zlib up by 10%. The example below is from a
hot loop in longest_match() from zlib.
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1
In this example %idx.ext1 is a loop invariant. It will be moved above the use of
loop induction variable %idx.ext such that it can be hoisted out of the loop by
LICM. The operands that have dependences carried by the loop will be sinked down
in the GEP chain. This patch will produce the following output:
do.body:
%cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ]
%idx.ext = zext i32 %cur_match.addr.0 to i64
%add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1
%add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1
%add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext
llvm-svn: 328539