Summary:
Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.
This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.
Reviewers: efriedma, jmolloy, ABataev
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D39760
llvm-svn: 329142
Summary:
The ShadowCallStack pass instruments functions marked with the
shadowcallstack attribute. The instrumented prolog saves the return
address to [gs:offset] where offset is stored and updated in [gs:0].
The instrumented epilog loads/updates the return address from [gs:0]
and checks that it matches the return address on the stack before
returning.
Reviewers: pcc, vitalybuka
Reviewed By: pcc
Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44802
llvm-svn: 329139
This commit is similar to r329120, but uses the existing getUsesRedZone() function
in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a
function *truly* uses a redzone instead of just the noredzone attribute on a
function.
Thus, after this commit, it's possible to outline from x86 without using
-mno-red-zone and still get outlining results.
This also adds a new test for the new redzone behaviour.
llvm-svn: 329134
Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
Author: FarhanaAleen
Reviewed By: arsenm
Subscribers: llvm-commits, AMDGPU
Differential Revision: https://reviews.llvm.org/D45219
llvm-svn: 329131
LLVM Bug Id : 36449
Revision 328563 caused tests to fail under python 3.
This patch modified cat.py file to support both python 2 and 3.
This patch also fixes CRLF issues on Windows.
Patch by Chamal de Silva
Differential Revision: https://reviews.llvm.org/D45077
llvm-svn: 329123
The tests marked with 'FIXME' require loosening the check
in SimplifyAssociativeOrCommutative() to optimize completely;
that's still checking isFast() in Instruction::isAssociative().
llvm-svn: 329121
This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It
returns true if the function is known to use a redzone, false if it is known
to not use a redzone, and no value otherwise.
This removes the requirement to pass -mno-red-zone when outlining for AArch64.
https://reviews.llvm.org/D45189
llvm-svn: 329120
The linkage type on outlined functions was private before. This meant that if
you set a breakpoint in an outlined function, the debugger wouldn't be able to
give a sane name to the outlined function.
This commit changes the linkage type to internal and updates any tests that
relied on the prefixes on the names of outlined functions.
llvm-svn: 329116
Summary:
If an alloca need to be stored in the coroutine frame and it has an alignment specified and the alignment does not match the natural alignment of the alloca type. Insert appropriate padding into the coroutine frame to make sure that it gets requested alignment.
For example for a packet type (which natural alignment is 1), but alloca alignment is 8, we may need to insert a padding field with required number of bytes to make sure it is properly aligned.
```
%PackedStruct = type <{ i64 }>
...
%data = alloca %PackedStruct, align 8
```
If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:
```
%f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
```
Reviewers: rnk, lewissbaker, modocache
Reviewed By: modocache
Subscribers: EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D45221
llvm-svn: 329112
It also updates test/Transforms/LoopInterchange/call-instructions.ll
to use accesses where we can prove dependence after D35430.
Reviewers: sebpop, karthikthecool, blitz.opensource
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D45206
llvm-svn: 329111
Summary:
Introduce the ShadowCallStack function attribute. It's added to
functions compiled with -fsanitize=shadow-call-stack in order to mark
functions to be instrumented by a ShadowCallStack pass to be submitted
in a separate change.
Reviewers: pcc, kcc, kubamracek
Reviewed By: pcc, kcc
Subscribers: cryptoad, mehdi_amini, javed.absar, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D44800
llvm-svn: 329108
The missing definitions are from cvconst.h shipped with DIA SDK.
Correct the url to MSDN for MemoryTypeEnum and set the underlying
type of PDB_StackFrameType and PDB_MemoryType to uint16_t.
llvm-svn: 329104
We don't constant fold any of these, but we could...but if we
do, we must produce the right answer.
Unlike the IR fptosi instruction or its DAG node counterpart
ISD::FP_TO_SINT, these are not undef for an out-of-range input.
llvm-svn: 329100
Summary:
This change declare that PostRAMachineSinking and ShrinkWrap require NoVRegs
property, so now the MachineFunctionPass can enforce this check.
These passes are disabled in NVPTX & WebAssembly.
Reviewers: dschuff, jlebar, tra, jgravelle-google, MatzeB, sebpop, thegameg, mcrosier
Reviewed By: dschuff, thegameg
Subscribers: jholewinski, jfb, sbc100, aheejin, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45183
llvm-svn: 329095
The DwarfLinker can have some very deep recursion that can max out the
(significantly smaller) stack when using threads. We don't want this
limitation when we only have a single thread. We already have this
workaround for the architecture-related threading. This patch applies
the same workaround to the parallel analysis and cloning.
Differential revision: https://reviews.llvm.org/D45172
llvm-svn: 329093
Summary:
Some targets do not support extended format of .loc directive and
support only simple format: .loc <FileID> <Line> <Column>. Patch adds
MCAsmInfo flag and option that allows emit .loc directive without
additional flags.
Reviewers: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45184
llvm-svn: 329089
Summary:
Folding patterns like:
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%ext = extractelement <4 x i8> %insvec, i32 0
%cond = icmp eq i32 %ext, 0
Combined with existing rules, this allows us to fold patterns like:
%insvec = insertelement <4 x i8> undef, i8 %val, i32 0
%vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
%cast = bitcast <4 x i8> %vec to i32
%cond = icmp eq i32 %cast, 0
into:
%cond = icmp eq i8 %val, 0
When we construct a splat vector via a shuffle, and bitcast the vector into an integer type for comparison against an integer constant. Then we can simplify the the comparison to compare the splatted value against the integer constant.
Reviewers: spatel, anna, mkazantsev
Reviewed By: spatel
Subscribers: efriedma, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D44997
llvm-svn: 329087
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Patch does not support reordering of the repeated instruction, this must
be handled in the separate patch.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 329085
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).
Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.
llvm-svn: 329083
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
errs() rather than dbgs().
llvm-svn: 329082
I have taken the opportunity to simplify some tests slightly and move
parts around.
It also brings back a few IR checks for interchangable loops.
Reviewers: karthikthecool, sebpop, grosser
Reviewed By: sebpop
Differential Revision: https://reviews.llvm.org/D45207
llvm-svn: 329081
This should fix the problem reported by the lld buildbots:
- Builder lld-x86_64-darwin13, Build #19782
- Builder lld-perf-testsuite, Build #1419
llvm-svn: 329068
This patch allows the description of register files in processor scheduling
models. This addresses PR36662.
A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td.
Targets can optionally describe register files for their processors using that
class. In particular, class RegisterFile allows to specify:
- The total number of physical registers.
- Which target registers are accessible through the register file.
- The cost of allocating a register at register renaming stage.
Example (from this patch - see file X86/X86ScheduleBtVer2.td)
def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar
(btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM
register definitions only cost 1 physical register.
The syntax allows to specify an empty set of register classes. An empty set of
register classes means: this register file models all the registers specified by
the Target. For each register class, users can specify an optional register
cost. By default, register costs default to 1. A value of 0 for the number of
physical registers means: "this register file has an unbounded number of
physical registers".
This patch is structured in two parts.
* Part 1 - MC/Tablegen *
A first part adds the tablegen definition of RegisterFile, and teaches the
SubtargetEmitter how to emit information related to register files.
Information about register files is accessible through an instance of
MCExtraProcessorInfo.
The idea behind this design is to logically partition the processor description
which is only used by external tools (like llvm-mca) from the processor
information used by the llvm machine schedulers.
I think that this design would make easier for targets to get rid of the extra
processor information if they don't want it.
* Part 2 - llvm-mca related *
The second part of this patch is related to changes to llvm-mca.
The main differences are:
1) class RegisterFile now needs to take into account the "cost of a register"
when allocating physical registers at register renaming stage.
2) Point 1. triggered a minor refactoring which lef to the removal of the
"maximum 32 register files" restriction.
3) The BackendStatistics view has been updated so that we can print out extra
details related to each register file implemented by the processor.
The effect of point 3. is also visible in tests register-files-[1..5].s.
Differential Revision: https://reviews.llvm.org/D44980
llvm-svn: 329067
fact use regular expression syntax to use regular expressions.
Should restore the bots. Sorry for the noise on this test.
Thanks to Philip for spotting the bug!
llvm-svn: 329057
This adds the basic test cases from all the EFLAGS bugs in more direct
forms. It also switches to generated check lines, and includes both
32-bit and 64-bit variations.
No functionality changing here, just setting things up to have a nice
clean asm diff in my EFLAGS patch.
llvm-svn: 329056
do explicit scrubbing of the offsets of stack spills and reloads.
You can always turn this off in order to test specific stack slot usage.
We were already hiding most of this, but the new logic hides it more
generically. Notably, we should effectively hide stack slot churn in
functions that have a frame pointer now, and should also hide it when
changing a function from stack pointer to frame pointer. That transition
already changes enough to be clearly noticed in the test case diff,
showing *every* spill and reload is really noisy without benefit. See
the test case I ran this on as a classic example.
llvm-svn: 329055
The default assembly handling mode may introduce false positives in the
cases when MSan doesn't understand that the assembly call initializes
the memory pointed to by one of its arguments.
We introduce the conservative mode, which initializes the first
|sizeof(type)| bytes for every |type*| pointer passed into the
assembly statement.
llvm-svn: 329054
The patch changes the usage of dominate to properlyDominate
to satisfy the condition !(a < a) while using std::max.
It is actually NFC due to set data structure is used to keep
the Loops and no two identical loops can be in collection.
So in reality there is no difference between usage of
dominate and properlyDominate in this particular case.
However it might be changed so it is better to fix it.
llvm-svn: 329051
Useful when looking for indirect calls/jmps the need mitigation
via retpoline or other mitigations for Spectre v2.
Feedback, extension, additional patches welcome.
llvm-svn: 329050
Current implementation of `computeExitLimit` has a big piece of code
the only purpose of which is to prove that after the execution of this
block the latch will be executed. What it currently checks is actually a
subset of situations where the exiting block dominates latch.
This patch replaces all these checks for simple particular cases with
domination check over loop's latch which is the only necessary condition
of taking the exiting block into consideration. This change allows to
calculate exact loop taken count for simple loops like
for (int i = 0; i < 100; i++) {
if (cond) {...} else {...}
if (i > 50) break;
. . .
}
Differential Revision: https://reviews.llvm.org/D44677
Reviewed By: efriedma
llvm-svn: 329047
The primary issue here is that using NDEBUG alone isn't enough to guard
debug printing -- instead the DEBUG() macro needs to be used so that the
specific pass debug logging check is employed. Without this, every
asserts-enabled build was printing out information when it hit this.
I also fixed another place where we had multiple statements in a DEBUG
macro to use {}s to be a bit cleaner. And I fixed a place that used
`errs()` rather than `dbgs()`.
llvm-svn: 329046
Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC")
intended to improve code quality for certain jmp conditions. The
commit, however, has a couple of issues:
(1). In code, just swap is not enough, ConditionalCode CC
should also be swapped, otherwise incorrect code will
be generated.
(2). The ConditionalCode swap should be subject to
getHasJmpExt(). If getHasJmpExt() is False, certain
conditional codes will not be supported and swap
may generate incorrect code.
The original goal for this patch is to optimize jmp operations
which does not have JmpExt turned on. If JmpExt is on,
better code could be generated. For example, the test
select_ri.ll is introduced to demonstrate the optimization.
The same result can be achieved with -mcpu=v2 flag.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 329043
For Hexagon, peeling loops with small runtime trip count is beneficial for our
benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use PeelCount set
by the target for computing the desired peel count.
Differential Revision: https://reviews.llvm.org/D44880
llvm-svn: 329042
Just adds basic block labels and tidies up where comments go in the test
case and then generates fresh CHECK lines with the script. This way, the
check lines are much easier to maintain. They were already close to this
but not quite there.
llvm-svn: 329040
Only rely on Python 3 (io.open) when necessary. This puts TestRunnyer.py closer to how it behaved
before the changes introduced in D43165 and silences a few Windows build bot failures.
Thanks to Stella Stamenova for the patch!
llvm-svn: 329037
We use two approaches for determining the minimum bitwidth.
* Demanded bits
* Value tracking
If demanded bits doesn't result in a narrower type, we then try value tracking.
We need this if we want to root SLP trees with the indices of getelementptr
instructions since all the bits of the indices are demanded.
But there is a missing piece though. We need to be able to distinguish "demanded
and shrinkable" from "demanded and not shrinkable". For example, the bits of %i
in
%i = sext i32 %e1 to i64
%gep = getelementptr inbounds i64, i64* %p, i64 %i
are demanded, but we can shrink %i's type to i32 because it won't change the
result of the getelementptr. On the other hand, in
%tmp15 = sext i32 %tmp14 to i64
%tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
Ideas are from Matthew Simpson!
Differential Revision: https://reviews.llvm.org/D44868
llvm-svn: 329035
Summary:
When attempting to split a coroutine with 'hidden' visibility (for
example, a C++ coroutine that is inlined when compiled with the option
'-fvisibility-inlines-hidden'), LLVM would hit an assertion in
include/llvm/IR/GlobalValue.h:240: "local linkage requires default
visibility". The issue is that the visibility is copied from the source
of the function split in the `CloneFunctionInto` function, but the linkage
is not. To fix, create the new function first with external linkage,
then copy the linkage from the original function *after* `CloneFunctionInto`
is called.
Since `GlobalValue::setLinkage` in turn calls `maybeSetDsoLocal`, the
explicit call to `setDSOLocal` can be removed in CoroSplit.cpp.
Test Plan: check-llvm
Reviewers: GorNishanov, lewissbaker, EricWF, majnemer, rnk
Reviewed By: rnk
Subscribers: llvm-commits, eric_niebler
Differential Revision: https://reviews.llvm.org/D44185
llvm-svn: 329033
Summary:
The cast simplifications that instcombine does here do not make any
attempt to obey the verifier rules for musttail calls. Therefore we have
to disable them.
Reviewers: efriedma, majnemer, pcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D45186
llvm-svn: 329027
Reapply D43165 which was reverted because of different versions of python failing.
The one line fix for the different python versions was commited at the same time
that D43165 was reverted. If this change is giving you issues then get in touch
with your python version and we will fix it.
llvm-svn: 329022
Summary:
This issue was found when running the clang unit test on Windows. Python 3.x cannot open some of the files that the tests are using with a simple open because of their encoding. Python 2.7+ and Python 3.x both support io.open which allows for an encoding to be specified.
This change will determine whether two files being compared should be opened (and then compared) as text or binary and whether to use utf-8 or the default encoding before proceeding with a line-by-line comparison.
Patch by Stella Stamenova!
Reviewers: zturner, llvm-commits, rnk, MaggieYi
Reviewed By: zturner
Subscribers: asmith, MatzeB, stella.stamenova, delcypher, llvm-commits
Differential Revision: https://reviews.llvm.org/D43165
llvm-svn: 329012
This command can dump the binary contents of a stream to a file.
This is useful when you want to do side-by-side comparisons of
a specific stream from two PDBs to examine the differences between
them. You can export both of them to a file, then open them up
side by side in a hex editor (for example), so as to eliminate any
differences that might arise from the contents being on different
blocks in the PDB.
In subsequent patches I plan to improve the "explain" subcommand
so that you can explain the contents of a binary file that isn't
necessarily a full PDB, but one of these dumped streams, by telling
the subcommand how to interpret the contents.
llvm-svn: 329002
These used to be set in the old autoconf build, but the cmake build has had a
"TODO: actually check for these" comment since it was checked in, and they
were set to 1 on mingw unconditionally. It seems safe to say that they always
exist under mingw, so just remove them and assume they're set exactly when on
mingw (with msvc, we use `pragma comment` instead of linking these via flags).
llvm-svn: 328992
Some Function level metadatas, such as function entry count, are not cloned in
DeadArgumentElim. This happens a lot in lto/thinlto because of DeadArgumentElim
after internalization.
This patch clones the metadatas in the original function to the new function.
Differential Revision: https://reviews.llvm.org/D44127
llvm-svn: 328991
The autoconf manual: "This macro is obsolescent, as all current systems with
directory libraries have <dirent.h>. New programs need not use this macro."
llvm-svn: 328989
Summary:
A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing.
To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed.
Reviewers: EricWF, modocache, rnk, lewissbaker
Reviewed By: modocache
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45114
llvm-svn: 328986
Summary:
If the load/extractelement/extractvalue instructions are not originally
consecutive, the SLP vectorizer is unable to vectorize them. Patch
allows reordering of such instructions.
Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43776
llvm-svn: 328980
If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load, This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory.
A "store forward block" occurs in cases that a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchiticutre that a small store cannot be forwarded to a large load.
The estimated penalty for a store forward block is ~13 cycles.
This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence
of a load and a store.
The pass currently only handles cases where memcpy is lowered to XMM/YMM registers, it tries to break the memcpy into smaller copies.
breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM.
Differential revision: https://reviews.llvm.org/D41330
Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9
llvm-svn: 328973
Before, the instruction builder incorrectly assumed that only explicit reads
could have been associated with ReadAdvance entries.
This patch fixes the issue and adds a test to verify it.
llvm-svn: 328972
These should exist in all toolchains LLVM supports nowadays.
Enables making DataTypes.h a regular header instead of a .h.cmake file and
allows deleting a bunch of cmake goop (which should also speed up cmake
configure time a bit).
All the code this removes is 9+ years old.
https://reviews.llvm.org/D45155
llvm-svn: 328970
When running dsymutil as part of your build system, it can be desirable
for warnings to be part of the end product, rather than just being
emitted to the output stream. This patch upstreams that functionality.
Differential revision: https://reviews.llvm.org/D44639
llvm-svn: 328965
Found by looking through the output of
for f in $(grep -o '\bHAVE_[A-Z0-9_]*\b' llvm/cmake/config-ix.cmake); do
echo $f $(git grep $f '*' | wc -l);
done
in the monorepo.
llvm-svn: 328957
This patch adds a set of unstable C API bindings to the DIBuilder interface for
creating structure, function, and aggregate types.
This patch also removes the existing implementations of these functions from
the Go bindings and updates the Go API to fit the new C APIs.
llvm-svn: 328953
Give them both the same itineraries. Add hasSideEffects = 0 to ADOX since they don't have patterns. Rename source operands to $src1 and $src2 instead of $src0 and $src. Add ReadAfterLd to the memory form SchedRW.
llvm-svn: 328952
This also moves to define it in the same way as ADCX which seems to use
constraints a bit better.
This is pulled out of the review for reducing the use of popf for
restoring EFLAGS, but is independent. There are still more problems with
our definitions for these instructions that Craig is going to look at
but this is at least less broken and he can start from this to improve
them more fully.
Thanks to Craig for the review here.
llvm-svn: 328945
for X86's instruction information. I've now got a second patch under
review that needs these same APIs. This bit is nicely orthogonal and
obvious, so landing it. NFC.
llvm-svn: 328944
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: JDevlieghere, zturner, echristo, dberris, friss
Reviewed By: echristo
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D45141
llvm-svn: 328943
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, mzolotukhin, lhames
Reviewed By: echristo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45135
llvm-svn: 328940
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44938
llvm-svn: 328939
Summary:
Avoids having to list all intrinsics manually.
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.
Change-Id: If7ced04998397ef68c4cb8f7de66b5050fb767e5
Reviewers: arsenm, rampitec, b-sumner
Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D44937
llvm-svn: 328938
Summary:
We will use this in the AMDGPU backend in a subsequent patch
in the stack to lookup target-specific per-intrinsic information.
The generic CodeGenIntrinsic machinery is used to ensure that,
even though we don't calculate actual enum values here, we do
get the intrinsics in the right order for the binary search
index.
Change-Id: If61cd5587963a4c5a1cc53df1e59c5e4dec1f9dc
Reviewers: arsenm, rampitec, b-sumner
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D44935
llvm-svn: 328937
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer the comments section in D44363 for a list of all the required patches.
Reviewers: echristo, zturner, samsonov
Reviewed By: echristo
Subscribers: JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D45134
llvm-svn: 328935
Summary:
Adds -import-cutoff=N which will stop importing during the thin link
after N imports. Default is -1 (no limit).
Reviewers: wmi
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45127
llvm-svn: 328934
If a loop has a loop exiting latch, it can be profitable
to rotate the loop if it leads to the simplification of
a phi node. Perform rotation in these cases even if loop
rotate itself didnt simplify the loop to get there.
Differential Revision: https://reviews.llvm.org/D44199
llvm-svn: 328933
This Promote flag was alwasys set to true except in the default case. But in the default case we don't need to set PVT and can just return false.
llvm-svn: 328926
Summary:
r327219 added wrappers to std::sort which randomly shuffle the container before sorting.
This will help in uncovering non-determinism caused due to undefined sorting
order of objects having the same key.
To make use of that infrastructure we need to invoke llvm::sort instead of std::sort.
Note: This patch is one of a series of patches to replace *all* std::sort to llvm::sort.
Refer D44363 for a list of all the required patches.
Reviewers: sanjoy, dexonsmith, hfinkel, RKSimon
Reviewed By: dexonsmith
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D44944
llvm-svn: 328925
fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC,
so replace a pair of casts with the equivalent node. We don't have to account for
special cases (NaN, INF) because out-of-range casts are undefined.
Differential Revision: https://reviews.llvm.org/D44909
llvm-svn: 328921
Summary:
It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput.
This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet.
I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs.
Reviewers: RKSimon, GGanesh, courbet
Reviewed By: RKSimon
Subscribers: gchatelet, gbedwell, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D44972
llvm-svn: 328914
Summary:
Useful to selectively disable importing into specific modules for
debugging/triaging/workarounds.
Reviewers: eraman
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D45062
llvm-svn: 328909