Summary:
TargetLoweringBase::Expand is defined as "Try to expand this to other ops,
otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between
the two possibilities was defined in a rather convoluted way:
- if DIVREM is legal, expand to DIVREM
- if DIVREM has a custom lowering, expand to DIVREM
- if a DIVREM libcall is defined and a remainder from the same division is
computed elsewhere, expand to a DIVREM libcall
- otherwise, expand to a DIV libcall
This had the undesirable effect that if both DIV and DIVREM are implemented
as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM
libcall, even when the remainder isn't used.
This patch adds a new LegalizeAction, TargetLoweringBase::LibCall, so that
backends can directly control whether they prefer an expansion or a conversion
to a libcall. This makes the generic lowering code even more generic,
allowing its reuse in a wider range of target-specific configurations.
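For illustration, a backend that wants plain division lowered to a division
libcall might now configure it roughly like this (a minimal sketch; the value
types and the actions chosen here are illustrative, not lines from this patch):

  // Inside a TargetLowering subclass constructor (sketch):
  setOperationAction(ISD::SDIV, MVT::i32, LibCall);   // emit a div libcall
  setOperationAction(ISD::UDIV, MVT::i32, LibCall);
  // With DIVREM left as Expand, a lone division no longer pulls in the
  // heavier combined divmod libcall.
  setOperationAction(ISD::SDIVREM, MVT::i32, Expand);
  setOperationAction(ISD::UDIVREM, MVT::i32, Expand);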
The useful effect is that the ARM backend will now generate a call
to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where
it doesn't need the remainder. There's no functional change outside
the ARM backend.
Reviewers: t.p.northover, rengolin
Subscribers: t.p.northover, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D13862
llvm-svn: 250826
Summary:
In addition to moving the code over, this patch amends the DIV,REM -> DIVREM
combining to run on all affected nodes at once: if the nodes are converted
to DIVREM one at a time, then the resulting DIVREM may get legalized by the
backend into something target-specific that we won't be able to recognize
and correlate with the remaining nodes.
The motivation is to "prepare the terrain" for D13862: once DIV and REM are
set to be legalized to libcalls instead of to DIVREM, we would otherwise lose
the ability to combine them. To prevent this, we need to take the
DIV,REM -> DIVREM combining out of the lowering stage.
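As a concrete illustration (hypothetical source, not a test from this patch),
both operations below divide the same operands, so the combiner should merge
them into a single DIVREM node before either one is legalized separately:

  // Hypothetical C++ example: quotient and remainder of the same division.
  // The two nodes should become one ISD::SDIVREM, so targets without a
  // hardware divider emit a single divmod libcall instead of two calls.
  int quot_rem(int a, int b, int *rem) {
    int q = a / b;   // ISD::SDIV
    *rem = a % b;    // ISD::SREM with the same operands
    return q;
  }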
Reviewers: RKSimon, eli.friedman, rengolin
Subscribers: john.brawn, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13733
llvm-svn: 250825
Loading the debug info from a large application is the slowest task
LLDB does. This CL makes most of the DWARF parsing code multi-threaded.
As a result, the speed of "attach; backtrace; exit;" when the inferior
is an LLDB with full debug info increased by a factor of 2 (on my machine).
Differential revision: http://reviews.llvm.org/D13662
llvm-svn: 250821
The purpose of the class is to make it easy to execute tasks in parallel.
Basic design goals (a rough sketch of such a pool follows the lists below):
* Have a very lightweight and easy-to-use interface where a list of
lambdas can be executed in parallel
* Use a global thread pool to limit the number of threads used
(std::async doesn't do this on Linux) and to eliminate the thread creation
overhead
* Destroy threads that are currently not in use to avoid the confusion
they cause while debugging LLDB
Possible future improvements:
* Possibility to cancel already added, but not yet started, tasks
* Parallel for_each implementation
* Optimizations in the thread creation/destruction code
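Here is that rough sketch (an assumption-laden illustration, not the actual
LLDB TaskPool interface): a minimal fixed-size pool of worker threads draining
a queue of std::function<void()> tasks; it does not implement the idle-thread
destruction mentioned above.

  #include <condition_variable>
  #include <cstddef>
  #include <functional>
  #include <mutex>
  #include <queue>
  #include <thread>
  #include <vector>

  class SimpleTaskPool {
  public:
    explicit SimpleTaskPool(size_t NumThreads) {
      for (size_t I = 0; I < NumThreads; ++I)
        Workers.emplace_back([this] { WorkerLoop(); });
    }

    ~SimpleTaskPool() {
      {
        std::lock_guard<std::mutex> Lock(Mutex);
        Done = true;
      }
      CV.notify_all();
      for (std::thread &T : Workers)
        T.join();
    }

    // Enqueue a lambda; one of the pooled threads will run it.
    void AddTask(std::function<void()> Task) {
      {
        std::lock_guard<std::mutex> Lock(Mutex);
        Tasks.push(std::move(Task));
      }
      CV.notify_one();
    }

  private:
    void WorkerLoop() {
      for (;;) {
        std::function<void()> Task;
        {
          std::unique_lock<std::mutex> Lock(Mutex);
          CV.wait(Lock, [this] { return Done || !Tasks.empty(); });
          if (Done && Tasks.empty())
            return;
          Task = std::move(Tasks.front());
          Tasks.pop();
        }
        Task();
      }
    }

    std::vector<std::thread> Workers;
    std::queue<std::function<void()>> Tasks;
    std::mutex Mutex;
    std::condition_variable CV;
    bool Done = false;
  };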
Differential revision: http://reviews.llvm.org/D13727
llvm-svn: 250820
The mask value type for maskload/maskstore GCC builtins is never a vector of
packed floats/doubles.
This patch fixes the following issues:
1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd
should be of type llvm_v2i64_ty and not llvm_v2f64_ty.
2. The mask argument for builtin_ia32_maskloadpd256 and
builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not
llvm_v4f64_ty.
3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps
should be of type llvm_v4i32_ty and not llvm_v4f32_ty.
4. The mask argument for builtin_ia32_maskloadps256 and
builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not
llvm_v8f32_ty.
Differential Revision: http://reviews.llvm.org/D13776
llvm-svn: 250817
According to the Intel documentation, the mask operand of the maskload and
maskstore intrinsics is always a vector of packed integer/long integer values.
This patch introduces the following two changes:
1. It fixes the avx maskload/store intrinsic definitions in avxintrin.h.
2. It changes BuiltinsX86.def to match the correct GCC definitions for avx
maskload/store (see D13776 for more details).
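A hedged usage sketch (illustrative values, not taken from the patch's tests):
with the corrected signatures, the mask is an integer vector and each element
selects its lane via the sign bit:

  #include <immintrin.h>

  // Load only the first double of a two-element buffer. After the fix the
  // mask parameter is a __m128i (packed 64-bit integers), as Intel documents.
  __m128d load_first_lane(const double *p) {
    __m128i mask = _mm_set_epi64x(0, -1LL); // sign bit set -> load lane 0 only
    return _mm_maskload_pd(p, mask);
  }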
Differential Revision: http://reviews.llvm.org/D13861
llvm-svn: 250816
Summary:
ADB packets have a maximum size of 4k. This means the size of memory reads does not affect speed
too much (as long as the read fits in one packet). Therefore, I am increasing the default memory
read size for Android to 2k. This value is used only if the user has not modified the default
memory-cache-line-size setting.
Reviewers: clayborg, tberghammer
Subscribers: tberghammer, danalbert, srhines, lldb-commits
Differential Revision: http://reviews.llvm.org/D13812
llvm-svn: 250814
Summary: In r231241, TargetLibraryInfoWrapperPass was added to
`getAnalysisUsage` for `AddressSanitizer`, but the corresponding
`INITIALIZE_PASS_DEPENDENCY` was not added.
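For context, the usual pattern looks roughly like this (a sketch of the
standard macros; the pass description string is assumed, not copied from the
patch):

  // Any analysis requested in getAnalysisUsage() should also be declared as
  // an initialization dependency; otherwise it may not be registered when
  // the pass pipeline is set up.
  INITIALIZE_PASS_BEGIN(AddressSanitizer, "asan",
                        "AddressSanitizer: detects use-after-free and "
                        "out-of-bounds bugs.", false, false)
  INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
  INITIALIZE_PASS_END(AddressSanitizer, "asan",
                      "AddressSanitizer: detects use-after-free and "
                      "out-of-bounds bugs.", false, false)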
Reviewers: dvyukov, chandlerc, kcc
Subscribers: kcc, llvm-commits
Differential Revision: http://reviews.llvm.org/D13629
llvm-svn: 250813
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.
If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).
The current full/partial tile separation increases compile time more than
necessary. As a result, we see compile-time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34 %). Some of this compile-time increase is expected,
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investigation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.
Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewers: jdoerfert, grosser
Subscribers: grosser, #polly
Differential Revision: http://reviews.llvm.org/D13779
llvm-svn: 250809
Targets have a supportsLazyRelocations() method which can switch lazy relocations
on/off (currently all targets are OFF except x64, which is ON), so no other
targets are affected for now.
Differential Revision: http://reviews.llvm.org/D13856?id=37726
llvm-svn: 250808
The option currently just sets the NOW bit in DT_FLAGS_1, but some loaders
seem to also require the BIND_NOW bit to be set in DT_FLAGS. This is
also what ld.bfd and gold do.
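For reference, these are the two bits involved (values from the ELF gABI;
this snippet is illustrative and not code from the patch):

  #include <cstdint>

  enum : uint32_t {
    DF_BIND_NOW = 0x8, // DT_FLAGS: resolve all relocations before execution
    DF_1_NOW    = 0x1  // DT_FLAGS_1: the newer flag word's equivalent request
  };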
Differential Revision: http://reviews.llvm.org/D13883
llvm-svn: 250799
Currently, debug info for types used only in explicit casts is not emitted. The
regression appeared after a patch for better alignment handling. This patch fixes the bug.
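A minimal reproducer sketch (illustrative, not the test case added by the
patch):

  // Before the fix, debug info for S could be missing entirely because the
  // type is only mentioned inside an explicit cast.
  struct S { int x; };
  int read_first_field(void *p) {
    return static_cast<S *>(p)->x;
  }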
Differential Revision: http://reviews.llvm.org/D13582
llvm-svn: 250795
This wasn't doing anything useful. They weren't explicitly used
anywhere, and the RegScavenger ignores reserved registers.
This for some reason caused a random scheduling change in the test.
Getting the check lines to pass is too frustrating, and there's probably
not too much value in checking the vector case's operands N times.
llvm-svn: 250794
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well. This change documents this fact and adds a
test case.
There is no functional change here -- only documentation of existing
functionality.
llvm-svn: 250784
"expr -r" does this. It also returns to a REPL if the LLDB command interpreter
is nested inside it, for example in cases where a REPL command resulted in a
breakpoint being hit or a crash.
llvm-svn: 250780
There are two things out of the ordinary in this commit. First, I made
a loop obviously "infinite" in HexagonInstrInfo.cpp. After checking if
an instruction was at the beginning of a basic block (in which case,
`break`), the loop decremented and checked the iterator for `nullptr` as
the loop condition. This has never been possible (the prev pointers have
always been circular, so even with the weird ilist/iplist
implementation this could never happen), so I removed the condition.
Second, in HexagonAsmPrinter.cpp there was another case of comparing a
`MachineBasicBlock::instr_iterator` against `MachineBasicBlock::end()`
(which returns `MachineBasicBlock::iterator`). While not incorrect,
it's fragile. I switched this to `::instr_end()`.
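For illustration (a sketch, not the Hexagon code itself), keeping both sides
of the comparison as instruction iterators looks like this:

  // Comparing an instr_iterator against end() only works through an implicit
  // conversion; comparing against instr_end() keeps the iterator kinds
  // consistent. MBB is a MachineBasicBlock&.
  for (MachineBasicBlock::instr_iterator I = MBB.instr_begin(),
                                         E = MBB.instr_end();
       I != E; ++I) {
    // visit every instruction, including those inside bundles
  }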
All that said, no functionality change intended here.
llvm-svn: 250778