The ARM backend currently has poor codegen for long sext/zext
operations, such as v8i8 -> v8i32. This patch addresses this
by performing a custom expansion in ARMISelLowering. It also
adds/changes the cost of such lowering in ARMTTI.
This partially addresses PR14867.
Patch by Pete Couperus
llvm-svn: 177380
- TestCase.m_thread is now filled in with the first thread that has a valid
stop reason. This eliminates the need for the SelectMyThread() functions.
- The first thread that stops for a reason is also set as the selected thread
in the process in case any command line commands are run.
- Changed launch over to take a SBLaunchInfo parameter so that the launch
function doesn't keep getting new arguments as they are needed.
- TestCase::Setup() and TestCase::Launch(SBLaunchInfo) now return bool to
indicate success of setup and launch.
- ActionWanted::Next(SBThread) was renamed to ActionWanted::StepOver(SBThread)
- ActionWanted::Finish(SBThread) was renamed to ActionWanted::StepOut(SBThread)
llvm-svn: 177376
The general pattern now is that Foobar::constructTool only creates tools
defined in the tools::foobar namespace and then delegates to the parent.
The remaining duplicated code is now in the tools themselves.
llvm-svn: 177368
The global module index was querying the file manager for each of the
module files it knows about at load time, to prune out any out-of-date
information. The file manager would then cache the results of the
stat() falls used to find that module file.
Later, the same translation unit could end up trying to import one of the
module files that had previously been ignored by the module cache, but
after some other Clang instance rebuilt the module file to bring it
up-to-date. The stale stat() results in the file manager would
trigger a second rebuild of the already-up-to-date module, causing
failures down the line.
The global module index now lazily resolves its module file references
to actual AST reader module files only after the module file has been
loaded, eliminating the stat-caching race. Moreover, the AST reader
can communicate to its caller that a module file is missing (rather
than simply being out-of-date), allowing us to simplify the
module-loading logic and allowing the compiler to recover if a
dependent module file ends up getting deleted.
llvm-svn: 177367
Fixed a crasher in the SourceManager where it wasn't checking the m_target member variable for NULL.
In doing this fix, I hardened this class to have weak pointers to the debugger and target in case they do go away. I also changed SBSourceManager to hold onto weak pointers to the debugger and target so they don't keep objects alive by holding a strong reference to them.
llvm-svn: 177365
and the JITted code are managed by a standalone
class that handles memory management itself.
I have removed RecordingMemoryManager and
ProcessDataAllocator, which filled similar roles
and had confusing ownership, with a common class
called IRExecutionUnit. The IRExecutionUnit
manages all allocations ever made for an expression
and frees them when it goes away. It also contains
the code generator and can vend the Module for an
expression to other clases.
The end goal here is to make the output of the
expression parser re-usable; that is, to avoid
re-parsing when re-parsing isn't necessary.
I've also cleaned up some code and used weak pointers
in more places. Please let me know if you see any
leaks; I checked myself as well but I might have
missed a case.
llvm-svn: 177364
it wasn't taking into account that the float should be truncated *before* the
range check happens. Thus (unsigned)-0.99 and (unsigned char)255.9 have defined
behavior and should not be trapped.
llvm-svn: 177362
Splitting the graph trimming and the path-finding (r177216) already
recovered quite a bit of performance lost to increased suppression.
We can still do better by also performing the reverse BFS up front
(needed for shortest-path-finding) and only walking the shortest path
for each report. This does mean we have to walk back up the path and
invalidate all the BFS numbers if the report turns out to be invalid,
but it's probably still faster than redoing the full BFS every time.
More performance work for <rdar://problem/13433687>
llvm-svn: 177353
The previous implementation missed the case where the elif condition was
evaluated from the context of an #ifdef that was false causing PR15539.
llvm-svn: 177345
For each compile unit, we want to register a function that will flush that
compile unit. Otherwise, __gcov_flush() would only flush the counters within the
current compile unit, and not any outside of it.
PR15191 & <rdar://problem/13167507>
llvm-svn: 177340
PPC64 supports unaligned loads and stores of 64-bit values, but
in order to use the r+i forms, the offset must be a multiple of 4.
Unfortunately, this cannot always be determined by examining the
immediate itself because it might be available only via a TOC entry.
In order to get around this issue, we additionally predicate the
selection of the r+i form on the alignment of the load or store
(forcing it to be at least 4 in order to select the r+i form).
llvm-svn: 177338
The __llvm_gcov_flush() functions only work for the local compile unit. However,
when __gcov_flush() is called, the user expects all of the counters to be
flushed, not just the ones in the current compile unit.
This adds some library functions that register the flush functions. It also
defined __gcov_flush() so that loops through that list and calls the functions.
PR15191 & <rdar://problem/13167507>
llvm-svn: 177337
The default logic marks them as too expensive.
For example, before this patch we estimated:
cost of 16 for instruction: %r = uitofp <4 x i16> %v0 to <4 x float>
While this translates to:
vmovl.u16 q8, d16
vcvt.f32.u32 q8, q8
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13445992
llvm-svn: 177334
Fix cost of some "cheap" cast instructions. Before this patch we used to
estimate for example:
cost of 16 for instruction: %r = fptoui <4 x float> %v0 to <4 x i16>
While we would emit:
vcvt.s32.f32 q8, q8
vmovn.i32 d16, q8
vuzp.8 d16, d17
All other costs are left to the values assigned by the fallback logic. Theses
costs are mostly reasonable in the sense that they get progressively more
expensive as the instruction sequences emitted get longer.
radar://13434072
llvm-svn: 177333