Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.
This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).
Differential Revision: http://reviews.llvm.org/D6015
llvm-svn: 221313
change LoopSimplifyPass to be !isCFGOnly. The motivation for the earlier patch
(r221223) was that LoopSimplify is not preserved by instcombine though
setPreservesCFG indicates that it is. This change fixes the issue
by making setPreservesCFG no longer imply LoopSimplifyPass, and is therefore less
invasive.
llvm-svn: 221311
The command line options are specified in a space-separated list that is an
argument to -dwarf-debug-flags, so that breaks if there are spaces in the
options. This feature came from Apple's internal version of GCC, so I went back
to check how llvm-gcc handled this and matched that behavior.
rdar://problem/18775420
llvm-svn: 221309
While fixing up the register classes in the machine combiner in a previous
commit I missed one.
This fixes the last one and adds a test case.
llvm-svn: 221308
Clang -gsplit-dwarf self-host -O0, binary increases by 0.0005%, -O2,
binary increases by 25%.
A large binary inside Google, split-dwarf, -O0, and other internal flags
(GDB index, etc) increases by 1.8%, optimized build is 35%.
The size impact may be somewhat greater in .o files (I haven't measured
that much - since the linked executable -O0 numbers seemed low enough)
due to relocations. These relocations could be removed if we taught the
llvm-symbolizer to handle indexed addressing in the .o file (GDB can't
cope with this just yet, but GDB won't be reading this info anyway).
Also debug_ranges could be shared between .o and .dwo, though ideally
debug_ranges would get a schema that could used index(+offset)
addressing, and move to the .dwo file, then we'd be back to sharing
addresses in the address pool again.
But for now, these sizes seem small enough to go ahead with this.
Verified that no other DW_TAGs are produced into the .o file other than
subprograms and inlined_subroutines.
llvm-svn: 221306
We were producing a relocation for
----------------
.section foo,bar
La:
Lb:
.long La-Lb
--------------
but not for
---------------------
.section foo,bar
zed:
La:
Lb:
.long La-Lb
----------------
This patch handles the case where both fragments are part of the first atom
in a section and there is no corresponding symbol to that atom.
This fixes pr21328.
llvm-svn: 221304
# Make DataInfo (describing a global) a member of ReportLocation
to avoid unnecessary copies and allocations.
# Introduce a constructor and a factory method, so that
all structure users don't have to go to internal allocator directly.
# Remove unused fields (file/line).
No functionality change.
llvm-svn: 221302
The job of the CompactUnwind pass is to turn __compact_unwind data (and
__eh_frame) into the compressed final form in __unwind_info. After it's done,
the original atoms are no longer relevant and should be deleted (they cause
problems during actual execution, quite apart from the fact that they're not
needed).
llvm-svn: 221301
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."
This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).
llvm-svn: 221296
AddressInfo contains the results of symbolization. Store this object
directly in the symbolized stack, instead of copying data around and
making unnecessary memory allocations.
No functionality change.
llvm-svn: 221294
Registers are not all equal. Some are not allocatable (infinite cost),
some have to be preserved but can be used, and some others are just free
to use.
Ensure there is a cost hierarchy reflecting this fact, so that the
allocator will favor scratch registers over callee-saved registers.
llvm-svn: 221293
This patch improves how the different costs (register, interference, spill
and coalescing) relates together. The assumption is now that:
- coalescing (or any other "side effect" of reg alloc) is negative, and
instead of being derived from a spill cost, they use the block
frequency info.
- spill costs are in the [MinSpillCost:+inf( range
- register or interference costs are in [0.0:MinSpillCost( or +inf
The current MinSpillCost is set to 10.0, which is a random value high
enough that the current constraint builders do not need to worry about
when settings costs. It would however be worth adding a normalization
step for register and interference costs as the last step in the
constraint builder chain to ensure they are not greater than SpillMinCost
(unless this has some sense for some architectures). This would work well
with the current builder pipeline, where all costs are tweaked relatively
to each others, but could grow above MinSpillCost if the pipeline is
deep enough.
The current heuristic is tuned to depend rather on the number of uses of
a live interval rather than a density of uses, as used by the greedy
allocator. This heuristic provides a few percent improvement on a number
of benchmarks (eembc, spec, ...) and will definitely need to change once
spill placement is implemented: the current spill placement is really
ineficient, so making the cost proportionnal to the number of use is a
clear win.
llvm-svn: 221292
TSan used to do the following transformations with data obtained
from the symbolizer:
1) Strip "__interceptor_" prefix from function name.
2) Use "strip_path_prefix" runtime flag to strip filepath.
Now these transformations are performed right before the stack trace
is printed, and ReportStack structure contains original information.
This seems like a right thing to do - stripping is a detail of report
formatting implementation, and should belong there. We should, for
example, use original path to source file when we apply suppressions.
This change also make "strip_path_prefix" flag behavior in TSan
consistent with all the other sanitizers - now it should actually
match *the prefix* of path, not some substring. E.g. earlier TSan
would turn "/usr/lib/libfoo.so" into "libfoo.so" even if strip_path_prefix
was "/lib/".
Finally, strings obtained from symbolizer come from internal allocator,
and "stripping" them early by incrementing a "char*" ensures they can
never be properly deallocated, which is a bug.
llvm-svn: 221283
Change the LC_ID_DYLIB of ASan's dynamic libraries on OS X to be set to "@rpath/libclang_rt.asan_osx_dynamic.dylib" and similarly for iossim. Clang driver then sets the "-rpath" to be the real path to where clang currently has the dylib (because clang uses the relative path to its current executable). This means if you move the compiler or install the binary release, -fsanitize=address will link to the proper library.
Reviewed at http://reviews.llvm.org/D6018
llvm-svn: 221279
Change the LC_ID_DYLIB of ASan's dynamic libraries on OS X to be set to "@rpath/libclang_rt.asan_osx_dynamic.dylib" and similarly for iossim. Clang driver then sets the "-rpath" to be the real path to where clang currently has the dylib (because clang uses the relative path to its current executable). This means if you move the compiler or install the binary release, -fsanitize=address will link to the proper library.
Reviewed at http://reviews.llvm.org/D6018
llvm-svn: 221278
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).
These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).
Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5462
llvm-svn: 221277