Peephole optimization was folding MOVSDrm, which is a zero-extending double
precision floating point load, into ADDPDrr, which is a SIMD add of two packed
double precision floating point values.
(before)
%vreg21<def> = MOVSDrm <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg21
%vreg23<def,tied1> = ADDPDrr %vreg20<tied0>, %vreg21; VR128:%vreg23,%vreg20,%vreg21
(after)
%vreg23<def,tied1> = ADDPDrm %vreg20<tied0>, <fi#0>, 1, %noreg, 0, %noreg; mem:LD8[%7](align=16)(tbaa=<badref>) VR128:%vreg23,%vreg20
X86InstrInfo::foldMemoryOperandImpl already had the logic that prevented this
from happening. However the check wasn't being conducted for loads from stack
objects. This commit factors out the logic into a new function and uses it for
checking loads from stack slots are not zero-extending loads.
rdar://problem/18236850
llvm-svn: 217799
Also, in case they don't define any, change the default from "Run Python function <blah>" into "For more information run help <blah>"
The core issue here is that Python only allows one docstring per function, so we can't really attach both a short and a long help to the same command easily
There are alternatives but this is not a pressing enough concern to go through the motions quite yet
Fixes rdar://18322737
llvm-svn: 217795
More methods to follow.
Using StringRef allows us the EE interface to work with more string types
without forcing construction of std::strings.
llvm-svn: 217794
Patch by Rafael Auler!
This patch addresses PR15171 and teaches Clang how to call other tools
with response files, when the command line exceeds system limits. This
is a problem for Windows systems, whose maximum command-line length is
32kb.
I introduce the concept of "response file support" for each Tool object.
A given Tool may have full support for response files (e.g. MSVC's
link.exe) or only support file names inside response files, but no flags
(e.g. Apple's ld64, as commented in PR15171), or no support at all (the
default case). Therefore, if you implement a toolchain in the clang
driver and you want clang to be able to use response files in your
tools, you must override a method (getReponseFileSupport()) to tell so.
I designed it to support different kinds of tools and
internationalisation needs:
- VS response files ( UTF-16 )
- GNU tools ( uses system's current code page, windows' legacy intl.
support, with escaped backslashes. On unix, fallback to UTF-8 )
- Clang itself ( UTF-16 on windows, UTF-8 on unix )
- ld64 response files ( only a limited file list, UTF-8 on unix )
With this design, I was able to test input file names with spaces and
international characters for Windows. When the linker input is large
enough, it creates a response file with the correct encoding. On a Mac,
to test ld64, I temporarily changed Clang's behavior to always use
response files regardless of the command size limit (avoiding using huge
command line inputs). I tested clang with the LLVM test suite (compiling
benchmarks) and it did fine.
Test Plan: A LIT test that tests proper response files support. This is
tricky, since, for Unix systems, we need a 2MB response file, otherwise
Clang will simply use regular arguments instead of a response file. To
do this, my LIT test generate the file on the fly by cloning many -DTEST
parameters until we have a 2MB file. I found out that processing 2MB of
arguments is pretty slow, it takes 1 minute using my notebook in a debug
build, or 10s in a Release build. Therefore, I also added "REQUIRES:
long_tests", so it will only run when the user wants to run long tests.
In the full discussion in
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130408/171463.html,
Rafael Espindola discusses a proper way to test
llvm::sys::argumentsFitWithinSystemLimits(), and, there, Chandler
suggests to use 10 times the current system limit (20MB resp file), so
we guarantee that the system will always use response file, even if a
new linux comes up that can handle a few more bytes of arguments.
However, by testing with a 20MB resp file, the test takes long 8 minutes
just to perform a silly check to see if the driver will use a response
file. I found it to be unreasonable. Thus, I discarded this approach and
uses a 2MB response file, which should be enough.
Reviewers: asl, rafael, silvas
Reviewed By: silvas
Subscribers: silvas, rnk, thakis, cfe-commits
Differential Revision: http://reviews.llvm.org/D4897
llvm-svn: 217792
This is another case of conditional ownership (in this case a raw
reference, plus a boolean to indicate whether the referenced object
should be deleted). While it's not ideal, I prefer to make the ownership
explicit with a unique_ptr than using a boolean flag (though it does
make the reference and the unique_ptr redundant in the sense that they
both refer to the same memory). At some point we might write a reusable
conditional ownership pointer (a stateful custom deleter for a unique_ptr
may be appropriate).
Based on a patch from a patch by Anton Yartsev.
llvm-svn: 217791
This adds a flag called -fseh-exceptions that uses the native Windows
.pdata and .xdata unwind mechanism to throw exceptions. The other EH
possibilities are DWARF and SJLJ exceptions.
Patch by Martell Malone!
Reviewed By: asl, rnk
Differential Revision: http://reviews.llvm.org/D3419
llvm-svn: 217790
Add some more tests to make sure better operand
choices are still made. Leave some cases that seem
to have no reason to ever be e64 alone.
llvm-svn: 217789
I noticed some odd looking cases where addr64 wasn't set
when storing to a pointer in an SGPR. This seems to be intentional,
and partially tested already.
The documentation seems to describe addr64 in terms of which registers
addressing modifiers come from, but I would expect to always need
addr64 when using 64-bit pointers. If no offset is applied,
it makes sense to not need to worry about doing a 64-bit add
for the final address. A small immediate offset can be applied,
so is it OK to not have addr64 set if a carry is necessary when adding
the base pointer in the resource to the offset?
llvm-svn: 217785
ELF objects contain marker symbols to differentiate between ARM and
THUMB functions. Instead of storing them internally and having garbage
show up when symbols are searched for by the user, we can just skip them
and not store them at all, as we never actually need them.
Change by Stephane Sezer.
Tested:
Ubuntu 14.04 x86_64
MacOSX 10.9.4 x86_64
llvm-svn: 217782
Instead of forcing the remote arch type to MachO all the time, we
inspect the OS/vendor that the remote debug server reports and use it to
set the arch type to MachO, ELF or COFF accordingly.
See thread here for more context:
http://lists.cs.uiuc.edu/pipermail/lldb-commits/Week-of-Mon-20140915/012968.html
Change by Stephane Sezer.
Tested:
MacOSX 10.9.4 x86_64
Ubuntu 14.04 x86_64
llvm-svn: 217779
This doesn't change the interface or gives additional safety but removes
a ton of retain/release boilerplate.
No functionality change.
llvm-svn: 217778
This is useful for checking inconsistencies between what the remote debug server thinks we are debugging and we think we are debugging. This follows the check for pointer byte size done just above.
Change by Stephane Sezer.
Tested:
Ubuntu 14.04 x86_64, llvm-3.5-built lldb
MacOSX 10.9.4, Xcode-Beta(2014-09-09)-built lldb.
llvm-svn: 217773
This test has a chance to hit some other random allocation
and get neither heap overflow nor SEGV.
Relax test condition to only check that there is no internal CHECK failure.
llvm-svn: 217769
As this is very dependent on the code base it has some ways of configuration.
It's possible to pick between 3 modes of operation:
- Line counting: number of lines including whitespace and comments
- Statement counting: number of statements within compoundStmts.
- Branch counter
In addition a threshold can be picked, warnings are only emitted when it is met.
The thresholds can be configured via a .clang-tidy file.
Differential Revision: http://reviews.llvm.org/D4986
llvm-svn: 217768
when SSE4.1 is available.
This removes a ton of domain crossing from blend code paths that were
ending up in the floating point code path.
This is just the tip of the iceberg though. The real switch is for
integer blend lowering to more actively rely on this instruction being
available so we don't hit shufps at all any longer. =] That will come in
a follow-up patch.
Another place where we need better support is for using PBLENDVB when
doing so avoids the need to have two complementary PSHUFB masks.
llvm-svn: 217767
of the file.
This would run past the end of the buffer. Sadly I don't have a great way to
test it, the only way to trigger the bug is having a removal fix it at the end
of the file, which none of our current warnings can generate.
llvm-svn: 217766
missing specific checks.
While there is a lot of redundancy here where all-but-one mode use the
same code generation, I'd rather have each variant spelled out and
checked so that readers aren't misled by an omission in the test suite.
llvm-svn: 217765
Summary:
These two functions are unavailable on MSVC2012, which breaks building the
ASAN tests with MSVC2012. Since the tests required to run these functions
are disabled on Windows for now, avoid building them to fix the MSVC2012
builds.
Test Plan: This is needed in order to fix building the ASAN tests with MSVC2012.
Reviewers: timurrrr
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5343
llvm-svn: 217763
Summary: This is copied from llvm/utils/unittest/CMakeLists.txt.
Test Plan: This partly enables building ASAN tests with MSVC2012.
Reviewers: timurrrr
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5342
llvm-svn: 217762
instructions from the relevant shuffle patterns.
This is the last tweak I'm aware of to generate essentially perfect
v4f32 and v2f64 shuffles with the new vector shuffle lowering up through
SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32
is amenable to exhaustive exploration, but this is all of the tricks I'm
aware of.
With AVX there is a new trick to use the VPERMILPS instruction, that's
coming up in a subsequent patch.
llvm-svn: 217761
instructions when it finds an appropriate pattern.
These are lovely instructions, and its a shame to not use them. =] They
are fast, and can hand loads folded into their operands, etc.
I've also plumbed the comment shuffle decoding through the various
layers so that the test cases are printed nicely.
llvm-svn: 217758
AVX is available, and generally tidy up things surrounding UNPCK
formation.
Originally, I was thinking that the only advantage of PSHUFD over UNPCK
instruction variants was its free copy, and otherwise we should use the
shorter encoding UNPCK instructions. This isn't right though, there is
a larger advantage of being able to fold a load into the operand of
a PSHUFD. For UNPCK, the operand *must* be in a register so it can be
the second input.
This removes the UNPCK formation in the target-specific DAG combine for
v4i32 shuffles. It also lifts the v8 and v16 cases out of the
AVX-specific check as they are potentially replacing multiple
instructions with a single instruction and so should always be valuable.
The floating point checks are simplified accordingly.
This also adjusts the formation of PSHUFD instructions to attempt to
match the shuffle mask to one which would fit an UNPCK instruction
variant. This was originally motivated to allow it to match the UNPCK
instructions in the combiner, but clearly won't now.
Eventually, we should add a MachineCombiner pass that can form UNPCK
instructions post-RA when the operand is known to be in a register and
thus there is no loss.
llvm-svn: 217755
'punpckhwd' instructions when suitable rather than falling back to the
generic algorithm.
While we could canonicalize to these patterns late in the process, that
wouldn't help when the freedom to use them is only visible during
initial lowering when undef lanes are well understood. This, it turns
out, is very important for matching the shuffle patterns that are used
to lower sign extension. Fixes a small but relevant regression in
gcc-loops with the new lowering.
When I changed this I noticed that several 'pshufd' lowerings became
unpck variants. This is bad because it removes the ability to freely
copy in the same instruction. I've adjusted the widening test to handle
undef lanes correctly and now those will correctly continue to use
'pshufd' to lower. However, this caused a bunch of churn in the test
cases. No functional change, just churn.
Both of these changes are part of addressing a general weakness in the
new lowering -- it doesn't sufficiently leverage undef lanes. I've at
least a couple of patches that will help there at least in an academic
sense.
llvm-svn: 217752
Use fully qualified name inside a typedef from llvm::iterator_range<...> to
iterator_range. This is reported (rightly I think) by GCC as an
ambiguous name redefinition. Hope this fixes the buildbots.
llvm-svn: 217751
Some ICmpInsts when anded/ored with another ICmpInst trivially reduces
to true or false depending on whether or not all integers or no integers
satisfy the intersected/unioned range.
This sort of trivial looking code can come about when InstCombine
performs a range reduction-type operation on sdiv and the like.
This fixes PR20916.
llvm-svn: 217750