Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Samuel Li <samuel.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
llvm-svn: 207846
The register spiller assumes that only one new instruction is created
when spilling and restoring registers, so we need to emit pseudo
instructions for vector register spills and lower them after
register allocation.
v2:
- Fix calculation of lane index
- Extend VGPR liveness to end of program.
v3:
- Use SIMM16 field of S_NOP to specify multiple NOPs.
https://bugs.freedesktop.org/show_bug.cgi?id=75005
llvm-svn: 207843
Previously, LLVM had no knowledge that these instructions actually
modified their address register: fine if they never end up in CodeGen,
but when I'd rather like to write some patterns for them it becomes a
disaster.
The change is mostly straightforward, I think the most significant
design decision was to *always* put the address write-back first. This
allows loads and stores to be accessed more uniformly, for example
permitting the continued sharing of the InstAlias definitions.
I also discovered that the custom Decode logic is no longer needed, so
I removed it.
No tests, because there should be no functionality change.
llvm-svn: 207839
While post-indexed LD1/ST1 instructions do exist for vector loads,
this patch makes use of the more flexible addressing-modes in LDR/STR
instructions.
llvm-svn: 207838
Export definitions in a module definition file is as follows:
exportedname[=internalname] [@ordinal [NONAME]] [PRIVATE] [DATA]
Previously we did not support =internalname, so users couldn't export
symbols from a DLL with a different name.
llvm-svn: 207827
The Preprocessor::Initialize() function already offers a clear interface to
achieve this, further reducing the confusing number of states a newly
constructed preprocessor can have.
llvm-svn: 207825
These calls to ConsumeCodeCompletionToken() caused parsing to continue
needlessly when an immediate cutOffParsing() would do.
Document the function to clarify its correct usage.
llvm-svn: 207823
class template member classes (PR19613)
Also improve this code in general by implementing suggestions
from Richard.
Differential Revision: http://reviews.llvm.org/D3555?id=9020
llvm-svn: 207822
dependent-type-member-pointer.cpp is failing on a win64 bot because
-fms-extensions is not enabled. Use ConvertType rather than relying on
the inheritance attributes. It's less code, but probably slower.
llvm-svn: 207819
The Win64 ABI docs on MSDN say that arguments bigger than 8 bytes are
passed by reference. Prior to this change, we were only applying this
logic to RecordType arguments. This affects both the Itanium and
Microsoft C++ ABIs.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D3587
llvm-svn: 207817
- CTRL+C wasn't clearing the command in lldb
- CTRL+C doesn't work in python macros in lldb
- Ctrl+C no longer interrupts the running process that you attach to
<rdar://problem/15949205>
<rdar://problem/16778652>
<rdar://problem/16774411>
llvm-svn: 207816
This code is trying to test if the pointer is *not* null. Therefore we
should use 'or' instead of 'and' to combine the results of 'icmp ne'.
This logic is consistent with the general member pointer comparison code
in EmitMemberPointerComparison.
llvm-svn: 207815
Given the following C code llvm currently generates suboptimal code for
x86-64:
__m128 bss4( const __m128 *ptr, size_t i, size_t j )
{
float f = ptr[i][j];
return (__m128) { f, f, f, f };
}
=================================================
define <4 x float> @_Z4bss4PKDv4_fmm(<4 x float>* nocapture readonly %ptr, i64 %i, i64 %j) #0 {
%a1 = getelementptr inbounds <4 x float>* %ptr, i64 %i
%a2 = load <4 x float>* %a1, align 16, !tbaa !1
%a3 = trunc i64 %j to i32
%a4 = extractelement <4 x float> %a2, i32 %a3
%a5 = insertelement <4 x float> undef, float %a4, i32 0
%a6 = insertelement <4 x float> %a5, float %a4, i32 1
%a7 = insertelement <4 x float> %a6, float %a4, i32 2
%a8 = insertelement <4 x float> %a7, float %a4, i32 3
ret <4 x float> %a8
}
=================================================
shlq $4, %rsi
addq %rdi, %rsi
movslq %edx, %rax
vbroadcastss (%rsi,%rax,4), %xmm0
retq
=================================================
The movslq is uneeded, but is present because of the trunc to i32 and then
sext back to i64 that the backend adds for vbroadcastss.
We can't remove it because it changes the meaning. The IR that clang
generates is already suboptimal. What clang really should emit is:
%a4 = extractelement <4 x float> %a2, i64 %j
This patch makes that legal. A separate patch will teach clang to do it.
Differential Revision: http://reviews.llvm.org/D3519
llvm-svn: 207801