Added CLK_ABGR definition for get_image_channel_order return value inside opencl-c.h file.
Patch by Aaron En Ye Shi.
Differential Revision: https://reviews.llvm.org/D22767
llvm-svn: 277179
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
llvm-svn: 277178
As mentioned in commit log for r276686 this next step is adding a new
method in the ArchiveMemberHeader class to get the full name that
does proper error checking, and can be use for error messages.
To do this the name of ArchiveMemberHeader::getName() is changed to
ArchiveMemberHeader::getRawName() to be consistent with
Archive::Child::getRawName(). Then the “new” method is the addition
of a new implementation of ArchiveMemberHeader::getName() which gets
the full name and provides proper error checking. Which is mostly a rewrite
of what was Archive::Child::getName() and cleaning up incorrect uses of
llvm_unreachable() in the code which were actually just cases of errors
in the input Archives.
Then Archive::Child::getName() is changed to return Expected<> and use
the new implementation of ArchiveMemberHeader::getName() .
Also needed to change Archive::getMemoryBufferRef() with these
changes to return Expected<> as well to propagate Errors up.
As well as changing Archive::isThinMember() to return Expected<> .
llvm-svn: 277177
For MachineInstrBuilder, having to manually use RegState::Define is ugly and
makes register definitions clunkier than they need to be, so this adds two
convenience functions: addDef and addUse.
For MachineIRBuilder, we want to avoid BuildMI's first-reg-is-def rule because
it's hidden away and causes bugs. So this patch switches buildInstr to
returning a MachineInstrBuilder and adding *all* operands via addDef/addUse.
NFC.
llvm-svn: 277176
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.
This currently fails to select extloads because we have yet to
agree on a representation.
llvm-svn: 277171
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
llvm-svn: 277169
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
Patch by Pranav Bhandarkar.
llvm-svn: 277168
When coming from an IR label type, we set a 0 NumElements, but not
when constructing an LLT using unsized(), causing comparisons to fail.
Pick one variant and fix the other.
llvm-svn: 277161
This undoes my last commit. It collided with Pavel undoing
his change that my previous commit was adjusting for in the
Xcode file.
This reverts commit f6f29cb7d7c56f96f21d9c115ecc66d652639df3.
llvm-svn: 277157
This reverts commit r277139, because:
- broken unittest on windows (likely typo on my part)
- seems to break TestCallThatRestart (needs investigation)
llvm-svn: 277154
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
llvm-svn: 277151
The post register allocator scheduler can generate poor schedules
because the scoreboard hazard recognizer is unable to identify
hazards for Hexagon precisely. Instead, Hexagon should use a DFA
based hazard recognizer.
Patch by Brendon Cahoon.
llvm-svn: 277143
Summary:
There were places in the code, assuming(hardcoding) offsets
and types that were only valid for the x86_64 elf core file format.
The NT_PRSTATUS and NT_PRPSINFO structures are with the 64 bit layout.
I have reused them and parse i386 files manually, and fill them in the
same struct.
Also added some error handling during parsing that checks if the
available bytes in the buffer are enough to fill the structures.
The i386 core file test case now passes.
For reference on the structures layout, I generally used the
source of binutils (bfd, readelf)
Bug: https://llvm.org/bugs/show_bug.cgi?id=26947
Reviewers: labath
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D22917
llvm-svn: 277140
SendContinuePacketAndWaitForResponse was huge function with very complex interactions with
several other functions (SendAsyncSignal, SendInterrupt, SendPacket). This meant that making any
changes to how packet sending functions and threads interact was very difficult and error-prone.
This change does not add any functionality yet, it merely paves the way for future changes. In a
follow-up, I plan to add the ability to have multiple query packets in flight (i.e.,
request,request,response,response instead of the usual request,response sequences) and use that
to speed up qModuleInfo packet processing.
Here, I introduce two special kinds of locks: ContinueLock, which is used by the continue thread,
and Lock, which is used by everyone else. ContinueLock (atomically) sends a continue packet, and
blocks any other async threads from accessing the connection. Other threads create an instance of
the Lock object when they want to access the connection. This object, while in scope prevents the
continue from being send. Optionally, it can also interrupt the process to gain access to the
connection for async processing.
Most of the syncrhonization logic is encapsulated within these two classes. Some of it still
had to bleed over into the SendContinuePacketAndWaitForResponse, but the function is still much
more manageable than before -- partly because of most of the work is done in the ContinueLock
class, and partly because I have factored out a lot of the packet processing code separate
functions (this also makes the functionality more easily testable). Most importantly, there is
none of syncrhonization code in the async thread users -- as far as they are concerned, they just
need to declare a Lock object, and they are good to go (SendPacketAndWaitForResponse is now a
very thin wrapper around the NoLock version of the function, whereas previously it had over 100
lines of synchronization code). This will make my follow up changes there easy.
I have written a number of unit tests for the new code and I have ran the test suite on linux and
osx with no regressions.
Subscribers: tberghammer
Differential Revision: https://reviews.llvm.org/D22629
llvm-svn: 277139
This patch adds 48-bits VMA support for tsan on aarch64. As current
mappings for aarch64, 48-bit VMA also supports PIE executable. This
limits the mapping mechanism because the PIE address bits
(usually 0aaaaXXXXXXXX) makes it harder to create a mask/xor value
to include all memory regions. I think it is possible to create a
large application VAM range by either dropping PIE support or tune
current range.
It also changes slight the way addresses are packed in SyncVar structure:
previously it assumes x86_64 as the maximum VMA range. Since ID is 14 bits
wide, shifting 48 bits should be ok.
Tested on x86_64, ppc64le and aarch64 (39 and 48 bits VMA).
llvm-svn: 277137
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
llvm-svn: 277136
[DAG] Check debug values for invalidation before transferring and mark
old debug values invalid when transferring to another SDValue.
This fixes PR28613.
Reviewers: jyknight, hans, dblaikie, echristo
Subscribers: yaron.keren, ismail, llvm-commits
Differential Revision: https://reviews.llvm.org/D22858
llvm-svn: 277135
As reported in bug 28473, GCC supports "final" functionality in pre-C++11 code using the __final keyword. Clang currently supports the "final" keyword in accordance with the C++11 specification, however it ALSO supports it in pre-C++11 mode, with a warning.
This patch adds the "__final" keyword for compatibility with GCC in GCC Keywords mode (so it is enabled with existing flags), and issues a warning on its usage (suggesting switching to the C++11 keyword). This patch also adds a regression test for the functionality described. I believe this patch has minimal impact, as it simply adds a new keyword for existing behavior.
This has been validated with check-clang to avoid regressions. Patch is created in reference to revisions 276665.
Patch by Erich Keane.
Differential Revision: https://reviews.llvm.org/D22919
llvm-svn: 277134
The commit accidentally switched a timed wait on a condition variable into an infinite timeout.
Change that back. Android tests were timeing out without this.
llvm-svn: 277133
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all or zero bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.
Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.
We avoid performing this on AVX512 as it should have better alternative truncation instructions.
Differential Revision: https://reviews.llvm.org/D22814
llvm-svn: 277132
The complexity of renaming a USR is O(N) [N stands for number of nodes in
Translation Unit]. In some cases there are more than one USR for a single symbol
(see overridden functions and ctor/dtor handling), which means that the
complexity of finding all of the corresponding USRs is O(N * M) [M stands for
number of USRs corresponding to the symbols, which may be not quite small]. With
a simple tweak we can make it O(N * log(M)) by passing whole list of USRs
corresponding to the symbol to USRLocFinder.
llvm-svn: 277131