If we are extracting a subvector that has just been inserted then we should just use the original inserted subvector.
This has come up in certain several x86 shuffle lowering cases where we are crossing 128-bit lanes.
Differential Revision: https://reviews.llvm.org/D24254
llvm-svn: 280715
Currently the pass updates branch weights in the IR if the function has
any PGO info (entry frequency is set). However we could still have
regions of the CFG that does not have branch weights collected (e.g. a
cold region). In this case we'd use static estimates. Since static
estimates for branches are determined independently, they are
inconsistent. Updating them can "randomly" inflate block frequencies.
I've run into this in a completely cold loop of h264ref from
SPEC. -Rpass-with-hotness showed the loop to be completely cold during
inlining (before JT) but completely hot during vectorization (after JT).
The new testcase demonstrate the problem. We check array elements
against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting
check. The block names should be self-explanatory.
In this example, jump threading incorrectly updates the weight of the
loop-exiting branch to 0, drastically inflating the frequency of the
loop (in the range of billions).
There is no run-time profile info for edges inside the loop, so branch
probabilities are estimated. These are the resulting branch and block
frequencies for the loop body:
check_1 (16)
(8) / |
eq_1 | (8)
\ |
check_2 (16)
(8) / |
eq_2 | (8)
\ |
check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
First we thread eq_1 -> check_2 to check_3. Frequencies are updated to
remove the frequency of eq_1 from check_2 and then from the false edge
leaving check_2. Changed frequencies are highlighted with * *:
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
\ eq_2 | (*0*)
\ \ |
` --- check_3 (16)
(1) / |
(loop exit) | (15)
|
(back edge)
Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new
back edges. Frequencies are updated to remove the frequency of eq_1 and
eq_3 from check_3 and then the false edge leaving check_3 (changed
frequencies are highlighted with * *):
check_1 (16)
(8) / |
eq_1~ | (8)
/ |
/ check_2 (*8*)
/ (8) / |
/-- eq_2~ | (*0*)
(back edge) |
check_3 (*0*)
(*0*) / |
(loop exit) | (*0*)
|
(back edge)
As a result, the loop exit edge ends up with 0 frequency which in turn makes
the loop header to have maximum frequency.
There are a few potential problems here:
1. The profile data seems odd. There is a single profile sample of the
loop being entered. On the other hand, there are no weights inside the
loop.
2. Based on static estimation we shouldn't set edges to "extreme"
values, i.e. extremely likely or unlikely.
3. We shouldn't create profile metadata that is calculated from static
estimation. I am not sure what policy is but it seems to make sense to
treat profile metadata as something that is known to originate from
profiling. Estimated probabilities should only be reflected in BPI/BFI.
Any one of these would probably fix the immediate problem. I went for 3
because I think it's a good policy to have and added a FIXME about 2.
Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
This was erroneously checked-in for 64 bits while trying to find if there was a way to get 64 bit atomicity in Leon processors. There is not and this change should not have been checked-in. There is no unit test for this as the existing unit tests test for behaviour to 32 bits, which was the original intention of the code.
llvm-svn: 280710
Patch implements FILL just as alias for =fillexpr.
This allows to make implementation much shorted and simpler than D24186.
Differential revision: https://reviews.llvm.org/D24227
llvm-svn: 280708
MSVC emits an error when one uses a const variable in a lambda without
capturing it.
gcc and clang don't emit an error in this scenario.
llvm-svn: 280707
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower
arguments assuming that it was using the paired 32bit registers to
perform operations for f64. This mode of operation is not supported
for MIPSR6.
This patch resolves the reported issue by adding additional checks
for unsupported floating point unit configuration.
Thanks to mike.k for reporting this issue!
Reviewers: seanbruno, vkalintiris
Differential Review: https://reviews.llvm.org/D23795
llvm-svn: 280706
Unlike PPC64, PPC32/SVRV4 does not have red zone. In the absence of it
there is no guarantee that this part of the stack will not be modified
by any interrupt. To avoid this, make sure to claim the stack frame first
before storing into it.
This fixes https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24093
llvm-svn: 280705
This reverts commit rL280668 because the register tests fail on i386
Linux.
I investigated a little bit what causes the failure - there are missing
registers when running 'register read -a'.
This is the output I got at the bottom:
"""
...
Memory Protection Extensions:
bnd0 = {0x0000000000000000 0x0000000000000000}
bnd1 = {0x0000000000000000 0x0000000000000000}
bnd2 = {0x0000000000000000 0x0000000000000000}
bnd3 = {0x0000000000000000 0x0000000000000000}
unknown:
2 registers were unavailable.
"""
Also looking at the packets exchanged between the client and server:
"""
...
history[308] tid=0x7338 < 19> send packet: $qRegisterInfo4a#d7
history[309] tid=0x7338 < 130> read packet:
$name:bnd0;bitsize:128;offset:1032;encoding:vector;format:vector-uint64;set:Memory
Protection Extensions;ehframe:101;dwarf:101;#48
history[310] tid=0x7338 < 19> send packet: $qRegisterInfo4b#d8
history[311] tid=0x7338 < 130> read packet:
$name:bnd1;bitsize:128;offset:1048;encoding:vector;format:vector-uint64;set:Memory
Protection Extensions;ehframe:102;dwarf:102;#52
history[312] tid=0x7338 < 19> send packet: $qRegisterInfo4c#d9
history[313] tid=0x7338 < 130> read packet:
$name:bnd2;bitsize:128;offset:1064;encoding:vector;format:vector-uint64;set:Memory
Protection Extensions;ehframe:103;dwarf:103;#53
history[314] tid=0x7338 < 19> send packet: $qRegisterInfo4d#da
history[315] tid=0x7338 < 130> read packet:
$name:bnd3;bitsize:128;offset:1080;encoding:vector;format:vector-uint64;set:Memory
Protection Extensions;ehframe:104;dwarf:104;#54
history[316] tid=0x7338 < 19> send packet: $qRegisterInfo4e#db
history[317] tid=0x7338 < 76> read packet:
$name:bndcfgu;bitsize:64;offset:1096;encoding:vector;format:vector-uint8;#99
history[318] tid=0x7338 < 19> send packet: $qRegisterInfo4f#dc
history[319] tid=0x7338 < 78> read packet:
$name:bndstatus;bitsize:64;offset:1104;encoding:vector;format:vector-uint8;#8e
...
"""
The bndcfgu and bndstatus registers don't have the 'Memory Protections
Extension' set. I looked at the code and it seems that that is set
correctly.
So I'm not sure what's the problem or where does it come from.
Also there is a second failure related to something like this in the
tests:
"""
registerSet.GetName().lower()
"""
For some reason the registerSet.GetName() returns None.
llvm-svn: 280703
Checking for the type of the command line tokenizer should not be the criteria to enable support for the CL environment variable, this change checks that we are in clang-cl mode instead.
Differential Revision: https://reviews.llvm.org/D23503
llvm-svn: 280702
Use llvm::DINode::DIFlags type (strongly typed enum) for debug flags instead of unsigned int to avoid problems on platforms with sizeof(int) < 4: we already have flags with values > (1 << 16).
Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23767
llvm-svn: 280701
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
Flags are now strongly typed
Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280700
Summary:
Remove access qualifiers on images in arg info metadata:
* kernel_arg_type
* kernel_arg_base_type
Image access qualifiers are inseparable from type in clang implementation,
but OpenCL spec provides a special query to get access qualifier
via clGetKernelArgInfo with CL_KERNEL_ARG_ACCESS_QUALIFIER.
Besides that OpenCL conformance test_api get_kernel_arg_info expects
image types without access qualifier.
Patch by Evgeniy Tyurin.
Reviewers: bader, yaxunl, Anastasia
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D23915
llvm-svn: 280699
Summary:
In addition to not including the register operand of the current
instruction also don't include any aliasing registers. We can't consider
these as candidates because using them will clobber the corresponding
register operand of the current instruction.
This change doesn't include a test case and it would probably be difficult
to produce a stable one since the bug depends on the results of register
allocation.
Reviewers: MatzeB, qcolombet, hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D24130
llvm-svn: 280698
The commit introduced an array of const objects, which libstdc++ does not like. Make the object
non-const.
Also fix a compiler warning while I'm in there.
llvm-svn: 280697
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type.
llvm-svn: 280696
It doesn't work because we're looking for a bitcast from the v4i32 index operand to v4f32 for the passthru part of the DAG. But since the index is bitcasted from v2i64 and bitcasts fold, we actually have a bitcast from v2i64 to v4f32 in the passthru part of the DAG.
Taken from optimized output from clang's test case.
llvm-svn: 280695
When a process stops due to a crash, we get the crashing instruction and the
crashing memory location (if there is one). From the user's perspective it is
often unclear what the reason for the crash is in a symbolic sense.
To address this, I have added new fuctionality to StackFrame to parse the
disassembly and reconstruct the sequence of dereferneces and offsets that were
applied to a known variable (or fuction retrn value) to obtain the invalid
pointer.
This makes use of enhancements in the disassembler, as well as new information
provided by the DWARF expression infrastructure, and is exposed through a
"frame diagnose" command. It is also used to provide symbolic information, when
available, in the event of a crash.
The algorithm is very rudimentary, and it needs a bunch of work, including
- better parsing for assembly, preferably with help from LLVM
- support for non-Apple platforms
- cleanup of the algorithm core, preferably to make it all work in terms of
Operands instead of register/offset pairs
- improvement of the GetExpressioPath() logic to make prettier expression
paths, and
- better handling of vtables.
I welcome all suggestios, improvements, and testcases.
llvm-svn: 280692
This isn't the right thing to do - it turns out a number of the APIs
that "never fail" just exit(1) if something bad happens. We can and
should thread Error through this instead.
That diff will make more sense with this reverted. Sorry for the
noise.
This reverts r280690
llvm-svn: 280691
This simplifies ListReducer and most of its subclasses by removing the
std::string &Error that was threaded through all of them but almost
never used. If we end up needing error handling in more places here we
can reinstate it using llvm::Error instead of these unwieldy strings.
The 2 cases (out of 12) that actually can hit the error cases are a
little bit awkward now, but those will clean up as I refactor this API
further.
llvm-svn: 280690
This is a Windows ARM specific issue. If the code path in the if conversion
ends up using a relocation which will form a IMAGE_REL_ARM_MOV32T, we end up
with a bundle to ensure that the mov.w/mov.t pair is not split up. This is
normally fine, however, if the branch is also predicated, then we end up trying
to predicate the bundle.
For now, report a bundle as being unpredicatable. Although this is false, this
would trigger a failure case previously anyways, so this is no worse. That is,
there should not be any code which would previously have been if converted and
predicated which would not be now.
Under certain circumstances, it may be possible to "predicate the bundle". This
would require scanning all bundle instructions, and ensure that the bundle
contains only predicatable instructions, and converting the bundle into an IT
block sequence. If the bundle is larger than the maximal IT block length (4
instructions), it would require materializing multiple IT blocks from the single
bundle.
llvm-svn: 280689
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
* Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
* Flags are now strongly typed
Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280686
The latest MSVC update apparently resolve the call from the
const ref variant to itself, leading to an infinite
recursion. It is not clear to me why the r-value overload is
not selected. `ValueT` is a pointer type, and the functional-style
cast in the call `insert_as(ValueT(V), LookupKey);` should result
in a r-value ref. A bug in MSVC?
Differential Revision: https://reviews.llvm.org/D23956
llvm-svn: 280685
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM
AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather
than C CC for ARM/thumb targets.
The changes to the tests are necessary to ensure that the calling convention for
the lowered library calls are honoured. Furthermore, these adjustments cause
certain branch invocations to change to branch-and-link since the returned value
needs to be moved across registers (d0 -> r0, r1).
llvm-svn: 280683
Summary:
Move early uses of spilled variables after CoroBegin.
For example, if a parameter had address taken, we may end up with the code
like:
define @f(i32 %n) {
%n.addr = alloca i32
store %n, %n.addr
...
call @coro.begin
This patch fixes the problem by moving uses of spilled variables after CoroBegin.
Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24234
llvm-svn: 280678
As Pavel pointed out in a comment on llvm.org/pr30271, the VPATH I was
using here to eliminate duplication of a .cpp file had a side effect of
attempting to pull in a .o/.obj file from that same parent dir, where
other tests can be running in parallel. This is no good.
For now, I have removed the VPATH, which should address
llvm.org/pr30271. I have also removed the XFAIL.
llvm-svn: 280675
Lots of unittests started failing under asan after r280455. It seems
they've been failing for a long time, but lit silently ignored them.
Downgrade the error so we can figure out what is going on.
Filed http://llvm.org/PR30285.
llvm-svn: 280674
The code is now written in terms of source and dest classes with feature checks inside each type of copy instead of having separate functions for each feature set.
llvm-svn: 280673
I'm in the progress of adding ARMv6 support to CloudABI. On the compiler
side, everything seems to work properly with this tiny change applied.
llvm-svn: 280672
Summary:
During building of recent compiler-rt sources on FreeBSD for arm, I
noticed that our unwind.h (which originates in libunwind) was missing
the `_US_ACTION_MASK` constant:
compiler-rt/lib/builtins/gcc_personality_v0.c:187:18: error: use of undeclared identifier '_US_ACTION_MASK'
if ((state & _US_ACTION_MASK) != _US_UNWIND_FRAME_STARTING)
^
It appears that both clang's internal unwind.h, and libgcc's unwind.h
define this constant as 3, so let's add this to libunwind's version too.
Reviewers: logan, kledzik, davide, emaste
Subscribers: joerg, davide, aemerson, emaste, llvm-commits
Differential Revision: https://reviews.llvm.org/D24222
llvm-svn: 280669
Summary:
The Intel(R) Memory Protection Extensions (Intel(R) MPX) associates pointers
to bounds, against which the software can check memory references to
prevent out of bound memory access.
This patch allows accessing the MPX registers:
* bnd0-3: 128-bit registers to hold the bound values,
* bndcfgu, bndstatus: 64-bit configuration registers,
This patch also adds read/write tests for the MPX registers in the register
command tests and adds a new subdirectory for MPX specific tests.
Signed-off-by: Valentina Giusti <valentina.giusti@intel.com>
Reviewers: labath, granata.enrico, lldb-commits, clayborg
Subscribers: lldb-commits
Differential Revision: https://reviews.llvm.org/D24187
llvm-svn: 280668