The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The next code will be generated for the taskloop directive:
#pragma omp taskloop num_tasks(N) lastprivate(j)
for( i=0; i<N*GRAIN*STRIDE-1; i+=STRIDE ) {
int th = omp_get_thread_num();
#pragma omp atomic
counter++;
#pragma omp atomic
th_counter[th]++;
j = i;
}
Generated code:
task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct
task),sizeof(struct shar),&task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
NULL, // location
gtid, // gtid
task, // task structure
1, // if clause value
&task->lb, // lower bound
&task->ub, // upper bound
STRIDE, // loop increment
0, // 1 if nogroup specified
2, // schedule type: 0-none, 1-grainsize, 2-num_tasks
N, // schedule value (ignored for type 0)
(void*)&__task_dup_entry // tasks duplication routine
);
llvm-svn: 267395
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).
There seems to be no reason why we can't SRA constants too, so let's do it.
llvm-svn: 267393
If several regions cover the same area of code, we have to restore
the combined value for that area when return from a nested region.
This patch achieves that by combining regions before calling buildSegments.
Differential Revision: http://reviews.llvm.org/D18610
llvm-svn: 267390
Summary:
This implements a new method of run-time checking the NoWrap
SCEV predicates, which should be easier to optimize and nicer
for targets that don't correctly handle multiplication/addition
of large integer types (like i128).
If the AddRec is {a,+,b} and the backedge taken count is c,
the idea is to check that |b| * c doesn't have unsigned overflow,
and depending on the sign of b, that:
a + |b| * c >= a (b >= 0) or
a - |b| * c <= a (b <= 0)
where the comparisons above are signed or unsigned, depending on
the flag that we're checking.
The advantage of doing this is that we avoid extending to a larger
type and we avoid the multiplication of large types (multiplying
i128 can be expensive).
Reviewers: sanjoy
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D19266
llvm-svn: 267389
ADD8TLS, a variant of add instruction used for initial-exec TLS,
currently accepts r0 as a source register. While add itself supports
r0 just fine, linker can relax it to a local-exec sequence, converting
it to addi - which doesn't support r0.
Differential Revision: http://reviews.llvm.org/D19193
llvm-svn: 267388
The new relocation recently defined in the Intel386 psABI
was still missing from this file. A subsequent commit will
add support for GOT32X in MC, together with a test.
llvm-svn: 267378
The interception context is not used by esan, but the compiler complains
about it being uninitialized all the same. We set it to null to avoid the
warning.
llvm-svn: 267376
Summary:
Separate the evaluation of expressions from printing
of results. This is in preparation for splitting the
core of the interpreter out for use in alternative
interpreter frontends.
At the same time, the output is made less noisy in
response to comments on the golang-nuts announcement.
We would ideally print out values using Go syntax,
but this is impractical until we have libgo based on
Go 1.5. When that happens, fmt's %#v will handle
reflect.Value better, and so we can fix/filter type
names to remove automatically generated package names.
Reviewers: pcc
Subscribers: llvm-commits, axw
Differential Revision: http://reviews.llvm.org/D13761
llvm-svn: 267374
This option evaluates an expression and, if the result is of pointer type, treats it as if it was an array of that many elements and displays such elements
This has a couple subtle points but is mostly as straightforward as it sounds
Add a parray N <expr> alias for this new mode
Also, extend the --object-description mode to do the moral equivalent of the above but display each element in --object-description mode
Add a poarray N <expr> alias for this
llvm-svn: 267372
in a debug-info-bearing function has a debug location attached to it. Failure to
do so causes an "!dbg attachment points at wrong subprogram for function"
assertion failure when the inliner sets up inline scope info.
rdar://problem/25878916
This reaplies r267320 without changes after fixing an issue in the OpenMP IR
generator in clang.
llvm-svn: 267370
LLVM really wants a debug location on every inlinable call in a function
with debug info, because it otherwise cannot set up inlining debug info.
This change applies an artificial line 0 debug location (which is how
DWARF marks automatically generated code that has no corresponding
source code) to the .__kmpc_global_dtor_. functions to avoid the
LLVM Verifier complaining.
llvm-svn: 267369
Availablity.h is not used within config.h. The locations which use the
availability infrastructure already include the necessary header(s). NFC.
llvm-svn: 267365
Rather than use the `__assert_rtn` on libSystem based targets and a local
`assert_rtn` function on others, expand the function definition into a macro
which will perform the writing to stderr and then abort. This unifies the
definition and behaviour across targets.
Ensure that we flush stderr prior to aborting.
llvm-svn: 267364
RegisterContextLLDB::InitializeNonZerothFrame already has code to attempt
to detect and handle the case where the PC points beyond the end of a
function, but there are certain cases where this doesn't work correctly.
In fact, there are *two* different places where this detection is attempted,
and the failure is in fact a result of an unfortunate interaction between
those two separate attempts.
First, the ResolveSymbolContextForAddress routine is called with the
resolve_tail_call_address flag set to true. This causes the routine
to internally accept a PC pointing beyond the end of a function, and
still resolving the PC to that function symbol.
Second, the InitializeNonZerothFrame routine itself maintains a
"decr_pc_and_recompute_addr_range" flag and, if that turns out to
be true, itself decrements the PC by one and searches again for
a symbol at that new PC value.
Both approaches correctly identify the symbol associated with the PC.
However, the problem is now that later on, we also need to find the
DWARF CFI record associated with the PC. This is done in the
RegisterContextLLDB::GetFullUnwindPlanForFrame routine, and uses
the "m_current_offset_backed_up_one" member variable.
However, that variable only actually contains the PC "backed up by
one" if the *second* approach above was taken. If the function was
already identified via the first approach above, that member variable
is *not* backed up by one but simply points to the original PC.
This in turn causes GetEHFrameUnwindPlan to not correctly identify
the DWARF CFI record associated with the PC.
Now, in many cases, if the first method had to back up the PC by one,
we *still* use the second method too, because of this piece of code:
// Or if we're in the middle of the stack (and not "above" an asynchronous event like sigtramp),
// and our "current" pc is the start of a function...
if (m_sym_ctx_valid
&& GetNextFrame()->m_frame_type != eTrapHandlerFrame
&& GetNextFrame()->m_frame_type != eDebuggerFrame
&& addr_range.GetBaseAddress().IsValid()
&& addr_range.GetBaseAddress().GetSection() == m_current_pc.GetSection()
&& addr_range.GetBaseAddress().GetOffset() == m_current_pc.GetOffset())
{
decr_pc_and_recompute_addr_range = true;
}
In many cases, when the PC is one beyond the end of the current function,
it will indeed then be exactly at the start of the next function. But this
is not always the case, e.g. if there happens to be alignment padding
between the end of one function and the start of the next.
In those cases, we may sucessfully look up the function symbol via
ResolveSymbolContextForAddress, but *not* set decr_pc_and_recompute_addr_range,
and therefore fail to find the correct DWARF CFI record.
A very simple fix for this problem is to just never use the first method.
Call ResolveSymbolContextForAddress with resolve_tail_call_address set
to false, which will cause it to fail if the PC is beyond the end of
the current function; or else, identify the next function if the PC
is also at the start of the next function. In either case, we will
then set the decr_pc_and_recompute_addr_range variable and back up the
PC anyway, but this time also find the correct DWARF CFI.
A related problem is that the ResolveSymbolContextForAddress sometimes
returns a "symbol" with empty name. This turns out to be an ELF section
symbol. Now, usually those get type eSymbolTypeInvalid. However, there
is code in ObjectFileELF::ParseSymbols that tries to change the type of
invalid symbols to eSymbolTypeCode or eSymbolTypeData if the symbol
lies within the code or data section.
Unfortunately, this check also hits the symbol for the code section
itself, which is then marked as eSymbolTypeCode. While the size of
the section symbol is 0 according to the ELF file, LLDB considers
this size invalid and attempts to figure out the "correct" size.
Depending on how this goes, we may end up with a symbol that overlays
part of the code section, even outside areas covered by real function
symbols.
Therefore, if we call ResolveSymbolContextForAddress with PC pointing
beyond the end of a function, we may get this bogus section symbol.
This again means InitializeNonZerothFrame thinks we have a valid PC,
but then we don't find any unwind info for it.
The fix for this problem is me to simply always leave ELF section
symbols as type eSymbolTypeInvalid.
Differential Revision: http://reviews.llvm.org/D18975
llvm-svn: 267363
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation. We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.
Adjust the ISelLowering to mark Kills and FrameSetup as well.
This partially resolves PR27480.
llvm-svn: 267361
As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns.
llvm-svn: 267359
Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND).
2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements
3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly
We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns).
Differential Revision: http://reviews.llvm.org/D19318
llvm-svn: 267357
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:
1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)
2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input
Differential Revision: http://reviews.llvm.org/D17490
llvm-svn: 267356
We aren't currently making use of this in any successful mask decode and its actually incorrect as it inserts the wrong number of SM_SentinelUndef mask elements.
llvm-svn: 267350
There's hardly any functionality change here. Instead of calling
materializeMetadata on the first call to materialize(GlobalValue*), wait
until the first one that's actually going to do something. Noticed by
inspection; I don't have a concrete case where this makes a difference.
Added an assertion in materializeMetadata to be sure this (or a future
change) doesn't delay materializeMetadata after function-level metadata.
llvm-svn: 267345