For a function call (using the default `-fplt`), GCC `-mcmodel=large` generates an assembly modifier which
leads to an R_X86_64_PLTOFF64 relocation. In real world,
http://git.ageinghacker.net/jitter (used by GNU poke) uses `-mcmodel=large`.
R_X86_64_PLTOFF64's formula is (if preemptible) `L - GOT + A` or (if non-preemptible) `S - GOT + A`
where `GOT` is (confusingly) the address of `.got.plt`
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D112386
Taken from Chih-Mao Chen's D100835.
RelExpr has 64 bits now and needs the extension to support new members
(`R_PLT_GOTPLT` for `R_X86_64_PLTOFF64` support).
Note: RelExpr needs to have at least a member >=64 to prevent
-Wtautological-constant-out-of-range-compare for `if (expr >= 64)`.
Reviewed By: arichardson, peter.smith
Differential Revision: https://reviews.llvm.org/D112385
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
The new module stats adds the ability to measure the time it takes to parse and index the symbol tables for each module, and reports modules statistics in the output of "statistics dump" along with the path, UUID and triple of the module. The time it takes to parse and index the symbol tables are also aggregated into new top level key/value pairs at the target level.
Differential Revision: https://reviews.llvm.org/D112279
There is no need to check for deferred diag when device compilation or target is
not given. This results in considerable build time improvement in some cases.
Differential Revision: https://reviews.llvm.org/D109175
Declarations of 5.1 atomic entries were added under
"#if KMP_ARCH_X86 || KMP_ARCH_X86_64" in kmp_atomic.h,
but definitions of the functions missed architecture guard in kmp_atomic.cpp.
As a result mangled symbols were available on non-x86 architecture.
The patch eliminates these unexpected symbols from the library.
Differential Revision: https://reviews.llvm.org/D112261
Now that AddRegister() is no longer used, remove it. While at it,
we can also make Finalize() protected as all supported API methods
call it internally.
Differential Revision: https://reviews.llvm.org/D111498
HardcodeARMRegisters() is a hack that was supposed to be used "until
we can get an updated debugserver down on the devices". Since it was
introduced back in 2012, there is a good chance that the debugserver
has been updated at least once since then. Removing this code makes
transition to the new DynamicRegisterInfo API easier.
Differential Revision: https://reviews.llvm.org/D111491
MutexSet is too large to be allocated on stack.
But we need local MutexSet objects in few places
and use various hacks to allocate them.
Add DynamicMutexSet helper that simplifies allocation
of such objects.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112449
Even though tensor.cast is not part of the sparse tensor dialect,
it may be used to cast static dimension sizes to dynamic dimension
sizes for sparse tensors without changing the actual sparse tensor
itself. Those cases should be lowered properly when replacing sparse
tensor types with their opaque pointers. Likewise, no op sparse
conversions are handled by this revision in a similar manner.
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D112173
Changing the array type printing from `int [N]` to `int[N]` impacts lldb
pretty printer registration & may need to be updated to handle the
dropped space between type and dimensions.
There are a few STL containers hanging around that can become DenseMaps,
SmallVectors and similar. This recovers a modest amount of compile time
performance.
While I'm here, adjust the bit layout of ValueIDNum: this was always
supposed to act like a value type, however it seems that clang doesn't
compile the comparison functions to act that way. Add a uint64_t to a
union that explicitly aliases the bitfields, so that we can compare the
whole value as a single integer.
Differential Revision: https://reviews.llvm.org/D112333
Adds initial parsing and sema for the 'append_args' clause.
Note that an AST clause is not created as it instead adds its values
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D111854
This patch has couple of small changes to clean-up FIROps.td.
- Wrap lines that are longer than 80.
- All parser, verifier and printer that are single line are wrapped with double
quotes.
- Couple of small typos.
Reviewed By: AlexisPerry
Differential Revision: https://reviews.llvm.org/D112436
The fir_Dialect definition was coming silently through FIRTypes.td.
Make the include of flang/Optimizer/Dialect/FIRDialect.td explicit in
FIROps.td and move MLIR includes to FIRDialect.td.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: AlexisPerry
Differential Revision: https://reviews.llvm.org/D112418
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This patch is like D111627 -- instead of calculating IDF for every location
on the stack, only do it for the smallest units of interference, and copy
the PHIs for those units to any aliases.
The test added runs placeMLocPHIs directly, and tests that:
* A def of the lower 8 bits of a stack slot causes all aliasing regs to
have PHIs placed,
* It doesn't cause the equivalent location to x86's $ah, which isn't
aliased, to have a PHI placed.
Differential Revision: https://reviews.llvm.org/D112324
The recently added logic to canonicalize exit conditions to unsigned relies on facts which hold about the use (i.e. exit test). Applying this blindly to the icmp is not legal, as there may be another use which never reaches the exit. Restrict ourselves to case where we have a single use.
__ompt_get_task_info_internal function is adapted to support thread_num
determination during the execution of multiple nested serialized
parallel regions enclosed by a regular parallel region.
Consider the following program that contains parallel region R1 executed
by two threads. Let the worker thread T of region R1 executes serialized
parallel regions R2 that encloses another serialized parallel region R3.
Note that the thread T is the master thread of both R2 and R3 regions.
Assume that __ompt_get_task_info_internal function is called with the
argument "ancestor_level == 1" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside
the team of region R2, whose implicit task is at level 1 inside the
hierarchy of active tasks. Since the thread T is the master thread of
region R2, one should expected that "thread_num" takes a value 0.
After the while loop finishes, the following stands: "lwt != NULL",
"prev_lwt == NULL", "prev_team" represents the team information about
the innermost serialized parallel region R3. This results in executing
the assignment "thread_num = prev_team->t.t_master_tid". Note that
"prev_team->t.t_master_tid" was initialized at the moment of
R2’s creation and represents the "thread_num" of the thread T inside
the region R1 which encloses R2. Since the thread T is the worker thread
of the region R1, "the thread_num" takes value 1, which is a contradiction.
This patch proposes to use "lwt" instead of "prev_lwt" when determining
the "thread_num". If "lwt" exists, the task at the requested level belongs
to the serialized parallel region. Since the serialized parallel region
is executed by one thread only, the "thread_num" takes value 0.
Similarly, assume that __ompt_get_task_info_internal function is called
with the argument "ancestor_level == 2" during the execution of region R3.
The function should determine the "thread_num" of the thread T inside the
team of region R1. Since the thread is the worker inside the region R1,
one should expected that "thread_num" takes value 1. After the loop finishes,
the following stands: "lwt == NULL", "prev_lwt != NULL", "prev_team" represents
the team information about the innermost serialized parallel region R3.
This leads to execution of the assignment "thread_num = 0", which causes
a contradiction.
Ignoring the "prev_lwt" leads to executing the assignment
"thread_num = prev_team->t.t_master_tid" instead. From the previous explanation,
it is obvious that "thread_num" takes value 1.
Note that the "prev_lwt" variable is marked as unnecessary and thus removed.
This patch introduces the test case which represents the OpenMP program
described earlier in the summary.
Differential Revision: https://reviews.llvm.org/D110699
__kmp_fork_call sets the enter_frame of the active task (th_curren_task)
before new parallel region begins. After the region is finished, the
enter_frame is cleared.
The old implementation of __kmpc_fork_call didn’t clear the enter_frame of
active task.
Also, the way of initializing the enter_frame of the active task was wrong.
Consider the following two OpenMP programs.
The first program: Let R1 be the serialized parallel region that encloses
another serialized parallel region R2. Assume that thread that executes R2 is
going to create a new serialized parallel region R3 by executing
__kmpc_fork_call. This thread is responsible to set enter_frame of R2's
implicit task. Note that the information about R2's implicit task is present
inside master_th->th.th_current_task at this moment, while lwt represents the
information about R1's implicit task. The old implementation uses lwt and
resets enter_frame of R1's implicit task instead of R2's implicit task. The
new implementation uses master_th->th.th_current_task instead.
The second program: Consider the OpenMP program that contains parallel region
R1 which encloses an explicit task T. Assume that thread should create another
parallel region R2 during the execution of the T. The __kmpc_fork_call is
responsible to create R2 and set enter frame of T whose information is present
inside the master_th->th.th_current_task.
Old implementation tries to set the frame of
parent_team->t.t_implicit_task_taskdata[tid] which corresponds to the implicit
task of the R1, instead of T.
Differential Revision: https://reviews.llvm.org/D112419
As discussed in D108488, testing for invariants of omp_get_wtime would be more
reliable than testing for duration of sleep, as return from sleep might be
delayed due to system load.
Alternatively/in addition, we could compare the time measured by omp_get_wtime
to time measured with C++11 chrono (for portability?).
Differential Revision: https://reviews.llvm.org/D112458
The CHECK: line in the test had no effect, because the test does not
pipe to FileCheck. Since the test only checks for a single value,
encode the result in the return value of the test.
Where possible change to declare the variable before the loop.
Where not possible, specifically request -std=c99 (could be limited to
specific compilers like icc).
* clang-format test source.
* Removed the dead setup code.
* Using expect_expr etc. instead of raw expect.
* Slightly expanded with tests for vtable pointers (which mostly just crash atm.)
* Removed some other minor test guideline problems.
All but 2 of the vector builtins are only used by clang_builtin_alias.
When using clang_builtin_alias, the type string of the builtin is never
checked. Only the types in the function definition used for the alias
are checked.
This patch takes advantage of this to share a single builtin for
many different types. We already used type overloads on the IR intrinsic
so the codegen for the builtins that are being merge were already
the same. This extends the type overloading to the builtins.
I had to make a few tweaks to make this work.
-Floating point vector-vector vmerge now uses the vmerge intrinsic
instead of the vfmerge intrinsic. New isel patterns and tests are
added to support this.
-The SemaChecking for the immediate of vset_v/vget_v has been removed.
Determining the valid range is harder now. I've added masking to
ManualCodegen to ensure valid IR for invalid input.
This reduces the number of builtins from ~25000 to ~1100.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D112102
Using callbacks for allocation/deallocation allows users to override
the default.
Also add an option to comprehensive bufferization pass to use `alloca`
instead of `alloc`s. Note that this option is just for testing. The
option to use `alloca` does not work well with the option to allow for
returning memrefs.
Differential Revision: https://reviews.llvm.org/D112166
Several parts in the `chrono` synopsis for C++20 are not yet
implemented. The current recommendation is that things are added to the
synopsis when implemented -- not beforehand. As such, remove the
not-yet-implemented parts to avoid confusion.
Reviewed By: ldionne, Quuxplusone, #libc
Differential Revision: https://reviews.llvm.org/D111922
Also fix a few places in the `shared_ptr` implementation where
`element_type` was passed to the `__is_compatible` helper. This could
result in `remove_extent` being applied twice to the pointer's template
type (first by the definition of `element_type` and then by the helper),
potentially leading to somewhat less readable error messages for some
incorrect code.
Differential Revision: https://reviews.llvm.org/D112092
Several of our C++20 and C++2b papers were missing the actual revision
number that was voted in to the Standard. The revision number is quite
important because in a few cases, a paper has a revision *after* the
one that is voted into the Standard, which isn't voted into the Standard.
Hence, if we simply followed the wg21.link blindly and implemented that,
we'd end up implementing the latest revision of the paper, which might
not have been voted.
As a fly-by fix, I found out that P1664 had been withdrawn from the
straw polls and had never been voted into the Standard. This commit
removes that entry from our list.
Differential Revision: https://reviews.llvm.org/D112339