Implicit derivatives are only valid in pixel shaders,
hence only implicitly enable WQM for pixel shaders.
This avoids unintended WQM in other shader types (e.g. compute)
when image sampling instructions are used.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D114414
This is similar to D113574, but as a DAG combine, not tablegen patterns.
Doing the fold as a DAG combine allows the fadd to be folded with a
fmul, finally producing a predicated vfma. It performs the same fold of
fadd(x, vselect(p, y, -0.0)) to vselect p, (fadd x, y), x) using -0.0 as
the identity value of a fadd.
Differential Revision: https://reviews.llvm.org/D113584
In quite a few places we were calling getCurSDLoc() to get the debug
location, but this is already a local variable `sdl`.
Differential Revision: https://reviews.llvm.org/D114447
It appears that we can emit all the instructions for a function, including
debug instructions, and then optimise some of the values out late.
Specifically, in the attached test case, an argument gets optimised out
after DBG_VALUE / DBG_INSTR_REFs are created. This confuses
MachineFunction::finalizeDebugInstrRefs, which expects to be able to find a
defining instruction, and crashes instead.
Fix this by identifying when there's no defining instruction, and
translating that instead into a DBG_VALUE $noreg.
Differential Revision: https://reviews.llvm.org/D114476
Similar to D84091 which added extra predicated folds for integer operations
using the identity element of the operation, this adds them for floating
point operations for the form `BinOp(x, select(p, y, Identity))`. They are
folded back to predicated versions of the operator, with fadd having the
identity -0.0, fsub using the identity 0.0 and fmul using 1.0.
Differential Revision: https://reviews.llvm.org/D113574
Test size larger than clear_shadow_mmap_threshold,
which is handled differently.
Depends on D114348.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D114366
This patch adds parallel processing of chunks. When reducing very large
inputs, e.g. functions with 500k basic blocks, processing chunks in
parallel can significantly speed up the reduction.
To allow modifying clones of the original module in parallel, each clone
needs their own LLVMContext object. To achieve this, each job parses the
input module with their own LLVMContext. In case a job successfully
reduced the input, it serializes the result module as bitcode into a
result array.
To ensure parallel reduction produces the same results as serial
reduction, only the first successfully reduced result is used, and
results of other successful jobs are dropped. Processing resumes after
the chunk that was successfully reduced.
The number of threads to use can be configured using the -j option.
It defaults to 1, which means serial processing.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D113857
This patch updates the cost model for ordered reductions so that a call
to the llvm.fmuladd intrinsic is modelled as a normal fmul instruction
plus the cost of an ordered fadd reduction.
Differential Revision: https://reviews.llvm.org/D111630
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes; a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.
Differential Revision: https://reviews.llvm.org/D113125
`isConstRefReturningMethodCall` should be considering
`CXXOperatorCallExpr` in addition to `CXXMemberCallExpr`. Clang considers
these to be distinct (`CXXOperatorCallExpr` derives from `CallExpr`, not
`CXXMemberCallExpr`), but we don't care in the context of this
check.
This is important because of
`std::vector<Expensive>::operator[](size_t) const`.
Differential Revision: https://reviews.llvm.org/D114249
The existing constrained shift PatFrags only dealt with masked shift
from OpenCL front-ends. This change copies the
X86DAGToDAGISel::isUnneededShiftMask() function to AMDGPU and uses it in
the shift PatFrag predicates.
Differential Revision: https://reviews.llvm.org/D113448
The attribute 'r' allows (or disallows for the negative case) read-only
sections, i.e. ones without the SHF_WRITE flag, to be assigned to the
memory region. Before the patch, lld could put a section in the wrong
region or fail with "error: no memory region specified for section".
Differential Revision: https://reviews.llvm.org/D113771
Remove duplicate `Pass` suffix from view-op-graph pass class name. The
extra suffix would lead to methods like registerViewOpGraphPassPass
being generated.
Differential Revision: https://reviews.llvm.org/D114459
Also replace ArrayAttr with IndexElementsAttr to model subscript dimensions.
An array of attribute is a sparse inefficient storage, with an API that
requires to unpack/repack integers at every call site.
Instead we can store dense array of integer as IndexElementsAttr.
Reviewed By: clementval, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D112899
I don't see a reason why not to. If we allows lookup functions by full names,
I can change the test case in D113930 to use `lldb-test symbols --find=function --name=full::name --function-flags=full ...`,
though the duplicate method decl prolem is still there for `lldb-test symbols --dump-ast`.
That's a seprate bug, we can fix it later.
Differential Revision: https://reviews.llvm.org/D114467
Nothing uses more than 8bit now. So the rest of the headers can store other data.
kStackTraceMax is 256 now, but all sanitizers by default store just 20-30 frames here.
Verified no diff exist between previous version, new version python 2, and python 3 for an example stack.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D114404
Similarly to what GCC does, we should allow scalars with
the "v" constraint rather than introducing unnecessary
new constraints for scalars in Altivec registers.
Differential revision: https://reviews.llvm.org/D113635
AMDGPU is unusual in that the both stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
This diff is adding the capping_size determination for the list and forward list, to limit the number of children to be displayed. Also it modifies and unifies tests for libcxx and libstdcpp list data formatter.
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D114433
This diff is avoiding the size limitation introduced by the capping size for the libcxx and libcpp bitset data formatters.
Reviewed By: wallace
Differential Revision: https://reviews.llvm.org/D114461
We need to add checks that ensure that some core variables are valid, so
that we avoid printing out garbage data. The worst that could happen is
that an non-initialized variable is being printed as something with
123123432 children instead of 0.
Differential Revision: https://reviews.llvm.org/D114458
As suggested by @labath in https://reviews.llvm.org/D114403, we should
make the formatter more resilient to corrupted data. The Libcxx version
explicitly checks for engaged = 1, so we can do that as well for safety.
Differential Revision: https://reviews.llvm.org/D114450
(a & b) ^ (~a | b) --> ~a
I was looking for a shortcut to reduce some of the complex logic
folds that are currently up for review (D113216
and others in that stack), and I found this missing from
instcombine/instsimplify.
There is a trade-off in putting it into instsimplify: because
we can't create new values here, we need a strict 'not' op (no
undef elements). Otherwise, the fold is not valid:
https://alive2.llvm.org/ce/z/k_AGGj
If this was in instcombine instead, we could create the proper
'not'. But having the fold here benefits other passes like GVN
that use instsimplify as an analysis.
There is a related fold where 'and' and 'or' are swapped, and
that is planned as a follow-up commit.
Differential Revision: https://reviews.llvm.org/D114462