Currently, BPF has XADD (locked add) insn support and the
asm looks like:
lock *(u32 *)(r1 + 0) += r2
lock *(u64 *)(r1 + 0) += r2
The instruction itself does not have a return value.
At the source code level, users often use
__sync_fetch_and_add()
which eventually translates to XADD. The return value of
__sync_fetch_and_add() is supposed to be the old value
in the xadd memory location. Since BPF::XADD insn does not
support such a return value, this patch added a PreEmit
phase to check such a usage. If such an illegal usage
pattern is detected, a fatal error will be reported like
line 4: Invalid usage of the XADD return value
if compiled with -g, or
Invalid usage of the XADD return value
if compiled without -g.
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
llvm-svn: 342692
Summary: Adds the necessary support to lib/ObjectYAML and fixes SIMD
calls to allow the tests to work. Also removes some dead code that
would otherwise have to have been updated.
Reviewers: aheejin, dschuff, sbc100
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52105
llvm-svn: 342689
The suffix tree won't ever consider sequences with a length less than 2.
Therefore, we really ought to not even consider them in the first place.
Also add a FIXME explaining that this should be defined in terms of the size
in B of an outlined call versus the size in B of the MBB.
llvm-svn: 342688
Summary:
AvailableExternal was not handled in isDiscardableIfUnused when isDiscardableIfUnused
was added in r158476. Till it was handled in r247044. This is a NFC.
Reviewers: pcc, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52319
llvm-svn: 342684
tryLocalSplit only handles a single use block, but an interval may
have multiple use blocks. So don't crash in that case. This fixes
PR38795.
Differential revision: https://reviews.llvm.org/D52277
llvm-svn: 342682
r342631 expanded bitc::METADATA_LOCATION by one element. The bitcode
metadata loader was changed in a backwards-incompatible way, leading to
crashes when disassembling old bitcode:
assertion: empty() && "PlaceholderQueue hasn't been flushed before being destroyed"
Assertion failed: (empty() && "PlaceholderQueue hasn't been flushed before being destroyed")
This commit teaches the metadata loader to assume that the newly-added
IsImplicitCode bit is 'false' when not present in old bitcode. I've added a
bitcode compat regression test.
rdar://44645820
llvm-svn: 342678
When you create an outlined function, you know everything you need to know
to decide if debug info should be created. If we emit debug info in
createOutlinedFunction, then we don't need to keep track of every IR function
we create.
llvm-svn: 342677
Summary:
rL323619 marks functions that are calling va_end as not viable for
inlining. This patch reverses that since this va_end doesn't need
access to the vriadic arguments list that are saved on the stack, only
va_start does.
Reviewers: efriedma, fhahn
Reviewed By: fhahn
Subscribers: eraman, haicheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D52067
llvm-svn: 342675
x86 had 2 versions of peekThroughBitcast. DAGCombiner had 1. Plus, it had a 1-off implementation for the one-use variant.
Move the x86 versions of the code to SelectionDAG, so we don't have different copies of the code.
No functional change intended.
I'm putting this next to isBitwiseNot() because I am planning to use it in there. Another option is next to the
helpers in the ISD namespace (eg, ISD::isConstantSplatVector()). But if there's no good reason for those to be
there, I'd prefer to pull other helpers over to SelectionDAG in follow-up steps.
Differential Revision: https://reviews.llvm.org/D52285
llvm-svn: 342669
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Made getName helper to return std::string (instead of StringRef initially) to fix
asan builtbot failures on CGSCC tests.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342664
Summary:
The goal of this patch is to have the same behaviour than gcc-gcov.
Currently the hit counts for a line is the sum of the counts for each block on that line.
The idea is to detect the cycles in the graph of blocks in using the algorithm by Hawick & James.
The count for a cycle is the min of the counts for each edge in the cycle.
Once we've the count for each cycle, we can sum them and add the transition counts of those cycles.
Fix both https://bugs.llvm.org/show_bug.cgi?id=38065 and https://bugs.llvm.org/show_bug.cgi?id=38066
Reviewers: marco-c, davidxl
Reviewed By: marco-c
Subscribers: vsk, lebedev.ri, sylvestre.ledru, dblaikie, llvm-commits
Differential Revision: https://reviews.llvm.org/D49659
llvm-svn: 342657
Some records point to an LF_CLASS, LF_UNION, LF_STRUCTURE, or LF_ENUM
which is a forward reference and doesn't contain complete debug
information. In these cases, we'd like to be able to quickly locate the
full record. The TPI stream stores an array of pre-computed record hash
values, one for each type record. If we pre-process this on startup, we
can build a mapping from hash value -> {list of possible matching type
indices}. Since hashes of full records are only based on the name and or
unique name and not the full record contents, we can then use forward
ref record to compute the hash of what *would* be the full record by
just hashing the name, use this to get the list of possible matches, and
iterate those looking for a match on name or unique name.
llvm-pdbutil is updated to resolve forward references for the purposes
of testing (plus it's just useful).
Differential Revision: https://reviews.llvm.org/D52283
llvm-svn: 342656
This is a trivial refactoring that I'm committing now as it makes a patch I'm
about to post for review easier to follow. There is some overlap between
evaluateConstantImm and addExpr in RISCVAsmParser. This patch allows
evaluateConstantImm to be reused from addExpr to remove this overlap. The
benefit will be greater when a future patch adds extra code to allows
immediates to be evaluated from constant symbols (e.g. `.equ CONST, 0x1234`).
No functional change intended.
llvm-svn: 342641
Currently, we emit DW_AT_addr_base that points to the beginning of
the .debug_addr section. That is not correct for the DWARF5 case because address
table contains the header and the attribute should point to the first entry
following the header.
This is currently the reason why LLDB does not work with such executables correctly.
Patch fixes the issue.
Differential revision: https://reviews.llvm.org/D52168
llvm-svn: 342635
Summary:
Before removing basic blocks that ipsccp has considered as dead
all uses of the basic block label must be removed. That is done
by calling ConstantFoldTerminator on the users. An exception
is when the branch condition is an undef value. In such
scenarios ipsccp is using some internal assumptions regarding
which edge in the control flow that should remain, while
ConstantFoldTerminator don't know how to fold the terminator.
The problem addressed here is related to ConstantFoldTerminator's
ability to rewrite a 'switch' into a conditional 'br'. In such
situations ConstantFoldTerminator returns true indicating that
the terminator has been rewritten. However, ipsccp treated the
true value as if the edge to the dead basic block had been
removed. So the code for resolving an undef branch condition
did not trigger, and we ended up with assertion that there were
uses remaining when deleting the basic block.
The solution is to resolve indeterminate branches before the
call to ConstantFoldTerminator.
Reviewers: efriedma, fhahn, davide
Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52232
llvm-svn: 342632
Summary:
Some lines have a hit counter where they should not have one.
For example, in C++, some cleanup is adding at the end of a scope represented by a '}'.
So such a line has a hit counter where a user expects to not have one.
The goal of the patch is to add this information in DILocation which is used to get the covered lines in GCOVProfiling.cpp.
A following patch in clang will add this information when generating IR (https://reviews.llvm.org/D49916).
Reviewers: marco-c, davidxl, vsk, javed.absar, rnk
Reviewed By: rnk
Subscribers: eraman, xur, danielcdh, aprantl, rnk, dblaikie, #debug-info, vsk, llvm-commits, sylvestre.ledru
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D49915
llvm-svn: 342631
Examples such as `jal a3`, `j a3` and `jal a3, a3` are accepted by gas
but rejected by LLVM MC. This patch rectifies this. I introduce
RISCVAsmParser::parseJALOffset to ensure that symbol names that coincide with
register names can safely be parsed. This is made a somewhat fiddly due to the
single-operand alias form (see the comment in parseJALOffset for more info).
Differential Revision: https://reviews.llvm.org/D52029
llvm-svn: 342629
Summary:
Consider an instruction that has multiple defs of the same
vreg, but defining different subregs:
%7.sub1:rc, dead %7.sub2:rc = inst
Calling checkLivenessAtDef for the live interval associated
with %7 incorrectly reported "live range continues after a
dead def". The live range for %7 has a dead def at the slot
index for "inst" even if the live range continues (given that
there are later uses of %7.sub1).
This patch adjusts MachineVerifier::checkLivenessAtDef
to allow dead subregister definitions, unless we are checking
a subrange (when tracking subregister liveness).
A limitation is that we do not detect the situation when the
live range continues past an instruction that defines the
full virtual register by multiple dead subreg defines.
I also removed some dead code related to physical register
in checkLivenessAtDef. Wwe only call that method for virtual
registers, so I added an assertion instead.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52237
llvm-svn: 342618
Building a vector out of multiple loads can be converted to a load of the vector type if the loads are consecutive.
But the special condition is that the element number is 1, such as <1 x i128>. So just early exit to fix the assert.
Patch By: wuzish (Zixuan Wu)
Differential Revision: https://reviews.llvm.org/D52072
llvm-svn: 342611
Summary:
This change leaves holes in the opcode space where missing
instructions could logically be added later if they were found to be
useful.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52282
llvm-svn: 342610
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342597
The test diff in not-and-simplify.ll is from a use in SimplifyDemandedBits,
and the test diff in add.ll is from a DAGCombiner transform.
llvm-svn: 342594
Add a flag to dump the schedule DAG to the debug stream. This will be
used in upcoming commits to test schedule DAG mutations such as macro
fusion.
llvm-svn: 342589
Enable enableMultipleCopyHints() on X86.
Original Patch by @jonpa:
While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that is resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 less register moves on SPEC, as well as marginally less spilling.
Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint(). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.
Differential Revision: https://reviews.llvm.org/D38128
llvm-svn: 342578
Summary: This patch adds a GlobalIsel copy utility into MI for flags and updates the instruction emitter for the SDAG path. Some tests show new behavior and I added one for GlobalIsel which mirrors an SDAG test for handling nsw/nuw.
Reviewers: spatel, wristow, arsenm
Reviewed By: arsenm
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D52006
llvm-svn: 342576
As the code comments suggest, these are about splitting, and they
are not necessarily limited to lowering, so that misled me.
There's nothing that's actually x86-specific in these either, so
they might be better placed in a common header so any target can
use them.
llvm-svn: 342575
Summary:
ThinLTO imports alias as a copy of a aliasee, so when we import such functions with type tests we will
need type ids used by function. However after D49565 we pick types only during processing of
FunctionSummary which is not happening for such aliesees.
Example:
Unit U1 with a type, a functions F with the type check, and an alias A to the function.
Unit U2 with only call to the alias A.
In particular, this happens when we use -mconstructor-aliases, which is default.
So if c++ unit only creates instance of the class, without calling any other methods it will lack of
necessary type ids, which will result in false CFI reports.
Reviewers: tejohnson, eugenis
Subscribers: pcc, mehdi_amini, inglorion, eraman, hiraditya, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52201
llvm-svn: 342574
The patch extends size reduction pass for MicroMIPS. Two MOVE
instructions are transformed into one MOVEP instrucition.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52037
llvm-svn: 342572
The patch fixes definition of MOVEP instruction. Two registers are used
instead of register pairs. This is necessary as machine verifier cannot
handle register pairs.
Patch by Milena Vujosevic Janicic.
Differential revision: https://reviews.llvm.org/D52035
llvm-svn: 342571
This patch adds an initial x86 SimplifyDemandedVectorEltsForTargetNode implementation to handle target shuffles.
Currently the patch only decodes a target shuffle, calls SimplifyDemandedVectorElts on its input operands and removes any shuffle that reduces to undef/zero/identity.
Future work will need to integrate this with combineX86ShufflesRecursively, add support for other x86 ops, etc.
NOTE: There is a minor regression that appears to be affecting further (extractelement?) combines which I haven't been able to solve yet - possibly something to do with how nodes are added to the worklist after simplification.
Differential Revision: https://reviews.llvm.org/D52140
llvm-svn: 342564
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.
Reviewers: arsenm, nhaehnle
Reviewed By: nhaehnle
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52018
Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851
llvm-svn: 342558
Clang-compiled object files currently don't include the symbol sizes and
types. Some tools however need that information. For example, ctfconvert
uses that information to generate FreeBSD's CTF representation from ELF
files.
With this patch, symbol sizes and types are included in object files.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
llvm-svn: 342556
This patch adds the ability for processor models to describe dependency breaking
instructions.
Different processors may specify a different set of dependency-breaking
instructions.
That means, we cannot assume that all processors of the same target would use
the same rules to classify dependency breaking instructions.
The main goal of this patch is to provide the means to describe dependency
breaking instructions directly via tablegen, and have the following
TargetSubtargetInfo hooks redefined in overrides by tabegen'd
XXXGenSubtargetInfo classes (here, XXX is a Target name).
```
virtual bool isZeroIdiom(const MachineInstr *MI, APInt &Mask) const {
return false;
}
virtual bool isDependencyBreaking(const MachineInstr *MI, APInt &Mask) const {
return isZeroIdiom(MI);
}
```
An instruction MI is a dependency-breaking instruction if a call to method
isDependencyBreaking(MI) on the STI (TargetSubtargetInfo object) evaluates to
true. Similarly, an instruction MI is a special case of zero-idiom dependency
breaking instruction if a call to STI.isZeroIdiom(MI) returns true.
The extra APInt is used for those targets that may want to select which machine
operands have their dependency broken (see comments in code).
Note that by default, subtargets don't know about the existence of
dependency-breaking. In the absence of external information, those method calls
would always return false.
A new tablegen class named STIPredicate has been added by this patch to let
processor models classify instructions that have properties in common. The idea
is that, a MCInstrPredicate definition can be used to "generate" an instruction
equivalence class, with the idea that instructions of a same class all have a
property in common.
STIPredicate definitions are essentially a collection of instruction equivalence
classes.
Also, different processor models can specify a different variant of the same
STIPredicate with different rules (i.e. predicates) to classify instructions.
Tablegen backends (in this particular case, the SubtargetEmitter) will be able
to process STIPredicate definitions, and automatically generate functions in
XXXGenSubtargetInfo.
This patch introduces two special kind of STIPredicate classes named
IsZeroIdiomFunction and IsDepBreakingFunction in tablegen. It also adds a
definition for those in the BtVer2 scheduling model only.
This patch supersedes the one committed at r338372 (phabricator review: D49310).
The main advantages are:
- We can describe subtarget predicates via tablegen using STIPredicates.
- We can describe zero-idioms / dep-breaking instructions directly via
tablegen in the scheduling models.
In future, the STIPredicates framework can be used for solving other problems.
Examples of future developments are:
- Teach how to identify optimizable register-register moves
- Teach how to identify slow LEA instructions (each subtarget defining its own
concept of "slow" LEA).
- Teach how to identify instructions that have undocumented false dependencies
on the output registers on some processors only.
It is also (in my opinion) an elegant way to expose knowledge to both external
tools like llvm-mca, and codegen passes.
For example, machine schedulers in LLVM could reuse that information when
internally constructing the data dependency graph for a code region.
This new design feature is also an "opt-in" feature. Processor models don't have
to use the new STIPredicates. It has all been designed to be as unintrusive as
possible.
Differential Revision: https://reviews.llvm.org/D52174
llvm-svn: 342555
This is an alternative to D37896. I don't see a way to decompose multiplies
generically without a target hook to tell us when it's profitable.
ARM and AArch64 may be able to remove some duplicate code that overlaps with
this transform.
As a first step, we're only getting the most clear wins on the vector examples
requested in PR34474:
https://bugs.llvm.org/show_bug.cgi?id=34474
As noted in the code comment, it's likely that the x86 constraints are tighter
than necessary, but it may not always be a win to replace a pmullw/pmulld.
Differential Revision: https://reviews.llvm.org/D52195
llvm-svn: 342554
This involves changing the shouldExpandAtomicCmpXchgInIR interface, but I have
updated the in-tree backends using this hook (ARM, AArch64, Hexagon) so they
will see no functional change. Previously this hook returned bool, but it now
returns AtomicExpansionKind.
This hook allows targets to select how a given cmpxchg is to be expanded.
D48131 uses this to expand part-word cmpxchg to a target-specific intrinsic.
See my associated RFC for more info on the motivation for this change
<http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html>.
Differential Revision: https://reviews.llvm.org/D48130
llvm-svn: 342550
Summary:
Same as to D52146.
`((1 << y)+(-1))` is simply non-canoniacal version of `~(-1 << y)`: https://rise4fun.com/Alive/0vl
We can not canonicalize it due to the extra uses. But we can handle it here.
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52147
llvm-svn: 342547
Summary:
Two folds are happening here:
1. https://rise4fun.com/Alive/oaFX
2. And then `foldICmpWithHighBitMask()` (D52001): https://rise4fun.com/Alive/wsP4
This change doesn't just add the handling for eq/ne predicates,
it actually builds upon the previous `foldICmpWithLowBitMaskedVal()` work,
so **all** the 16 fold variants* are immediately supported.
I'm indeed only testing these two predicates.
I do not feel like re-proving all 16 folds*, because they were already proven
for the general case of constant with all-ones in low bits. So as long as
the mask produces all-ones in low bits, i'm pretty sure the fold is valid.
But required, i can re-prove, let me know.
* eq/ne are commutative - 4 folds; ult/ule/ugt/uge - are not commutative (the commuted variant is InstSimplified), 4 folds; slt/sle/sgt/sge are not commutative - 4 folds. 12 folds in total.
https://bugs.llvm.org/show_bug.cgi?id=38123https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52146
llvm-svn: 342546
Fixes the unwind information generated for floating-point registers.
Previously, all padding registers were assumed to be four bytes wide. Now, the
width of the register is used to specify the amount of padding.
Patch by Jackson Woodruff!
Differential revision: https://reviews.llvm.org/D51494
llvm-svn: 342545
Summary:
Pass Execution Instrumentation interface enables customizable instrumentation
of pass execution, as per "RFC: Pass Execution Instrumentation interface"
posted 06/07/2018 on llvm-dev@
The intent is to provide a common machinery to implement all
the pass-execution-debugging features like print-before/after,
opt-bisect, time-passes etc.
Here we get a basic implementation consisting of:
* PassInstrumentationCallbacks class that handles registration of callbacks
and access to them.
* PassInstrumentation class that handles instrumentation-point interfaces
that call into PassInstrumentationCallbacks.
* Callbacks accept StringRef which is just a name of the Pass right now.
There were some ideas to pass an opaque wrapper for the pointer to pass instance,
however it appears that pointer does not actually identify the instance
(adaptors and managers might have the same address with the pass they govern).
Hence it was decided to go simple for now and then later decide on what the proper
mental model of identifying a "pass in a phase of pipeline" is.
* Callbacks accept llvm::Any serving as a wrapper for const IRUnit*, to remove direct dependencies
on different IRUnits (e.g. Analyses).
* PassInstrumentationAnalysis analysis is explicitly requested from PassManager through
usual AnalysisManager::getResult. All pass managers were updated to run that
to get PassInstrumentation object for instrumentation calls.
* Using tuples/index_sequence getAnalysisResult helper to extract generic AnalysisManager's extra
args out of a generic PassManager's extra args. This is the only way I was able to explicitly
run getResult for PassInstrumentationAnalysis out of a generic code like PassManager::run or
RepeatedPass::run.
TODO: Upon lengthy discussions we agreed to accept this as an initial implementation
and then get rid of getAnalysisResult by improving RepeatedPass implementation.
* PassBuilder takes PassInstrumentationCallbacks object to pass it further into
PassInstrumentationAnalysis. Callbacks registration should be performed directly
through PassInstrumentationCallbacks.
* new-pm tests updated to account for PassInstrumentationAnalysis being run
* Added PassInstrumentation tests to PassBuilderCallbacks unit tests.
Other unit tests updated with registration of the now-required PassInstrumentationAnalysis.
Reviewers: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D47858
llvm-svn: 342544
This is still unsafe for long double, we will transform things into tanl
even if tanl is for another type. But that's for someone else to fix.
llvm-svn: 342542
Introduce a new RISCVExpandPseudoInsts pass to expand atomic
pseudo-instructions after register allocation. This is necessary in order to
ensure that register spills aren't introduced between LL and SC, thus breaking
the forward progress guarantee for the operation. AArch64 does something
similar for CmpXchg (though only at O0), and Mips is moving towards this
approach (see D31287). See also [this mailing list
post](http://lists.llvm.org/pipermail/llvm-dev/2016-May/099490.html) from
James Knight, which summarises the issues with lowering to ll/sc in IR or
pre-RA.
See the [accompanying RFC
thread](http://lists.llvm.org/pipermail/llvm-dev/2018-June/123993.html) for an
overview of the lowering strategy.
Differential Revision: https://reviews.llvm.org/D47882
llvm-svn: 342534
The 0x800 bit in @feat.00 needs to be set in order to make LLD pick up
the .gfid$y table. I believe this is fine to set even if we don't emit
the instrumentation.
We haven't emitted @feat.00 on 64-bit before. I see that MSVC does emit
it, but I'm not entirely sure what the default value should be. I went
with zero since that seems as safe as not emitting the symbol in the
first place.
Differential Revision: https://reviews.llvm.org/D52235
llvm-svn: 342532
When SimplifyCFG changes the PHI node into a select instruction, the debug information becomes ambiguous. It causes the debugger to display wrong variable value.
Differential Revision: https://reviews.llvm.org/D51976
llvm-svn: 342527
It's pretty common for the verifier to dump the relevant DIE when it
finds an issue. This tends to be relatively verbose and error prone
because we have to pass the DIDumpOptions to the DIE's dump method. This
patch adds a helper function to the verifier to make this easier.
llvm-svn: 342526
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
`ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
functions. They were only ever used for debug dumping and putting the
function into ScheduleDAG is consistent with the `dumpNode()` change.
llvm-svn: 342520
There were several issues with the previous implementation.
1) There were no tests.
2) We didn't support creating PDBSymbolTypePointer records for
builtin types since those aren't described by LF_POINTER
records.
3) We didn't support a wide enough variety of builtin types even
ignoring pointers.
This patch fixes all of these issues. In order to add tests,
it's helpful to be able to ignore the symbol index id hierarchy
because it makes the golden output from the DIA version not match
our output, so I've extended the dumper to disable dumping of id
fields.
llvm-svn: 342493
This allows the hard-coded shouldForceImmediate logic to be removed because
the generated MatchOperandParserImpl makes use of the current context (i.e.
the current mnemonic) to determine parsing behaviour, and so won't first try
to parse a register before parsing a symbol name.
No functional change is intended. gas accepts immediate arguments for call,
tail and lla. This patch doesn't address this discrepancy.
Differential Revision: https://reviews.llvm.org/D51733
llvm-svn: 342488
addi a0, a0, foo and lw a0, foo(a0) and similar are now rejected. An explicit
%lo and %pcrel_lo modifier is required. This matches gas behaviour.
llvm-svn: 342487
Reject bare symbols and accept only %pcrel_hi(sym) for auipc and %hi(sym) for
lui. Also test valid operand modifiers in rv32i-valid.s.
Note this is slightly stricter than gas, which will accept either %pcrel_hi or
%hi for both lui and auipc.
Differential Revision: https://reviews.llvm.org/D51731
llvm-svn: 342486
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.
This patch increases the number of record-form rotates we eliminate by
more than 70%.
Differential revision: https://reviews.llvm.org/D44897
llvm-svn: 342478
Summary:
Currently only the first function in the module is checked to
see if it has remarks enabled. If that first function is a declaration,
remarks will be incorrectly skipped. Change to look for the first
non-empty function.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51556
llvm-svn: 342477
Summary:
Adds LLVMAddUnifyFunctionExitNodesPass to expose
createUnifyFunctionExitNodesPass to the C and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52212
llvm-svn: 342476
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.
Differential revision: https://reviews.llvm.org/D51353
llvm-svn: 342472
Since Android API version 9 the Android libm has had the sincos functions, so
they should be recognised as libcalls and sincos optimisation should be applied.
Differential Revision: https://reviews.llvm.org/D52025
llvm-svn: 342471
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.
This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.
Differential Revision: https://reviews.llvm.org/D51792
llvm-svn: 342455
A piece of logic in rewriteLoopExitValues has a weird check on number of
users which allowed an unprofitable transform in case if an instruction has
more than 6 users.
Differential Revision: https://reviews.llvm.org/D51404
Reviewed By: etherzhhb
llvm-svn: 342444
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.
Also try to split if the unaligned access isn't allowed.
llvm-svn: 342442
When doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to machine scheduler, those missing itineraries might not have impact to actually scheduling,
because we can still get same latency due to default values.
With machine scheduler, however, itineraries will have impact to scheduling.
eg: NumMicroOps will default to be 0 if there is NO itineraries for specific instruction class.
And most of the instruction class with itineraries will have NumMicroOps default to 1.
This will has impact on the count of RetiredMOps, affects the Pending/Available Queue,
then causing different scheduling or suboptimal scheduling further.
Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040
llvm-svn: 342441
Summary:
This patch adds LLVMIsLiteralStruct to the C API to expose
StructType::isLiteral. This is then used to implement the analogous
addition to the OCaml API.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52209
llvm-svn: 342435
Summary:
ConstantExpr supports getIndices, but prior to this patch
LLVMGetNumIndices and LLVMGetIndices would error on them.
Reviewers: whitequark
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52206
llvm-svn: 342434
This reverts r342395 as it caused error
> Argument value type does not match pointer operand type!
> %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
> i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!
on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/
More details are available at https://reviews.llvm.org/D52080
llvm-svn: 342431
Summary:
EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result.
Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA.
Adding some tests to track this and a basic verify for future potential failures.
Reviewers: george.burgess.iv, gberry
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51960
llvm-svn: 342422
Add support mips64(el)-linux-gnuabin32 triples, and set them to N32.
Debian architecture name mipsn32/mipsn32el are also added. Set
UseIntegratedAssembler for N32 if we can detect it.
Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D51408
llvm-svn: 342416
Previously we would dump the names of enum types, but not their
enumerator values. This adds support for enumerator values. In
doing so, we have to introduce a general purpose mechanism for
caching symbol indices of field list members. Unlike global
types, FieldList members do not have a TypeIndex. So instead,
we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}.
llvm-svn: 342415
Previously for cv-qualified types, we would just ignore them
and they would never get printed. Now we can enumerate them
and cache them like any other symbol type.
llvm-svn: 342414
getLoopID has different control flow for two cases: If there is a
single loop latch and for any other number of loop latches (0 and more
than one). The latter case should return the same result if there is
only a single latch. We can save the preceding redundant search for a
latch by handling both cases with the same code.
Differential Revision: https://reviews.llvm.org/D52118
llvm-svn: 342406
We were mapping an instruction every time we saw something we couldn't map
before this. Since each illegal mapping is unique, we only have to do this once.
This makes it so that we don't map illegal instructions when the previous
mapped instruction was illegal.
In CTMark (AArch64), this results in 240 fewer instruction mappings on
average over 619 files in total. The largest improvement is 12576 fewer
mappings in one file, and the smallest is 0. The median improvement is 101
fewer mappings.
llvm-svn: 342405
Summary:
The IR reference for the `byval` attribute states:
```
This indicates that the pointer parameter should really be passed by value
to the function. The attribute implies that a hidden copy of the pointee is
made between the caller and the callee, so the callee is unable to modify
the value in the caller. This attribute is only valid on LLVM pointer arguments.
```
However, on Win64, this attribute is unimplemented and the raw pointer is
passed to the callee instead. This is problematic, because frontend authors
relying on the implicit hidden copy (as happens for every other calling
convention) will see the passed value silently (if mutable memory) or
loudly (by means of a crash) modified because the callee treats the
location as scratch memory space it is allowed to mutate.
At this point, it's worth taking a step back to understand the context.
In most calling conventions, aggregates that are too large to be passed
in registers, instead get *copied* to the stack at a fixed (computable
from the signature) offset of the stack pointer. At the LLVM, we hide
this hidden copy behind the byval attribute. The caller passes a pointer
to the desired data and the callee receives a pointer, but these pointers
are not the same. In particular, the pointer that the callee receives
points to temporary stack memory allocated as part of the call lowering.
In most calling conventions, this pointer is never realized in registers
or memory. The temporary memory is simply defined by an implicit
offset from the stack pointer at function entry.
Win64, uniquely, works differently. The structure is still passed in
memory, but instead of being stored at an implicit memory offset, the
caller computes a pointer to the temporary memory and passes it to
the callee as a regular pointer (taking up a register, or if all
registers are taken up, an additional stack slot). Presumably, this
was done to allow eliding the copy when passing aggregates through
several functions on the stack.
This explains why ignoring the `byval` attribute mostly works on Win64.
The argument simply gets passed as a pointer and as long as we're ok
with the callee trampling all over that memory, there are no ill effects.
However, it does contradict the documentation of the `byval` attribute
which specifies that there is to be an implicit copy.
Frontends can of course work around this by never emitting the `byval`
attribute for Win64 and creating `alloca`s for the requisite temporary
stack slots (and that does appear to be what frontends are doing).
However, the presence of the `byval` attribute is not a trap for
frontend authors, since it seems to work, but silently modifies the
passed memory contrary to documentation.
I see two solutions:
- Disallow the `byval` attribute in the verifier if using the Win64
calling convention.
- Make it work by simply emitting a temporary stack copy as we would
with any other calling convention (frontends can of course always
not use the attribute if they want to elide the copy).
This patch implements the second option (make it work), though I would
be fine with the first also.
Ref: https://github.com/JuliaLang/julia/issues/28338
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51842
llvm-svn: 342402
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment extend instructions will still be introduced.
Differential Revision: https://reviews.llvm.org/D52080
llvm-svn: 342395
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.
Differential Revision: https://reviews.llvm.org/D52102
llvm-svn: 342392
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.
Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.
llvm-svn: 342387
std::vector::iterator type may be a pointer, then
iterator::value_type fails to compile since iterator is not a class,
namespace, or enumeration.
Patch by orivej (Orivej Desh)
Differential Revision: https://reviews.llvm.org/D52142
llvm-svn: 342354
For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering.
llvm-svn: 342352
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.
Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp,
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks.
Otherwise, no regressions on x86-64 or A64."
I'm proposing to add only the minimum support for a DAG node here. Since we don't have an
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We
should be able to expand as needed when adding more functionality/transforms. For reference,
these are transform suggestions currently listed in SimplifyLibCalls.cpp:
// * cbrt(expN(X)) -> expN(x/3)
// * cbrt(sqrt(x)) -> pow(x,1/6)
// * cbrt(cbrt(x)) -> pow(x,1/9)
Also, given that we bail out on long double for now, there should not be any logical
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).
Differential Revision: https://reviews.llvm.org/D51753
llvm-svn: 342348
https://bugs.llvm.org/show_bug.cgi?id=38949
It's not clear to me that we even need a one-use check in this fold.
Ie, 2 independent loads might be better than a load+dependent shuffle.
Note that the existing re-use tests are not affected. We actually do form a
broadcast node in those tests now because there's no extra use of the
insert_subvector node in those cases. But something later in isel pattern
matching decides that it is not worth using a broadcast for the full load in
those tests:
Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:'
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0>
t20: v4f64 = insert_subvector t18, t7, Constant:i64<2>
Becomes:
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t21: v4f64 = X86ISD::SUBV_BROADCAST t7
ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7
...
Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7>
Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1>
llvm-svn: 342347
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.
This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52075
llvm-svn: 342335
Naively computing the hash after the PDB data has been generated is in practice
as fast as other approaches I tried. I also tried online-computing the hash as
parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also
where all the measuring data is) and computing the hash in parallel
(https://reviews.llvm.org/D51957). This approach here is simplest, without
being slower.
Differential Revision: https://reviews.llvm.org/D51956
llvm-svn: 342333
* Use same method of initializing the output stream and its buffer
* Allow a nullptr Status pointer
* Don't print the mangled name on demangling error
* Write to N (if it is non-nullptr)
Differential Revision: https://reviews.llvm.org/D52104
llvm-svn: 342330
Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back?
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52134
llvm-svn: 342327
Summary:
MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.
Inspired by PR38840.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D52121
llvm-svn: 342326
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814
If this works, it's an easier and more powerful solution than adding pattern matching
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might
not be able to make a blendv out of it again. I don't think that's too likely, but
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild
before expanding the transform.
Differential Revision: https://reviews.llvm.org/D52059
llvm-svn: 342324
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g UBFX).
Teach it to preserve debug locations and salvage debug values.
llvm-svn: 342319
Summary:
Implement shifts of vectors by i32. Since LLVM defines shifts as
binary operations between two vectors, this involves pattern matching
on splatted shift operands. For v2i64 shifts any i32 shift operands
have to be zero extended in the input and any i64 shift operands have
to be wrapped in the output. Depends on D52007.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51906
llvm-svn: 342302
Similar to rL342278:
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
D52075 should be able to use this code too rather than
duplicating all of the logic.
llvm-svn: 342292
Summary:
Integer types smaller than i32 must be extended to i32 by default.
The feature "crbits" introduced at r202451 handles i1 as a special case,
but it did not extend properly.
The caller was, therefore, passing i1 stack arguments by writing 0/1 to
the first byte of the 4-byte stack object and callee was
reading the first byte for the value.
"crbits" is enabled if the optimization level is greater than 1,
which is very common in "release builds".
Such discrepancies with ABI specification also introduces
potential incompatibility with programs or libraries
built with other compilers e.g. GCC.
Fixes PR38661
Reviewers: hfinkel, cuviper
Subscribers: sylvestre.ledru, glaubitz, nagisa, nemanjai, kbarton, llvm-commits
Differential Revision: https://reviews.llvm.org/D51108
llvm-svn: 342288
Eventually we need to be able to support nested types, which don't
have an associated CVType record. To handle this, remove the
CVType from all of the record classes, and instead store the
deserialized record. Then move the deserialization up to the thing
that creates the type. This actually makes error handling better
anyway as we can return an invalid symbol instead of asserting false.
llvm-svn: 342284
The patch saves a function offset table which maps function name index to the
offset of its function profile to the start of the binary profile. By using
the function offset table, for those function profiles which will not be used
when compiling a module, the profile reader does't have to read them. For
profile size around 10~20M, it saves ~10% compile time.
Differential Revision: https://reviews.llvm.org/D51863
llvm-svn: 342283
The test diffs are all cosmetic due to the change in
value naming, but I'm including that to show that the
new code does perform these folds rather than something
else in instcombine.
llvm-svn: 342278
The test used to fail with an invalid phi node: the two predecessors were outlined
and the SSA representation was left invalid. The patch adds the exit block to the
cold region.
llvm-svn: 342277
remove duplicate entries from isSingleEntrySingleExit: the Entry block is
already added by the loop over the dominance frontier.
Remove the heuristic from isOutlineCandidate that a region is too small when it
only contains a basic block. With this change we now grow regions starting from
a block and we continue adding to the ValidColdRegion. Check the heuristic just
before code generation.
llvm-svn: 342276
Also fix a problem in forward propagation:
const TerminatorInst *TI = It->getTerminator();
was set outside the while loop that iterates over It.
llvm-svn: 342275
This patch fixes the debug info handling for SelectionDAG legalization
of DAG nodes with two results. When an replaced SDNode has more than
one result, transferDbgValues was always copying the SDDbgValue from
the first result and attaching them to all members. In reality
SelectionDAG::ReplaceAllUsesWith() is given an array of SDNodes
(though the type signature doesn't make this obvious (cf. the call
site code in ReplaceNode()).
rdar://problem/44162227
Differential Revision: https://reviews.llvm.org/D52112
llvm-svn: 342264
Summary:
During threaded thinLTO, it is possible that the entry for current
module doesn't exist in StringMaps (like ExportLists, ResolvedODR,
etc.). Using operator[] might trigger a rehash for the StringMap, which
might happen on multiple threads at the same time.
rdar://problem/43846199
Reviewers: tejohnson, mehdi_amini, kromanova, pcc
Reviewed By: tejohnson
Subscribers: dang, inglorion, eraman, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D52049
llvm-svn: 342263
Attempt to lower a shuffle as an unpack of elements from two inputs followed by a single-input (wider) permutation.
As long as the permutation is wider this is a win - there may be some circumstances where same size permutations would also be useful but I've left that for future work.
Differential Revision: https://reviews.llvm.org/D52043
llvm-svn: 342257
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.
Added a subtarget feature to reflect this and added code to take advantage of
expanded range on GFX9+
Also updated the tests to check correct behaviour
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51933
Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0
llvm-svn: 342222
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 342210
As preparation for LoopInterchange becoming a loop pass, it needs to
preserve ScalarEvolution. Even though interchanging should not change
the trip count of the loop, it modifies loop entry, latch and exit
blocks.
I added -verify-scev to some loop interchange tests, but the verification does
not catch problems caused by missing invalidation of SE in loop interchange, as
the trip counts themselves do not change. So there might be potential to
make the SE verification covering more stuff in the future.
Reviewers: mkazantsev, efriedma, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D52026
llvm-svn: 342209
After recent improvements which makes better use of LOC instead of IPM, the
TTI cost functions also needs to be updated to reflect this.
This involves sext, zext and xor of i1.
The tests were updated so that for z13 the new costs are expected, while the
old costs are still checked for on zEC12.
Review: Ulrich Weigand
https://reviews.llvm.org/D51339
llvm-svn: 342207
When reading directives from a .drectve section, the directives are
tokenized as a normal windows command line. However in these cases,
link.exe allows the directives to be separated by null bytes, not only by
spaces.
A test case for this change will be added in the lld repo.
Differential Revision: https://reviews.llvm.org/D52014
llvm-svn: 342204
Summary:
[VPlan] Implement vector code generation support for simple outer loops.
Context: Patch Series #1 for outer loop vectorization support in LV using VPlan. (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces vector code generation support for simple outer loops that are currently supported in the VPlanNativePath. Changes here essentially do the following:
- force vector code generation using explicit vectorize_width
- add conservative early returns in cost model and other places for VPlanNativePath
- add code for setting up outer loop inductions
- support for widening non-induction PHIs that can result from inner loops and uniform conditional branches
- support for generating uniform inner branches
We plan to add a handful C outer loop executable tests once the initial code generation support is committed. This patch is expected to be NFC for the inner loop vectorizer path. Since we are moving in the direction of supporting outer loop vectorization in LV, it may also be time to rename classes such as InnerLoopVectorizer.
Reviewers: fhahn, rengolin, hsaito, dcaballe, mkuper, hfinkel, Ayal
Reviewed By: fhahn, hsaito
Subscribers: dmgreen, bollu, tschuett, rkruppe, rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D50820
llvm-svn: 342197
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52022
Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c
llvm-svn: 342189
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 342186
INLINEASM instructions use extra operands to carry flags. If a register operand is removed without removing the flag operand, then the flags will no longer make sense.
This patch fixes this by preventing the removal when a flag operand is present.
The included test case was generated by MS inline assembly. Longer term maybe we should fix the inline assembly parsing to not generate redundant operands.
Differential Revision: https://reviews.llvm.org/D51829
llvm-svn: 342176
When replacing a named register input to the appropriately sized
sub/super-register. In the case of a 64-bit value being assigned to a
register in 32-bit mode, match GCC's assignment.
Reviewers: eli.friedman, craig.topper
Subscribers: nickdesaulniers, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51502
llvm-svn: 342175
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
More complicated, canonical pattern:
https://rise4fun.com/Alive/uhA
We do need to have two `switch()`'es like this,
to not mismatch the swappable predicates.
https://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52001
llvm-svn: 342173
This adds DebugCounter support to the PartiallyInlineLibCalls pass,
which should make debugging/automated bisection easier in the future.
Patch by Zhizhou Yang!
Differential Revision: https://reviews.llvm.org/D50093
llvm-svn: 342172
This allows the xor to be removed completely.
This might help with recomitting r341674, but seems good regardless.
Coincidentally fixes PR38915.
Differential Revision: https://reviews.llvm.org/D51964
llvm-svn: 342163
Summary:
Fixed assertions due to invalid fixup when encoding compressed instructions
(c.addi, c.addiw, c.li, c.andi) with bare symbols with/without modifiers.
This matches GAS behavior as well.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D52005
llvm-svn: 342160
Summary:
The illegal instruction 0x00 0x00 is being wrongly decoded as
c.addi4spn with 0 immediate.
The invalid instruction 0x01 0x61 is being wrongly decoded as
c.addi16sp with 0 immediate.
This bug was uncovered by a LLVM MC Disassembler Protocol Buffer Fuzzer
for the RISC-V assembly language.
Reviewers: asb
Reviewed By: asb
Subscribers: rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, asb
Differential Revision: https://reviews.llvm.org/D51815
llvm-svn: 342159
Also, add a check to ensure that when main has the expected signature
we do not create a wrapper.
Differential Revision: https://reviews.llvm.org/D51562
llvm-svn: 342157
I accidentally committed this diff with rL342147 because
I had applied D51964. We probably do need those checks,
but D51964 has tests and more discussion/motivation,
so they should be re-added with that patch.
llvm-svn: 342149
I don't have a test case for this, but it's motivated by
the discussion in D51964, and I've added TODO comments for
the better fix - move simplifications into instsimplify
because that's more efficient and reduces risk of infinite
loops in instcombine caused by transforms trying to do the
opposite folds.
In this case, we know that the transform that tries to move
'not' through min/max can be fooled by the multiple uses
of a value in another min/max, so try to squash the
foldSPFofSPF() patterns first.
llvm-svn: 342147
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.
Differential Revision: https://reviews.llvm.org/D51978
llvm-svn: 342141
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.
Differential Revision: https://reviews.llvm.org/D52032
llvm-svn: 342140
In r319995, we fixed the line table format to version 2 on Darwin
because dsymutil didn't yet understand the new format which caused test
failures for the LLDB bots. This has been resolved in the meantime so
there's no reason to keep this limitation.
rdar://problem/35968332
llvm-svn: 342136
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
llvm-svn: 342133
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.
This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.
llvm-svn: 342127
Summary:
This change has a number of fixes for FDR mode in compiler-rt along with
changes to the tooling handling the traces in llvm.
In the runtime, we do the following:
- Advance the "last record" pointer appropriately when writing the
custom event data in the log.
- Add XRAY_NEVER_INSTRUMENT in the rewinding routine.
- When collecting the argument of functions appropriately marked, we
should not attempt to rewind them (and reset the counts of functions
that can be re-wound).
In the tooling, we do the following:
- Remove the state logic in BlockIndexer and instead rely on the
presence/absence of records to indicate blocks.
- Move the verifier into a loop associated with each block.
Reviewers: mboerger, eizan
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51965
llvm-svn: 342122
This implements suggesting alternative mnemonics when an invalid one is
specified. For example `addru $9, $6, 17767` leads to the following
error message:
error: unknown instruction, did you mean: add, addiu, addu, maddu?
Differential revision: https://reviews.llvm.org/D40646
llvm-svn: 342119
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.
This patch switches type legalization to do the scalarizing itself using i32.
It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.
Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51325
llvm-svn: 342114
The `IMAGE_REL_ARM_BRANCH20T` applies only to a `b.w` instruction. A
thumb-2 `bl` should be relocated using a `IMAGE_REL_ARM_BRANCH24T`.
Correct the relocation that we emit in such a case.
Resolves PR38620! Based on the patch by Jordan Rhee!
llvm-svn: 342109
Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.
llvm-svn: 342091
This is available on most platforms (Linux/Mac/Win/BSD) with no extra syscalls.
On other platforms (e.g. Solaris) we stat() if this information is requested.
This will allow switching clang's VFS to efficiently expose (path, type) when
traversing a directory. Currently it exposes an entire Status, but does so by
calling fs::status() on all platforms.
Almost all callers only need the path, and all callers only need (path, type).
Patch by sammccall (Sam McCall)
Differential Revision: https://reviews.llvm.org/D51918
llvm-svn: 342089
template methods in JITDylib out-of-line.
This also splits JITDylib::define into a pair of template methods, one taking an
lvalue reference and the other an rvalue reference. This simplifies the
templates at the cost of a small amount of code duplication.
llvm-svn: 342087
construction, a new convenience lookup method, and add-to layer methods.
ExecutionSession now creates a special 'main' JITDylib upon construction. All
subsequently created JITDylibs are added to the main JITDylib's search order by
default (controlled by the AddToMainDylibSearchOrder parameter to
ExecutionSession::createDylib). The main JITDylib's search order will be used in
the future to properly handle cross-JITDylib weak symbols, with the first
definition in this search order selected.
This commit also adds a new ExecutionSession::lookup convenience method that
performs a blocking lookup using the main JITDylib's search order, as this will
be a very common operation for clients.
Finally, new convenience overloads of IRLayer and ObjectLayer's add methods are
introduced that add the given program representations to the main dylib, which
is likely to be the common case.
llvm-svn: 342086
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51991
llvm-svn: 342084
r342003 added support for emitting FPO data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the PDB
file. However, that is not the end of the story. FPO can end
up in two different destinations in a PDB, each corresponding to
a different FPO data source.
The case handled by r342003 involves copying data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the
"New FPO" stream in the PDB, which is then referred to by the
DBI stream. The case handled by this patch involves copying
records from the .debug$F section of an object file to the "FPO"
stream (or perhaps more aptly, the "Old FPO" stream) in the PDB
file, which is also referred to by the DBI stream.
The formats are largely similar, and the difference is mostly
only visible in masm generated object files, such as some of the
low-level CRT object files like memcpy. MASM doesn't appear to
support writing the DEBUG_S_FRAMEDATA subsection, and instead
just writes these records to the .debug$F section.
Although clang-cl does not emit a .debug$F section ever, lld still
needs to support it so we have good debugging for CRT functions.
Differential Revision: https://reviews.llvm.org/D51958
llvm-svn: 342080
Scalarization of a shuffle will break up the source vectors into individual
elements, and use them to assemble the resulting vector. An element type of
a legal vector type may not necessarily be a legal scalar type, so make
sure that the extracted values are extended to a legal scalar type.
llvm-svn: 342079
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjYhttps://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51985
llvm-svn: 342067
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51765
llvm-svn: 342064
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.
I don't think gcc will generate this instruction either.
Fixes PR38852.
Differential Revision: https://reviews.llvm.org/D51754
llvm-svn: 342059
Fix for https://bugs.llvm.org/show_bug.cgi?id=38912.
In GVNHoist::computeInsertionPoints() we iterate over the Value
Numbers and calculate the Iterated Dominance Frontiers without
clearing the IDFBlocks vector first. IDFBlocks ends up accumulating
an insane number of basic blocks, which bloats the compilation time
of SemaChecking.cpp with ubsan enabled.
Differential Revision: https://reviews.llvm.org/D51980
llvm-svn: 342055
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51479
llvm-svn: 342049
Eliminating some duplication of rangelist dumping code at the expense of
some version-dependent code in dump and extract routines.
Reviewer: dblaikie, JDevlieghere, vleschuk
Differential revision: https://reviews.llvm.org/D51081
llvm-svn: 342048
The output of splitLargeGEPOffsets does not appear to be deterministic because
of the way that we iterate over a DenseMap. I've changed it to a MapVector for
consistent output.
The test here isn't particularly great, only showing a consmetic difference in
output. The original reproducer is much larger but show a diffierence in
instruction ordering, leading to different codegen.
Differential Revision: https://reviews.llvm.org/D51851
llvm-svn: 342043
Previously the alignment on the newly created switch table data was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it to 16
bytes. This causes unnecessary code bloat.
Differential Revision: https://reviews.llvm.org/D51800
llvm-svn: 342039
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51478
llvm-svn: 342038
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
llvm-svn: 342033
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.
More information describing the vector ABI and procedure call standard
can be found here:
https://developer.arm.com/products/software-development-tools/\
hpc/arm-compiler-for-hpc/vector-function-abi
Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D51477
llvm-svn: 342030
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp
Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00
Reviewed By: rengolin, xbolva00
Differential Revision: https://reviews.llvm.org/D49488
llvm-svn: 342027
This fixes a layering violation:
Analysis/IVDescrtors.cpp can't include Transforms/Utils/BasicBlockUtils.h,
since TransformUtils depends on Analysis.
llvm-svn: 342024
I suspect this became unecessary when the CSE of mgather was fixed in r338080. It may still be possible to hit this if we widen the element type of a gather outside of type legalization and the promote the mask of a separate gather node so they become the same. But that seems pretty unlikely.
llvm-svn: 342022
Summary:
The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory.
The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html).
Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h|cpp to Analysis/IVDescriptors.h|cpp.
Reviewers: dmgreen, llvm-commits, hfinkel
Reviewed By: dmgreen
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D51153
llvm-svn: 342016
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
Since the outliner is a module pass, it doesn't get codegen size remarks like
the other codegen passes do. This adds size remarks *to* the outliner.
This is kind of a workaround, so it's peppered with FIXMEs; size remarks
really ought to not ever be handled by the pass itself. However, since the
outliner is the only "MachineModulePass", this works for now. Since the
entire purpose of the MachineOutliner is to produce code size savings, it
really ought to be included in codgen size remarks.
If we ever go ahead and make a MachineModulePass (say, something similar to
MachineFunctionPass), then all of this ought to be moved there.
llvm-svn: 342009
Name: op_ugt_sum
%a = add i8 %x, %y
%r = icmp ugt i8 %x, %a
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
Name: sum_ult_op
%a = add i8 %x, %y
%r = icmp ult i8 %a, %x
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
https://rise4fun.com/Alive/ZRxI
AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.
This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 342004
Summary:
Previously, check-llvm on my Windows 10 workstation took about 300s to
run, and it would lock up my mouse. Now, it takes just under 60s.
Previously running the tests only used about 15% of the available CPU
time, now it uses up to 60%.
Shell32.dll and ole32.dll have direct dependencies on user32.dll and
gdi32.dll. These DLLs are mostly used to for Windows GUI functionality,
which most LLVM tools don't need. It seems that these DLLs acquire and
release some system resources on startup and exit, and that requires
acquiring the same highly contended lock discussed in this post:
https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/
Most LLVM tools still have a transitive dependency on
SHGetKnownFolderPathW, which is used to look up the user home directory
and local AppData directory. We also use SHFileOperationW to recursively
delete a directory, but only LLDB and clang-doc use this today. At some
point, we might consider removing these last shell32.dll deps, but for
now, I would like to take this free performance.
Many thanks to Bruce Dawson for suggesting this fix. He looked at an ETW
trace of an LLVM test run, noticed these DLLs, and suggested avoiding
them.
Reviewers: zturner, pcc, thakis
Subscribers: mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51952
llvm-svn: 342002
This reverts rL341954.
The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:
[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill
llvm-svn: 342001
Summary:
There are two registers encoded in the S_FRAMEPROC flags: one for locals
and one for parameters. The encoding is described by the
ExpandEncodedBasePointerReg function in cvinfo.h. Two bits are used to
indicate one of four possible values:
0: no register - Used when there are no variables.
1: SP / standard - Variables are stored relative to the standard SP
for the ISA.
2: FP - Variables are addressed relative to the ISA frame
pointer, i.e. EBP on x86. If realignment is required, parameters
use this. If a dynamic alloca is used, locals will be EBP relative.
3: Alternative - Variables are stored relative to some alternative
third callee-saved register. This is required to address highly
aligned locals when there are dynamic stack adjustments. In this
case, both the incoming SP saved in the standard FP and the current
SP are at some dynamic offset from the locals. LLVM uses ESI in
this case, MSVC uses EBX.
Most of the changes in this patch are to pass around the CPU so that we
can decode these into real, named architectural registers.
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D51894
llvm-svn: 341999
In this diff we adjust the code of getSymbols to avoid creating LLVMContext when it's not necessary.
Without this patch when the function getSymbols was called on a MachO object with a __bitcode section
it was parsing the embedded bitcode and then ignoring the result.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D51759
llvm-svn: 341998
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1
Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1
https://rise4fun.com/Alive/Vty
The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.
I'm not sure if these are all of the symmetrical folds, but I adjusted
the existing code for one of the folds to try to show the similarities.
There's no obvious connection, but this is another preliminary step
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 341997
Summary: Initial support for nsw, nuw and exact flags in MI
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nlopes
Differential Revision: https://reviews.llvm.org/D51738
llvm-svn: 341996
Fixes at_file.c test failure caused by r341988. We may want to change
how we treat \n in our tokenizer, but this is probably a good fix
regardless, since we can invoke all kinds of programs with different
interpretations of the command line quoting rules.
llvm-svn: 341992
Summary:
Shell32.dll depends on gdi32.dll and user32.dll, which are mostly DLLs
for Windows GUI functionality. LLVM's utilities don't typically need GUI
functionality, and loading these DLLs seems to be slowing down startup.
Also, we already have an implementation of Windows command line
tokenization in cl::TokenizeWindowsCommandLine, so we can just use it.
The goal is to get the original argv in UTF-8, so that it can pass
through most LLVM string APIs. A Windows process starts life with a
UTF-16 string for its command line, and it can be retreived with
GetCommandLineW from kernel32.dll.
Previously, we would:
1. Get the wide command line
2. Call CommandLineToArgvW to handle quoting rules and separate it into
arguments.
3. For each wide argument, expand wildcards (* and ?) using
FindFirstFileW.
4. Convert each argument to UTF-8
Now we:
1. Get the wide command line, convert the whole thing to UTF-8
2. Tokenize the UTF-8 command line with cl::TokenizeWindowsCommandLine
3. For each argument, expand wildcards if present
- This requires converting back to UTF-16 to call FindFirstFileW
- Results of FindFirstFileW must be converted back to UTF-8
Reviewers: zturner
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51941
llvm-svn: 341988
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341987
Summary:
Update MemorySSA in old LoopUnswitch pass.
Actual dependency and update is disabled by default.
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45301
llvm-svn: 341984
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
I noticed that we were not back-propagating undef lanes to shuffle masks when we have a
shuffle that reduces the vector width. This is part of investigating/solving PR38691:
https://bugs.llvm.org/show_bug.cgi?id=38691
The DAG equivalent was proposed with:
D51696
Differential Revision: https://reviews.llvm.org/D51433
llvm-svn: 341981
Right now, the counters are added in regards of the number of successors
for a given BasicBlock: it's good when we've only 1 or 2 successors (at
least with BranchInstr). But in the case of a switch statement, the
BasicBlock after switch has several predecessors and we need know from
which BB we're coming from.
So the idea is to revert what we're doing: add a PHINode in each block
which will select the counter according to the incoming BB. They're
several pros for doing that:
- we fix the "switch" bug
- we remove the function call to "__llvm_gcov_indirect_counter_increment"
and the lookup table stuff
- we replace by PHINodes, so the optimizer will probably makes a better
job.
Patch by calixte!
Differential Revision: https://reviews.llvm.org/D51619
llvm-svn: 341977
Summary: Add function name when verification fails as an initial breadcrumb for debugging.
Patch by David Callahan.
Reviewers: mehdi_amini, modocache
Reviewed By: modocache
Subscribers: llvm-commits, modocache
Differential Revision: https://reviews.llvm.org/D51386
llvm-svn: 341974
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.
This patch removes the hack.
This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.
Differential Revision: https://reviews.llvm.org/D49499
llvm-svn: 341973
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.
Fixes ones of the failures from PR38865.
Differential Revision: https://reviews.llvm.org/D51940
llvm-svn: 341972
For vectors, getPrimitiveSizeInBits returns the full vector width. This code should using the element size for vectors. This could be fixed by calling getScalarSizeInBits, but its even easier to just get it from the APInt we're checking.
Differential Revision: https://reviews.llvm.org/D51938
llvm-svn: 341971
There are 2 cases when we create PHI nodes:
* For the result of the call that was duplicated in the split blocks.
Those PHI nodes should have the debug location of the call.
* For values produced before the call. Those instructions need to be
duplicated in the split blocks and the PHI nodes should have the
debug locations of those instructions.
Fixes PR37962.
Reviewers: junbuml, gbedwell, vsk
Reviewed By: junbuml
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D51919
llvm-svn: 341970