Commit Graph

29897 Commits

Author SHA1 Message Date
Hongtao Yu b035513c06 [CSSPGO] Pseudo probe encoding and emission.
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s

Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections.  The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead. 

**ELF object emission**

The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.

Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication.  A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.

The format of `.pseudo_probe_desc` section looks like:

```
.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"
```

For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :

```
FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
    NPROBES (ULEB128)
        Number of probes originating from this function.
    NUM_INLINED_FUNCTIONS (ULEB128)
        Number of callees inlined into this function, aka number of
        first-level inlinees
    PROBE RECORDS
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
            reserved
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
    INLINED FUNCTION RECORDS
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
          FUNCTION BODY
            A FUNCTION BODY entry describing the inlined function.
```

To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.

**Assembling**

Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.

A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.

A example assembly looks like:

```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```

With inlining turned on, the assembly may look different around %bb2 with an inlined probe:

```
# %bb.2:                                # %bb2
.pseudoprobe    837061429793323041 3 0
.pseudoprobe    6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe    837061429793323041 4 0
popq    %rax
retq
```

**Disassembling**

We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.

An example disassembly looks like:

```
00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret
```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91878
2020-12-10 09:50:08 -08:00
Craig Topper a1ae3c6ac9 [RISCV][LegalizeDAG] Expand SETO and SETUO comparisons. Teach LegalizeDAG to expand SETUO expansion when UNE isn't legal.
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.

Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.

Differential Revision: https://reviews.llvm.org/D92008
2020-12-10 09:15:52 -08:00
David Green 0447f3508f [ARM][RegAlloc] Add t2LoopEndDec
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.

Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.

There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.

This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.

Differential Revision: https://reviews.llvm.org/D91358
2020-12-10 12:14:23 +00:00
Luo, Yuanke f80b29878b [X86] AMX programming model.
This patch implements amx programming model that discussed in llvm-dev
 (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html).
 Thank Hal for the good suggestion in the RA. The fast RA is not in the patch yet.
 This patch implemeted 7 components.

1. The c interface to end user.
2. The AMX intrinsics in LLVM IR.
3. Transform load/store <256 x i32> to AMX intrinsics or split the
   type into two <128 x i32>.
4. The Lowering from AMX intrinsics to AMX pseudo instruction.
5. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx
   intruction.
6. The register allocation for tile register.
7. Morph AMX pseudo instruction to AMX real instruction.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Differential Revision: https://reviews.llvm.org/D87981
2020-12-10 17:01:54 +08:00
Justin Bogner e6a1187dd8 Limit the recursion depth of SelectionDAG::isSplatValue()
This method previously always recursively checked both the left-hand
side and right-hand side of binary operations for splatted (broadcast)
vector values to determine if the parent DAG node is a splat.

Like several other SelectionDAG methods, limit the recursion depth to
MaxRecursionDepth (6). This prevents stack overflow.
See also https://issuetracker.google.com/173785481

Patch by Nicolas Capens. Thanks!

Differential Revision: https://reviews.llvm.org/D92421
2020-12-09 10:35:07 -08:00
Djordje Todorovic 163c223161 [Debuginfo] [CSInfo] Do not create CSInfo for undef arguments
If a function parameter is marked as "undef", prevent creation
of CallSiteInfo for that parameter.
Without this patch, the parameter's call_site_value would be incorrect.
The incorrect call_value case reported in PR39716,
addressed in D85111.
​
Patch by Nikola Tesic
​
Differential revision: https://reviews.llvm.org/D92471
2020-12-09 12:54:59 +01:00
Kerry McLaughlin 05edfc5475 [SVE][CodeGen] Add DAG combines for s/zext_masked_gather
This patch adds the following DAGCombines, which apply if isVectorLoadExtDesirable() returns true:
 - fold (and (masked_gather x)) -> (zext_masked_gather x)
 - fold (sext_inreg (masked_gather x)) -> (sext_masked_gather x)

LowerMGATHER has also been updated to fetch the LoadExtType associated with the
gather and also use this value to determine the correct masked gather opcode to use.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92230
2020-12-09 11:53:19 +00:00
Kerry McLaughlin 4519ff4b6f [SVE][CodeGen] Add the ExtensionType flag to MGATHER
Adds the ExtensionType flag, which reflects the LoadExtType of a MaskedGatherSDNode.
Also updated SelectionDAGDumper::print_details so that details of the gather
load (is signed, is scaled & extension type) are printed.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91084
2020-12-09 11:19:08 +00:00
Joe Ellis 80c33de2d3 [SelectionDAG] Add llvm.vector.{extract,insert} intrinsics
This commit adds two new intrinsics.

- llvm.experimental.vector.insert: used to insert a vector into another
  vector starting at a given index.

- llvm.experimental.vector.extract: used to extract a subvector from a
  larger vector starting from a given index.

The codegen work for these intrinsics has already been completed; this
commit is simply exposing the existing ISD nodes to LLVM IR.

Reviewed By: cameron.mcinally

Differential Revision: https://reviews.llvm.org/D91362
2020-12-09 11:08:41 +00:00
Simon Moll 3ffbc79357 [VP] Build VP SDNodes
Translate VP intrinsics to VP_* SDNodes.  The tests check whether a
matching vp_* SDNode is emitted.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91441
2020-12-09 11:36:51 +01:00
Ilya Leoshkevich d58f112ce0 Prevent FENTRY_CALL reordering
FEntryInserter prepends FENTRY_CALL to the first basic block. In case
there are other instructions, PostRA Machine Instruction Scheduler can
move FENTRY_CALL call around. This actually occurs on SystemZ (see the
testcase). This is bad for the following reasons:

* FENTRY_CALL clobbers registers.
* Linux Kernel depends on whatever FENTRY_CALL expands to to be the very
  first instruction in the function.

Fix by adding isCall attribute to FENTRY_CALL, which prevents reordering
by making it a scheduling boundary for PostRA Machine Instruction
Scheduler.

Reviewed By: niravd

Differential Revision: https://reviews.llvm.org/D91218
2020-12-09 00:59:01 +01:00
Anna Thomas 29356e3279 [ScalarizeMaskedMemIntrin] Add new PM support
This patch adds new PM support for the pass and the pass can be now used
during middle-end transforms. The old pass is remamed to
ScalarizeMaskedMemIntrinLegacyPass.

Reviewed-By: skatkov, aeubanks
Differential Revision: https://reviews.llvm.org/D92743
2020-12-08 17:15:22 -05:00
Chih-Ping Chen 1f67247eea [DebugInfo] Add handling of stringLengthExp operand of DIStringType.
This patch makes DWARF writer emit DW_AT_string_length using
the stringLengthExp operand of DIStringType.

This is part of the effort to add debug info support for
Fortran deferred length strings.

Also updated the tests to exercise the change.

Differential Revision: https://reviews.llvm.org/D92412
2020-12-08 14:49:59 -05:00
Huihui Zhang 8e6fc1f97e [AArch64][SVE] Add lowering for llvm.maxnum|minnum for scalable type.
LLVM intrinsic llvm.maxnum|minnum is overloaded intrinsic, can be used on any
floating-point or vector of floating-point type.
This patch extends current infrastructure to support scalable vector type.

This patch also fix a warning message of incorrect use of EVT::getVectorNumElements()
for scalable type, when DAGCombiner trying to split scalable vector.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92607
2020-12-08 09:35:53 -08:00
Anna Thomas 09f2f9605f [ScalarizeMaskedMemIntrinsic] Move from CodeGen into Transforms
ScalarizeMaskedMemIntrinsic is currently a codeGen level pass. The pass
is actually operating on IR level and does not use any code gen specific
passes.  It is useful to move it into transforms directory so that it
can be more widely used as a mid-level transform as well (apart from
usage in codegen pipeline).
In particular, we have a usecase downstream where we would like to use
this pass in our mid-level pipeline which operates on IR level.

The next change will be to add support for new PM.

Reviewers: craig.topper, apilipenko, skatkov
Reviewed-By: skatkov
Differential Revision: https://reviews.llvm.org/D92407
2020-12-08 12:25:58 -05:00
David Sherwood e22259fafe [SVE] Remove duplicate assert in DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR 2020-12-08 14:41:14 +00:00
Pan, Tao 7af802994e [CodeGen] Add text section prefix for COFF object file
Text section prefix is created in CodeGenPrepare, it's file format independent implementation,  text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF.
Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix.
Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix.

Reviewed By: pengfei, rnk

Differential Revision: https://reviews.llvm.org/D92073
2020-12-08 18:56:21 +08:00
Tim Northover c5978f42ec UBSAN: emit distinctive traps
Sometimes people get minimal crash reports after a UBSAN incident. This change
tags each trap with an integer representing the kind of failure encountered,
which can aid in tracking down the root cause of the problem.
2020-12-08 10:28:26 +00:00
Kai Luo 44bd8ea167 [DAGCombine][PowerPC] Simplify nabs by using legal `smin` operation
Convert `0 - abs(x)` to `smin (x, -x)` if `smin` is a legal operation.

Verification: https://alive2.llvm.org/ce/z/vpquFR

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92637
2020-12-08 03:24:07 +00:00
Simon Pilgrim b6e847c396 [DAG] Cleanup by folding some single use VT.getScalarSizeInBits() calls into its comparison. NFCI. 2020-12-07 18:23:54 +00:00
Kerry McLaughlin 111f559bbd [SVE][CodeGen] Call refineIndexType & refineUniformBase from visitMGATHER
The refineIndexType & refineUniformBase functions added by D90942 can also be used to
improve CodeGen of masked gathers.

These changes were split out from D91092

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D92319
2020-12-07 13:20:19 +00:00
Kerry McLaughlin f6dd32fd35 [SVE][CodeGen] Lower scalable masked gathers
Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:
- Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate
  gather load opcode to use.
- Improve codegen with refineIndexType/refineUniformBase, added in D90942
- Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91092
2020-12-07 12:20:41 +00:00
Martin Storsjö 78a57069b5 [CodeGen] Restore accessing __stack_chk_guard via a .refptr stub on mingw after 2518433f86
Add tests for this particular detail for x86 and arm (similar tests
already existed for x86_64 and aarch64).

The libssp implementation may be located in a separate DLL, and in
those cases, the references need to be in a .refptr stub, to avoid
needing to touch up code in the text section at runtime (which is
supported but inefficient for x86, and unsupported for arm).

Differential Revision: https://reviews.llvm.org/D92738
2020-12-07 09:35:12 +02:00
Bing1 Yu eee30a6dce [CodeGen] Modify the refineIndexType(...)'s code to fix a bug in D90942.
In previous code, when refineIndexType(...) is called and Index is undef, Index.getOperand(0) will raise a assertion fail.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D92548
2020-12-07 08:49:07 +08:00
Fangrui Song 2d03c8e2c8 [CodeGen] Delete 4 unused declarations 2020-12-06 15:02:18 -08:00
Fangrui Song 0e0d616fa2 [CodeGen] Delete 15 unused declarations
Notes about a few declarations:

* LiveVariables::RegisterDefIsDead: deleted by r47927
* createForwardControlFlowIntegrityPass, createJumpInstrTablesPass: deleted by r230780
* RegScavenger::setLiveInsUsed: deleted by r292543
* ScheduleDAGInstrs::{toggleKillFlag,startBlockForKills}: deleted by r304055
* Localizer::shouldLocalize: remnant of D75207
* DwarfDebug::addSectionLabel: deleted by r373273
2020-12-06 14:55:04 -08:00
Layton Kifer ac522f8700 [DAGCombiner] Fold (sext (not i1 x)) -> (add (zext i1 x), -1)
Move fold of (sext (not i1 x)) -> (add (zext i1 x), -1) from X86 to DAGCombiner to improve codegen on other targets.

Differential Revision: https://reviews.llvm.org/D91589
2020-12-06 11:52:10 -05:00
Kazu Hirata a553ac9791 [CodeGen] llvm::erase_if (NFC) 2020-12-05 15:44:40 -08:00
Fangrui Song 2518433f86 Make __stack_chk_guard dso_local if Reloc::Static
This is currently implied by TargetMachine::shouldAssumeDSOLocal
but will be changed in the future.
2020-12-04 16:57:45 -08:00
Simon Pilgrim 9cf4f493a7 [DAG] Move SelectionDAG implementation to KnownBits::setInReg(). NFCI. 2020-12-04 18:09:08 +00:00
Simon Pilgrim 6f4ee6f870 [DAGCombiner] Use const APInt& for getConstantOperandAPInt results. NFCI.
Avoid unnecessary instantiation.

Noticed while removing unnecessary autos
2020-12-04 09:44:58 +00:00
Arthur Eubanks 2f0de58294 [NewPM] Support --print-before/after in NPM
This changes --print-before/after to be a list of strings rather than
legacy passes. (this also has the effect of not showing the entire list
of passes in --help-hidden after --print-before/after, which IMO is
great for making it less verbose).

Currently PrintIRInstrumentation passes the class name rather than pass
name to llvm::shouldPrintBeforePass(), meaning
llvm::shouldPrintBeforePass() never functions as intended in the NPM.
There is no easy way of converting class names to pass names outside of
within an instance of PassBuilder.

This adds a map of pass class names to their short names in
PassRegistry.def within PassInstrumentationCallbacks. It is populated
inside the constructor of PassBuilder, which takes a
PassInstrumentationCallbacks.

Add a pointer to PassInstrumentationCallbacks inside
PrintIRInstrumentation and use the newly created map.

This is a bit hacky, but I can't think of a better way since the short
id to class name only exists within PassRegistry.def. This also doesn't
handle passes not in PassRegistry.def but rather added via
PassBuilder::registerPipelineParsingCallback().

llvm/test/CodeGen/Generic/print-after.ll doesn't seem very useful now
with this change.

Reviewed By: ychen, jamieschmeiser

Differential Revision: https://reviews.llvm.org/D87216
2020-12-03 16:52:14 -08:00
Anna Thomas fb2e109d45 [ScalarizeMaskedMemIntrin] NFC: Pass args by reference 2020-12-03 14:04:21 -05:00
Anna Thomas f86ec1e1fc [ScalarizeMaskedMemIntrin] NFC: Convert member functions to static
This will make it easier to add new PM support once the pass is moved
into transforms (D92407).
2020-12-03 11:46:38 -05:00
dfukalov 2ce38b3f03 [NFC] Reduce include files dependency.
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92489
2020-12-03 18:25:05 +03:00
Joe Ellis 78c0ea54a2 [DAGCombine] Fix TypeSize warning in DAGCombine::visitLIFETIME_END
Bail out early if we encounter a scalable store.

Reviewed By: peterwaller-arm

Differential Revision: https://reviews.llvm.org/D92392
2020-12-03 12:12:41 +00:00
Kazu Hirata 7a4af2a8e7 [SelectionDAG] Use is_contained (NFC) 2020-12-02 19:09:45 -08:00
Hsiangkai Wang f7bc7c2981 [RISCV] Support Zfh half-precision floating-point extension.
Support "Zfh" extension according to
https://github.com/riscv/riscv-isa-manual/blob/zfh/src/zfh.tex

Differential Revision: https://reviews.llvm.org/D90738
2020-12-03 09:16:33 +08:00
Mircea Trofin bab72dd5d5 [NFC][MC] TargetRegisterInfo::getSubReg is a MCRegister.
Typing the API appropriately.

Differential Revision: https://reviews.llvm.org/D92341
2020-12-02 15:46:38 -08:00
Hongtao Yu 24d4291ca7 [CSSPGO] Pseudo probes for function calls.
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.

One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.

With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.

To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against  the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.

Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.

Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D91756
2020-12-02 13:45:20 -08:00
jasonliu 2c63e7604c [XCOFF][AIX] Alternative path in EHStreamer for platforms do not have uleb128 support
Summary:
Not all system assembler supports `.uleb128 label2 - label1` form.
When the target do not support this form, we have to take
alternative manual calculation to get the offsets from them.

Reviewed By: hubert.reinterpretcast

Diffierential Revision: https://reviews.llvm.org/D92058
2020-12-02 20:03:15 +00:00
Nick Desaulniers bc044a88ee [Inline] prevent inlining on stack protector mismatch
It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an attribute((no_stack_protector)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

SSP attributes can be ordered by strength. Weakest to strongest, they
are: ssp, sspstrong, sspreq.  Callees with differing SSP attributes may be
inlined into each other, and the strongest attribute will be applied to the
caller. (No change)

After this change:
* A callee with no SSP attributes will no longer be inlined into a
  caller with SSP attributes.
* The reverse is also true: a callee with an SSP attribute will not be
  inlined into a caller with no SSP attributes.
* The alwaysinline attribute overrides these rules.

Functions that get synthesized by the compiler may not get inlined as a
result if they are not created with the same stack protector function
attribute as their callers.

Alternative approach to https://reviews.llvm.org/D87956.

Fixes pr/47479.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Reviewed By: rnk, MaskRay

Differential Revision: https://reviews.llvm.org/D91816
2020-12-02 11:00:16 -08:00
jasonliu a65d8c5d72 [XCOFF][AIX] Generate LSDA data and compact unwind section on AIX
Summary:
AIX uses the existing EH infrastructure in clang and llvm.
The major differences would be
1. AIX do not have CFI instructions.
2. AIX uses a new personality routine, named __xlcxx_personality_v1.
   It doesn't use the GCC personality rountine, because the
   interoperability is not there yet on AIX.
3. AIX do not use eh_frame sections. Instead, it would use a eh_info
section (compat unwind section) to store the information about
personality routine and LSDA data address.

Reviewed By: daltenty, hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D91455
2020-12-02 18:42:44 +00:00
James Park 78b0ec3d1c Avoid redundant inline with LLVM_ATTRIBUTE_ALWAYS_INLINE
Fix MSVC warning when __forceinline is paired with inline.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D85264
2020-12-01 14:43:16 -08:00
David Blaikie 615f63e149 Revert "[FastISel] Flush local value map on ever instruction" and dependent patches
This reverts commit cf1c774d6a.

This change caused several regressions in the gdb test suite - at least
a sample of which was due to line zero instructions making breakpoints
un-lined. I think they're worth investigating/understanding more (&
possibly addressing) before moving forward with this change.

Revert "[FastISel] NFC: Clean up unnecessary bookkeeping"
This reverts commit 3fd39d3694.

Revert "[FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option"
This reverts commit a474657e30.

Revert "Remove static function unused after cf1c774."
This reverts commit dc35368ccf.

Revert "[lldb] Fix TestThreadStepOut.py after "Flush local value map on every instruction""
This reverts commit 53a14a47ee.
2020-12-01 14:26:23 -08:00
Layton Kifer d7fec38f05
[DAGCombiner][NFC] Replace duplicate implementation flipBoolean with DAG.getLogicalNOT
Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D92246
2020-12-01 22:23:04 +03:00
Fangrui Song a5309438fe static const char *const foo => const char foo[]
By default, a non-template variable of non-volatile const-qualified type
having namespace-scope has internal linkage, so no need for `static`.
2020-12-01 10:33:18 -08:00
Benjamin Kramer 107e92dff8 [DAG] Remove unused variable. NFC. 2020-12-01 16:29:02 +01:00
Simon Pilgrim 1b209ff9e3 [DAG] Move vselect(icmp_ult, 0, sub(x,y)) -> usubsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->USUBSAT fold to DAGCombiner - there's nothing target specific about these folds.
2020-12-01 14:25:29 +00:00
Simon Pilgrim 6dbd0d36a1 [DAG] Move vselect(icmp_ult, -1, add(x,y)) -> uaddsat(x,y) to DAGCombine (PR40111)
Move the X86 VSELECT->UADDSAT fold to DAGCombiner - there's nothing target specific about these folds.

The SSE42 test diffs are relatively benign - its avoiding an extra constant load in exchange for an extra xor operation - there are extra register moves, which is annoying as all those operations should commute them away.

Differential Revision: https://reviews.llvm.org/D91876
2020-12-01 11:56:26 +00:00
Kazu Hirata e785379aff [CodeView] Remove unused declaration collectInlineSiteChildren (NFC)
The function definition was removed on Sep 7, 2016 in commit
a9f4cc9510.  The declaration seems to be
unused since then.
2020-11-30 22:28:26 -08:00
Hendrik Greving d4ba5e15f4 Add MachineModuleInfo constructor with external MCContext
Adds a constructor to MachineModuleInfo and MachineModuleInfoWapperPass that
takes an external MCContext. If provided, the external context will be used
throughout codegen instead of MMI's default one.

This enables external drivers to take ownership of data put on the MMI's context
during codegen. The internal context is used otherwise and destroyed upon
finish.

Differential Revision: https://reviews.llvm.org/D91313
2020-11-30 20:28:13 -08:00
Fangrui Song d928dfc6f9 [GlobalISel] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds 2020-11-30 18:31:42 -08:00
Fangrui Song 36fe1a9dea [GlobalISel] Fix -Wunused-variable 2020-11-30 18:25:54 -08:00
Amara Emerson 87ff156414 [AArch64][GlobalISel] Fix crash during legalization of a vector G_SELECT with scalar mask.
The lowering of vector selects needs to first splat the scalar mask into a vector
first.

This was causing a crash when building oggenc in the test suite.

Differential Revision: https://reviews.llvm.org/D91655
2020-11-30 16:37:49 -08:00
Paul Robinson 3fd39d3694 [FastISel] NFC: Clean up unnecessary bookkeeping
Now that we flush the local value map for every instruction, we don't
need any extra flushes for specific cases.  Also, LastFlushPoint is
not used for anything.  Follow-ups to #dc35368 (D91734).

Differential Revision: https://reviews.llvm.org/D92338
2020-11-30 12:27:50 -08:00
Matt Arsenault 29bd6519d2 SplitKit: Use Register 2020-11-30 15:09:33 -05:00
Paul Robinson a474657e30 [FastISel] NFC: Remove obsolete -fast-isel-sink-local-values option
This option is not used for anything after #dc35368 (D91734).
2020-11-30 10:55:49 -08:00
Francesco Petrogalli f6150aa41a [SelectionDAGBuilder] Update signature of `getRegsAndSizes()`.
The mapping between registers and relative size has been updated to
use TypeSize to account for the size of scalable EVTs.

The patch is a NFCI, if not for the fact that with this change the
function `getUnderlyingArgRegs` does not raise a warning for implicit
conversion of `TypeSize` to `unsigned` when generating machine code
from the test added to the patch.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D92096
2020-11-30 17:38:51 +00:00
Craig Topper fa0f01a3c0 [RISCV][LegalizeTypes] Teach type legalizer that it can promote UMIN/UMAX using SExtPromotedInteger if that's better for the target.
If Sext is cheaper than Zext for a target, we can use that to promote the operands of UMIN/UMAX. Using sext just makes numbers with the sign bit set even larger when treated as an unsigned number and it has no effect on number without the sign bit set. So the relative order doesn't change. This is similar to what we already do for promoting SETCC.

This is helpful on RISCV where i32 arguments are sign extended on RV64 and many instructions are able to produce results with 33 sign bits.

Differential Revision: https://reviews.llvm.org/D92128
2020-11-27 11:37:25 -08:00
Simon Pilgrim 969918e177 [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.

Differential Revision: https://reviews.llvm.org/D92183
2020-11-27 11:18:58 +00:00
QingShan Zhang 4d83aba422 [DAGCombine] Adding a hook to improve the precision of fsqrt if the input is denormal
For now, we will hardcode the result as 0.0 if the input is denormal or 0. That will
have the impact the precision. As the fsqrt added belong to the cold path of the
cmp+branch, it won't impact the performance for normal inputs for PowerPC, but improve
the precision if the input is denormal.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D80974
2020-11-27 02:10:55 +00:00
Nikita Popov 4df8efce80 [AA] Split up LocationSize::unknown()
Currently, we have some confusion in the codebase regarding the
meaning of LocationSize::unknown(): Some parts (including most of
BasicAA) assume that LocationSize::unknown() only allows accesses
after the base pointer. Some parts (various callers of AA) assume
that LocationSize::unknown() allows accesses both before and after
the base pointer (but within the underlying object).

This patch splits up LocationSize::unknown() into
LocationSize::afterPointer() and LocationSize::beforeOrAfterPointer()
to make this completely unambiguous. I tried my best to determine
which one is appropriate for all the existing uses.

The test changes in cs-cs.ll in particular illustrate a previously
clearly incorrect AA result: We were effectively assuming that
argmemonly functions were only allowed to access their arguments
after the passed pointer, but not before it. I'm pretty sure that
this was not intentional, and it's certainly not specified by
LangRef that way.

Differential Revision: https://reviews.llvm.org/D91649
2020-11-26 18:39:55 +01:00
Simon Pilgrim 8057ebf4a0 Revert rG12d59b696b330 "[DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal"
This reverts commit 12d59b696b.

Prematurely pushed this to trunk
2020-11-26 15:07:45 +00:00
Simon Pilgrim 12d59b696b [DAG] Legalize umin(x,y) -> sub(x,usubsat(x,y)) and umax(x,y) -> add(x,usubsat(y,x)) iff usubsat is legal
If usubsat() is legal, this is likely to result in smaller codegen expansion than the default cmp+select codegen expansion.

Allows us to move the x86-specific lowering to the generic expansion code.
2020-11-26 14:47:28 +00:00
Robert Lougher 6464c4a170 [LiveDebugVariables] Strip all debug instructions from nodebug functions
A crash/assertion failure in the greedy register allocator was tracked
down to a debug instr being passed to LiveIntervals::getInstructionIndex.
Normally this should not occur as debug instructions are collected and
removed by LiveDebugVariables before RA, and reinserted afterwards.
However, when a function has no debug info, LiveDebugVariables simply
strips any debug values that are present as they're not needed (this
situation will occur when a function with debug info is inlined into a
nodebug function). The problem is, it only removes DBG_VALUE instructions,
leaving DBG_LABELs (the cause of the crash).

This patch updates the LiveDebugVariables nodebug path to remove all debug
instructions. The test case verifies that DBG_VALUE/DBG_LABEL instructions
are present, and that they are stripped.

When -experimental-debug-variable-locations is enabled, certain variable
locations are represented by DBG_INSTR_REF instead of DBG_VALUE. The test
case verifies that a DBG_INSTR_REF is emitted by the option, and that it
is also stripped.

Differential Revision: https://reviews.llvm.org/D92127
2020-11-26 14:30:18 +00:00
Kerry McLaughlin 4bee3197f6 [SVE][CodeGen] Extend isConstantSplatValue to support ISD::SPLAT_VECTOR
Updated the affected scalable_of_scalable tests in sve-gep.ll, as isConstantSplatValue now returns true in DAGCombiner::visitMUL and folds `(mul x, 1) -> x`

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D91363
2020-11-26 11:19:40 +00:00
Craig Topper aea130f736 [LegalizerTypes] Add support for scalarizing the operand of an FP_EXTEND when the result type is legal. 2020-11-25 20:30:21 -08:00
Amy Huang 1363dfaf31 [CodeView] Avoid emitting empty debug globals subsection.
In https://reviews.llvm.org/D89072 I added static const data members
to the debug subsection for globals. It skipped emitting an S_CONSTANT if it
didn't have a value, which meant the subsection could be empty.

This patch fixes the empty subsection issue.

Differential Revision: https://reviews.llvm.org/D92049
2020-11-25 16:13:32 -08:00
Craig Topper 2d6042937b [SelectionDAGBuilder] Add SPF_NABS support to visitSelect
We currently don't match this which limits the effectiveness of D91120 until
InstCombine starts canonicalizing to llvm.abs. This should be easy to remove
if/when we remove the SPF_ABS handling.

Differential Revision: https://reviews.llvm.org/D92118
2020-11-25 14:54:26 -08:00
Paul Robinson dc35368ccf Remove static function unused after cf1c774.
Caused some -Werror bot failures.
2020-11-25 13:43:06 -05:00
Simon Pilgrim 9c86c5e8ad [DAG] Legalize abs(x) -> umin(x,sub(0,x)) iff umin/sub are legal
If umin() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.

Followup to D92095

Alive2: https://alive2.llvm.org/ce/z/8nuX6s  https://alive2.llvm.org/ce/z/q2hB9w
2020-11-25 18:06:02 +00:00
Paul Robinson cf1c774d6a [FastISel] Flush local value map on ever instruction
Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).

https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.

There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:

  Local values may also be used by no-op casts, which adds the
  register to the RegFixups table. Without reversing the RegFixups
  map direction, we don't have enough information to sink these
  instructions.

This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.

This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.

(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.

Differential Revision: https://reviews.llvm.org/D91734
2020-11-25 13:05:00 -05:00
Simon Pilgrim 0637dfe88b [DAG] Legalize abs(x) -> smax(x,sub(0,x)) iff smax/sub are legal
If smax() is legal, this is likely to result in smaller codegen expansion for abs(x) than the xor(add,ashr) method.

This is also what PowerPC has been doing for its abs implementation, so it lets us get rid of a load of custom lowering code there (and which was never updated when they added smax lowering).

Alive2: https://alive2.llvm.org/ce/z/xRk3cD

Differential Revision: https://reviews.llvm.org/D92095
2020-11-25 15:03:03 +00:00
Simon Pilgrim 7e7106d104 DetectDeadLanes.cpp - remove unused headers. NFCI. 2020-11-25 11:38:28 +00:00
QingShan Zhang 9c588f53fc [DAGCombine] Add hook to allow target specific test for sqrt input
PowerPC has instruction ftsqrt/xstsqrtdp etc to do the input test for software square root.
LLVM now tests it with smallest normalized value using abs + setcc. We should add hook to
target that has test instructions.

Reviewed By: Spatel, Chen Zheng, Qiu Chao Fang

Differential Revision: https://reviews.llvm.org/D80706
2020-11-25 05:37:15 +00:00
Kai Luo 8e6d92026c [DAG][PowerPC] Fix dropped `nsw` flag in `SimplifySetCC` by adding `doesNodeExist` helper
`SimplifySetCC` invokes `getNodeIfExists` without passing `Flags` argument and `getNodeIfExists` uses a default `SDNodeFlags` to intersect the original flags, as a consequence, flags like `nsw` is dropped. Added a new helper function `doesNodeExist` to check if a node exists without modifying its flags.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D89938
2020-11-25 04:39:03 +00:00
Zarko Todorovski c92f29b05e [AIX] Add mabi=vec-extabi options to enable the AIX extended and default vector ABIs.
Added support for the options mabi=vec-extabi and mabi=vec-default which are analogous to qvecnvol and qnovecnvol when using XL on AIX.
The extended Altivec ABI on AIX is enabled using mabi=vec-extabi in clang and vec-extabi in llc.

Reviewed By: Xiangling_L, DiggerLin

Differential Revision: https://reviews.llvm.org/D89684
2020-11-24 18:17:53 -05:00
Hsiangkai Wang 8d06a678a5 [SelectionDAG] Avoid aliasing analysis if the object size is unknown.
If the size of memory access is unknown, do not use it to analysis. One
example of unknown size memory access is to load/store scalable vector
objects on the stack.

Differential Revision: https://reviews.llvm.org/D91833
2020-11-25 06:13:37 +08:00
Janek van Oirschot 42eaf4fe0a [HardwareLoops] Change order of SCEV expression construction for InitLoopCount.
Putting the +1 before the zero-extend will allow scalar evolution to fold the expression in some cases such as the one shown in PowerPC's `shrink-wrap.ll` test.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D91724
2020-11-24 18:01:42 +00:00
Yichao Yu a248eca665
Clear NewGEPBases after finish using them in CodeGenPrep pass
AFAICT all other set/map are correctly cleared in `runOnFunction`.

With assertion enabled this causes a crash when the module is freed and potentially if a later pass delete the instruction (not observed in real world though). Without assertion this can potentially cause confusing result when running on a new Function/Module.

Reviewed By: loladiro

Differential Revision: https://reviews.llvm.org/D84031
2020-11-24 12:12:00 -05:00
Thomas Preud'homme 9c8af93c93 Add support for STRICT_FSETCC promotion
Add missing handling of STRICT_FSETCC promotion. This prevents assert
failure in llvm::TargetLoweringBase::getTypeToPromoteTo().

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D91962
2020-11-24 16:53:49 +00:00
Kai Luo 5931be60b5 [DAGCombine][PowerPC] Convert negated abs to trivial arithmetic ops
This patch converts `0 - abs(x)` to `Y = sra (X, size(X)-1); sub (Y, xor (X, Y))` for better codegen.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D91120
2020-11-24 09:43:35 +00:00
Pavel Labath bce2ac9f6d Revert "[DebugInfo] Refactor code for emitting DWARF expressions for FP constants"
The commit introduced a crash when emitting (debug info for) complex
floats (pr48277).
2020-11-24 09:11:33 +01:00
Martin Storsjö 6f792041a5 Reapply "[CodeGen] [WinException] Only produce handler data at the end of the function if needed"
This reapplies 36c64af9d7 in updated
form.

Emit the xdata for each function at .seh_endproc. This keeps the
exact same output header order for most code generated by the LLVM
CodeGen layer. (Sections still change order for code built from
assembly where functions lack an explicit .seh_handlerdata
directive, and functions with chained unwind info.)

The practical effect should be that assembly output lacks
superfluous ".seh_handlerdata; .text" pairs at the end of functions
that don't handle exceptions, which allows such functions to use
the AArch64 packed unwind format again.

Differential Revision: https://reviews.llvm.org/D87448
2020-11-23 23:17:03 +02:00
Pavel Labath 6ef7835afc [DebugInfo] Refactor code for emitting DWARF expressions for FP constants
This patch moves the selection of the style used to emit the numbers
(DW_OP_implicit_value vs. DW_OP_const+DW_OP_stack_value) into
DwarfExpression::addUnsignedConstant. This logic is not FP-specific, and
it will be needed for large integers too.

The refactor also makes DW_OP_implicit_value (DW_OP_stack_value worked
already) be used for floating point constants other than float and
double, so I've added a _Float16 test for it.

Split off from D90916.

Differential Revision: https://reviews.llvm.org/D91058
2020-11-23 09:59:07 +01:00
Kazu Hirata 85d6af393c [CodeGen] Use pred_empty (NFC) 2020-11-22 22:16:13 -08:00
Simon Pilgrim 791040cd8b [DAG] LowerMINMAX - move default expansion to generic TargetLowering::expandIntMINMAX
This is part of the discussion on D91876 about trying to reduce custom lowering of MIN/MAX ops on older SSE targets - if we can improve generic vector expansion we should be able to relax the limitations in SelectionDAGBuilder when it will let MIN/MAX ops be generated, and avoid having to flag so many ops as 'custom'.
2020-11-22 13:02:27 +00:00
Kazu Hirata 68403af007 [MBP] Remove unused declaration shouldPredBlockBeOutlined (NFC)
The function was introduced on Jun 12, 2016 in commit
071d0f1807.  Its definition was removed
on Mar 2, 2017 in commit 1393761e0c.
2020-11-21 23:35:02 -08:00
Kazu Hirata 9d985082ad [MachineLICM] Remove unused declaration HoistRegion
The function definition was removed on Dec 22, 2011 in commit
in 1eed5b51e8.
2020-11-21 22:55:37 -08:00
Kazu Hirata c2309ff3d5 [SelectionDAG] Remove unused declaration ExpandStrictFPOp (NFC)
ExpandStrictFPOp started taking two parameters instead of one on Jan
10, 2020 in commit f678fc7660, but the
declaration for the single-perameter version has remained since.
2020-11-21 22:29:44 -08:00
Ella Ma 1756d67934 [llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent protential null deref
All these potential null pointer dereferences are reported by my static analyzer for null smart pointer dereferences, which has a different implementation from `alpha.cplusplus.SmartPtr`.

The checked pointers in this patch are initialized by Target::createXXX functions. When the creator function pointer is not correctly set, a null pointer will be returned, or the creator function may originally return a null pointer.

Some of them may not make sense as they may be checked before entering the function, but I fixed them all in this patch. I submit this fix because 1) similar checks are found in some other places in the LLVM codebase for the same return value of the function; and, 2) some of the pointers are dereferenced before they are checked, which may definitely trigger a null pointer dereference if the return value is nullptr.

Reviewed By: tejohnson, MaskRay, jpienaar

Differential Revision: https://reviews.llvm.org/D91410
2020-11-21 21:04:12 -08:00
Hongtao Yu d0e42037bf [CSSPGO] MIR target-independent pseudo instruction for pseudo-probe intrinsic
This change introduces a MIR target-independent pseudo instruction corresponding to the IR intrinsic llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

An `llvm.pseudoprobe` intrinsic call will be lowered into a target-independent operation named `PSEUDO_PROBE`. Given the following instrumented IR,

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}
```
the corresponding MIR is shown below. Note that block `bb3` is duplicated into `bb1` and `bb2` where its probe is duplicated too. This allows for an accurate execution count to be collected for `bb3`, which is basically the sum of the counts of `bb1` and `bb2`.

```
bb.0.bb0:
   frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
   TEST32rr killed renamable $edi, renamable $edi, implicit-def $eflags
   PSEUDO_PROBE 837061429793323041, 1, 0
   $edi = MOV32ri 1, debug-location !13; test.c:0
   JCC_1 %bb.1, 4, implicit $eflags

bb.2.bb2:
   PSEUDO_PROBE 837061429793323041, 3, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ

bb.1.bb1:
   PSEUDO_PROBE 837061429793323041, 2, 0
   PSEUDO_PROBE 837061429793323041, 4, 0
   $rax = frame-destroy POP64r implicit-def $rsp, implicit $rsp
   RETQ
```

The target op PSEUDO_PROBE will be converted into a piece of binary data by the object emitter with no machine instructions generated. This is done in a different patch.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86495
2020-11-20 10:52:43 -08:00
Hongtao Yu f3c445697d [CSSPGO] IR intrinsic for pseudo-probe block instrumentation
This change introduces a new IR intrinsic named `llvm.pseudoprobe` for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
2. The counter atomics may not be fully cleaned up from the code stream eventually.
3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is `IntrInaccessibleMemOnly`, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}
```

The instrumented IR will look like below. Note that each `llvm.pseudoprobe` intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

```
define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

```

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D86490
2020-11-20 10:39:24 -08:00
Craig Topper a7eae62a42 [SelectionDAG][X86][PowerPC][Mips] Replace the default implementation of LowerOperationWrapper with the X86 and PowerPC version.
The default version only works if the returned node has a single
result. The X86 and PowerPC versions support multiple results
and allow a single result to be returned from a node with
multiple outputs. And allow a single result that is not result 0
of the node.

Also replace the Mips version since the new version should work
for it. The original version handled multiple results, but only
if the new node and original node had the same number of results.

Differential Revision: https://reviews.llvm.org/D91846
2020-11-20 10:06:53 -08:00
Andrew Wei 1cd19fc568 [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead
Patched by: guopeilin
Reviewed By: hliao,rampitec

Differential Revision: https://reviews.llvm.org/D91513
2020-11-21 00:43:23 +08:00
Pavel Iliin 4d7df43ffd [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
This patch implements out of line atomics for LSE deployment
mechanism. Details how it works can be found in llvm/docs/Atomics.rst
Options -moutline-atomics and -mno-outline-atomics to enable and disable it
were added to clang driver. This is clang and llvm part of out-of-line atomics
interface, library part is already supported by libgcc. Compiler-rt
support is provided in separate patch.

Differential Revision: https://reviews.llvm.org/D91157
2020-11-20 13:30:12 +00:00
Kazu Hirata 2583d8eb08 [CodeGen] Use llvm::is_contained (NFC) 2020-11-19 22:07:56 -08:00
Nikita Popov 393b9e9db3 [MemLoc] Require LocationSize argument (NFC)
When constructing a MemoryLocation by hand, require that a
LocationSize is explicitly specified. D91649 will split up
LocationSize::unknown() into two different states, and callers
should make an explicit choice regarding the kind of MemoryLocation
they want to have.
2020-11-19 21:45:52 +01:00
Leonard Chan a97f62837f [llvm][IR] Add dso_local_equivalent Constant
The `dso_local_equivalent` constant is a wrapper for functions that represents a
value which is functionally equivalent to the global passed to this. That is, if
this accepts a function, calling this constant should have the same effects as
calling the function directly. This could be a direct reference to the function,
the `@plt` modifier on X86/AArch64, a thunk, or anything that's equivalent to the
resolved function as a call target.

When lowered, the returned address must have a constant offset at link time from
some other symbol defined within the same binary. The address of this value is
also insignificant. The name is leveraged from `dso_local` where use of a function
or variable is resolved to a symbol in the same linkage unit.

In this patch:
- Addition of `dso_local_equivalent` and handling it
- Update Constant::needsRelocation() to strip constant inbound GEPs and take
  advantage of `dso_local_equivalent` for relative references

This is useful for the [Relative VTables C++ ABI](https://reviews.llvm.org/D72959)
which makes vtables readonly. This works by replacing the dynamic relocations for
function pointers in them with static relocations that represent the offset between
the vtable and virtual functions. If a function is externally defined,
`dso_local_equivalent` can be used as a generic wrapper for the function to still
allow for this static offset calculation to be done.

See [RFC](http://lists.llvm.org/pipermail/llvm-dev/2020-August/144469.html) for more details.

Differential Revision: https://reviews.llvm.org/D77248
2020-11-19 10:26:17 -08:00