For transformations between i32 and i64, if it is explicit signed extension:
- first cast the operand to i64
- then use SLL + SRA to finish the extension.
if it is explicit zero extension:
- first cast the operand to i64
- then use SLL + SRL to finish the extension.
if it is explicit any extension:
- just refer to 64-bit register.
if it is explicit truncation:
- just refer to 32-bit subregister.
NOTE: Some of the zero extension sequences might be unnecessary, they will be
removed by an peephole pass on MachineInstruction layer.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325981
These 32-bit ALU insn patterns which takes immediate as one operand were
initially added to enable AsmParser support, and the AsmMatcher uses "ins"
and "outs" fields to deduct the operand constraint.
However, the instruction selector doesn't work the same as AsmMatcher. The
selector will use the "pattern" field for which we are not setting the
predication for immediate operands correctly.
Without this patch, i32 would eventually means all i32 operands are valid,
both imm and gpr, while these patterns should allow imm only.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325980
markSuperRegs is the canonical helper function used to mark reserved
registers. It could mark any overlapping sub-registers automatically.
Reviewed-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
llvm-svn: 325979
The instruction sequence used to get the address of the PC into a GPR requires
that we clobber the link register. Doing so without having first saved it in
the prologue leaves the function unable to return. Currently, this sequence is
emitted into the entry block. To ensure the prologue is inserted before this
sequence, disable shrink-wrapping.
This fixes PR33547.
Differential Revision: https://reviews.llvm.org/D43677
llvm-svn: 325972
In DWARF v5 the Line Number Program Header is extensible, allowing values with
new content types. In this extension a content type is added,
DW_LNCT_LLVM_source, which contains the embedded source code of the file.
Add new optional attribute for !DIFile IR metadata called source which contains
source text. Use this to output the source to the DWARF line table of code
objects. Analogously extend METADATA_FILE in Bitcode and .file directive in ASM
to support optional source.
Teach llvm-dwarfdump and llvm-objdump about the new values. Update the output
format of llvm-dwarfdump to make room for the new attribute on file_names
entries, and support embedded sources for the -source option in llvm-objdump.
Differential Revision: https://reviews.llvm.org/D42765
llvm-svn: 325970
This has the advantage of making release only builds more warning
free and there's no need to make this routine a class function if
it isn't using class members anyhow.
llvm-svn: 325967
This is the first in a series of patches that will define more
instructions using InstRW so that we can move away from ItinRW
and ultimately have a complete Power 9 scheduler.
Differential Revision: https://reviews.llvm.org/D43635
llvm-svn: 325956
These can be created by type legalization promoting the inputs to select to match scalar boolean contents.
We were trying to pattern match them away during isel, but its better to just remove them from the DAG.
I've cleaned up some patterns to not check for this 'and' anymore. But I suspect this has also opened up opportunities for pattern removal.
llvm-svn: 325949
https://github.com/golang/go/issues/23672
By this change, building a Go code with LLVM Go bindings causes a compilation error as follows.
go build llvm.org/llvm/bindings/go/llvm: invalid flag in #cgo LDFLAGS: -Wl,-headerpad_max_install_names
llvm-go tool generates cgo LDFLAGS directive from `llvm-config --ldflags` and it contains -Wl,option options. But -Wl,option is banned by default. To avoid this problem, we need to set $CGO_LDFLAGS_ALLOW environment variable to notify a compiler that the flags should be allowed.
$ export CGO_LDFLAGS_ALLOW='-Wl,(-search_paths_first|-headerpad_max_install_names)'
By default for go 1.10 and go 1.9.5 these options should appear in the accepted set of options, however, if you're running into the error it's useful to have this documented.
Patch by Ryuichi Hayashida
llvm-svn: 325946
This feature enables the fusion of the comparison and the conditional select
instructions together.
Differential revision: https://reviews.llvm.org/D42392
llvm-svn: 325939
The test changes you can see are related to the changes in ReplaceNodeResults. Though shuffle-vs-trunc-512.ll does have a test that exercises the code in LowerBITCAST. Looks like the test output didn't change because DAG combining is able to clean up the resulting type legalization. Adding the custom hook just makes type legalization work less hard.
Differential Revision: https://reviews.llvm.org/D43447
llvm-svn: 325933
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers. This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.
Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).
Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.
Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.
Clear the IsRenamable bit when changing an operand's register value.
Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.
Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.
Reviewers: qcolombet, MatzeB
Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D43042
llvm-svn: 325931
Summary:
This patch is an enhancement to propagate dbg.value information when Phis are created on behalf of LCSSA.
I noticed a case where a value carried across a loop was reported as <optimized out>.
Specifically this case:
```
int bar(int x, int y) {
return x + y;
}
int foo(int size) {
int val = 0;
for (int i = 0; i < size; ++i) {
val = bar(val, i); // Both val and i are correct
}
return val; // <optimized out>
}
```
In the above case, after all of the interesting computation completes our value
is reported as "optimized out." This change will add a dbg.value to correct this.
This patch also moves the dbg.value insertion routine from LoopRotation.cpp
into Local.cpp, so that we can share it in both places (LoopRotation and LCSSA).
Reviewers: mzolotukhin, aprantl, vsk, davide
Reviewed By: aprantl, vsk
Subscribers: dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D42551
llvm-svn: 325926
If we have a loop like this:
int n = 0;
while (...) {
if (++n >= MAX) {
n = 0;
}
}
then the body of the 'if' statement will only be executed once every MAX
iterations. Detect this by looking for branches in loops where taking the branch
makes the branch condition evaluate to 'not taken' in the next iteration of the
loop, and reduce the probability of such branches.
This slightly improves EEMBC benchmarks on cortex-m4/cortex-m33 due to making
better choices in if-conversion, but has no effect on any other cpu/benchmark
that I could detect.
Differential Revision: https://reviews.llvm.org/D35804
llvm-svn: 325925
The existing code was inefficiently looking for 'nsz' variants.
That's unnecessary because we canonicalize those to the expected
form with -0.0.
We may also want to adjust or remove the fold that sinks negation.
We don't do that for fdiv (or integer ops?). That should be uniform?
It may also lead to missed optimization as in PR21914:
https://bugs.llvm.org/show_bug.cgi?id=21914
...or we just have to fix other passes to avoid that problem.
llvm-svn: 325924
The following set of instructions was originally planned to be added for Power 9
and so code was added to support them. However, a decision was made later on to
withdraw support for these instructions in the hardware.
xscmpnedp
xvcmpnesp
xvcmpnedp
This patch removes support for the instructions that were not added.
Differential Revision: https://reviews.llvm.org/D43641
llvm-svn: 325918
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.
Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.
Reviewers: spatel, hfinkel, niravd, craig.topper
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D41235
llvm-svn: 325892
This reverts r325884.
Clang's TableGen has dependencies on the exact ordering of superclasses.
Revert this change fully for now to fix the build.
Change-Id: Ib297f5571cc7809f00838702ad7ab53d47335b26
llvm-svn: 325891
Add GlobalISel infrastructure up to the point where we can select a ret
void.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D43583
llvm-svn: 325888
A subsequent change intends to remove resolveListElementReference
entirely. This part of the removal can be split out for better
bisectability.
Change-Id: Ibd762d88fd2d1e2cc116a259e2a27a5e9f9a8b10
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43561
Change-Id: Ifb695041cef1964ad8a3102f448249501a9243f0
llvm-svn: 325886
Summary:
Only check whether the left-hand side type is a subclass (or equal to)
the right-hand side type.
This requires a further fix in handling !if expressions and in type
resolution.
Furthermore, reverse the order of superclasses so that resolveTypes will
find a least common ancestor at least in simple cases.
Add a test that used to be accepted without flagging the obvious type
error.
Change-Id: Ib366db1a4e6a079f1a0851e469b402cddae76714
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43559
llvm-svn: 325884
Summary:
Returns the size of a list. I have found this to be rather useful in some
development for the AMDGPU backend where we could simplify our .td files
by concatenating list<LLVMType> for complex intrinsics. Doing so requires
us to compute the position argument for LLVMMatchType.
Basically, the usage is in a pattern that looks somewhat like this:
list<LLVMType> argtypes =
!listconcat(base,
[llvm_any_ty, LLVMMatchType<!size(base)>]);
Change-Id: I360a0b000fd488d18bea412228230fd93722bd2c
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits, tpr
Differential Revision: https://reviews.llvm.org/D43553
llvm-svn: 325883
Summary:
This handles def-after-use of physregs, and allows us to merge loads and
stores even across some physreg defs (typically M0 defs).
Change-Id: I076484b2bda27c2cf46013c845a0380c5b89b67b
Reviewers: arsenm, mareko, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D42647
llvm-svn: 325882
Summary:
This fixes cases like the new test @nonuniform. In that test, %cc itself
is a uniform value; however, when reading it after the end of the loop in
basic block %if, its value is effectively non-uniform.
This problem was encountered in
https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change
in itself is not sufficient to fix that bug, as there is another issue
in the AMDGPU backend.
Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
Reviewers: arsenm, rampitec, jlebar
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D40546
llvm-svn: 325881
Summary:
MemDep caches results that signify that a dependence is non-local, and
there is currently no way to invalidate such cache entries.
Unfortunately, when MLSM sinks a store that can result in a non-local
dependence becoming a local one, and then MemDep gives wrong answers.
The easiest way out here is to just say that MLSM does indeed not
preserve MemDep results.
Reviewers: davide, Gerolf
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43177
llvm-svn: 325880
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Simon Dardis
llvm-svn: 325870
This is combination of two patches by Nicholas Wilson:
1. https://reviews.llvm.org/D41954
2. https://reviews.llvm.org/D42495
Along with a few local modifications:
- One change I made was to add the UNDEFINED bit to the binary format
to avoid the extra byte used when writing data symbols. Although this
bit is redundant for other symbols types (i.e. undefined can be
implied if a function or global is a wasm import)
- I prefer to be explicit and consistent and not have derived flags.
- Some field renaming.
- Some reverting of unrelated minor changes.
- No test output differences.
Differential Revision: https://reviews.llvm.org/D43147
llvm-svn: 325860
The base case for any_of was incorrectly returning true. Also add test
case which uses m_any_of(preds...) where none of the predicates are
true.
llvm-svn: 325848
We won't be able to fold the constant pool load, but its still better than materialing ones and xoring for the invert if we used PCMPEQ.
This will fix another regression from D42948.
llvm-svn: 325845
Move checks for each fusion case into separate functions for better
legibility and maintainability.
Differential revision: https://reviews.llvm.org/D43649
llvm-svn: 325844
Summary:
The built-in PDB types enum has been extended to include char16_t and char32_t.
llvm-pdbutil was hitting an llvm_unreachable because it didn't know about these
new values. The new values are not yet in the DIA documentation, but are
listed in the cvconst.h header that comes as part of the DIA SDK.
Reviewers: asmith, zturner, rnk
Subscribers: stella.stamenova, llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D43646
llvm-svn: 325838
The more popular opcodes were added at r325730, but we
should have everything here for symmetry. I think both
of these can be used in InstCombine already, but I'll
make those changes as separate clean-ups for InstCombine.
llvm-svn: 325832
Summary:
As pointed out in the review for D37993, for consistency with other
linkers, gold plugin should perform cache pruning whenever there is a
cache directory specified, which will use the default cache policy.
Reviewers: pcc
Subscribers: llvm-commits, inglorion
Differential Revision: https://reviews.llvm.org/D43389
llvm-svn: 325830
isCondCodeLegal internally checked Legal or Custom which is misleading. Though no targets set any cond code action to Custom today.
So I've renamed isCondCodeLegal to isCondCodeLegalOrCustom and added a real isCondCodeLegal that only checks Legal.
I've changed legalization code to use isCondCodeLegalOrCustom and left things reachable via DAG combine as isCondCodeLegal. I've also changed some places that called getCondCodeAction and compared to Legal to just use isCondCodeLegal.
I'm looking at trying to keep SETCC all the way to isel for the AVX512 integer comparisons and I suspect I'll need to make some condition codes Custom to stop DAG combine from changing things post LegalizeOps. Prior to this only Expand stopped DAG combine, but that causes LegalizeOps to try to swap operands or invert rather than calling our Custom handler.
Differential Revision: https://reviews.llvm.org/D43607
llvm-svn: 325829
Previously this code overrode the flags and opcode used by the later code in LowerVSETCC. This makes the code difficult to read and follow.
This patch moves all the SUBUS code into its own function and makes it responsible for creating its own SDNodes on success.
Differential Revision: https://reviews.llvm.org/D43530
llvm-svn: 325827
This patch reverts r325440 and r325438 because it triggers an
assertion in SelectionDAGBuilder.cpp. Also having debug enabled
may unintentionally affect code-gen. The patch is reverted until
we find a better solution.
llvm-svn: 325825
Summary:
The current integer representation of relative block frequency prevents
representing relative block frequencies below 1. This change uses a 8 of
the 29 bits to represent the decimal part by using a fixed scale of -8.
Reviewers: tejohnson, davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43520
llvm-svn: 325823
When DataFlowSanitizer transforms a call to a custom function, the
new call has extra parameters. The attributes on parameters must be
updated to take the new position of each parameter into account.
Patch by Sam Kerner!
Differential Revision: https://reviews.llvm.org/D43132
llvm-svn: 325820
Summary:
ThinLTO indexing may decide to skip all objects. If we don't write something to
the list build system may consider this as failure or linker can reuse a file
from the previews build.
Reviewers: pcc, tejohnson
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D43415
llvm-svn: 325819
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
AlignmentFromAssumptions pass to cease using the old getAlignment()/setAlignment API of
MemoryIntrinsic in favour of getting/setting source & dest specific alignments through
the new API. This allows us to simplify some of the code in this pass and also be more
aggressive about setting the source and destination alignments separately.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395, rL324402, rL324626, rL324642, rL324653, rL324654, rL324773, rL324774,
rL324781, rL324784, rL324955, rL324960 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.htmlhttp://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
Reviewers: hfinkel, bollu, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D43081
llvm-svn: 325816
This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.).
Differential Revision: https://reviews.llvm.org/D43466
llvm-svn: 325815
Extension to D12776, handle modulo by zero in the same way we handle divide by zero.
Differential Revision: https://reviews.llvm.org/D43631
llvm-svn: 325810
Also, add a helper for the constant folder to reduce duplication.
It seems out-of-place for and/or to be doing simplifications here?
Otherwise, I could have used the helper on those opcodes too.
llvm-svn: 325808
Summary:
If there is no debug info for macros, do not emit labels for empty
macinfo sections.
Reviewers: probinson, echristo
Subscribers: aprantl, llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D43589
llvm-svn: 325803
Summary:
Both of these errors should have been caught by type-checking during
parsing.
Change-Id: I891087936fd1a91d21bcda57c256e3edbe12b94d
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43558
llvm-svn: 325800
Summary:
Perhaps the distinction between the two should be removed entirely
in the long term, and the [{ ... }] syntax should just be a convenient
way of writing multi-line strings.
In the meantime, a lot of existing .td files are quite relaxed about
string vs. code, and this change allows switching on more consistent
type checks without breaking those.
Change-Id: If85e3e04469e41b58e2703b62ac0032d2711713c
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43557
llvm-svn: 325799
Summary:
There are no new test cases, but a subsequent patch will introduce
assertions that would be triggered by existing test cases without this
fix.
Change-Id: I6a82d4b311b012aff3932978ae86f6a2dcfbf725
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43556
llvm-svn: 325798
Summary:
In the case of !foreach(id, input-list, transform) where the type of
input-list is list<A> and the type of transform is B, we now correctly
deduce list<B> as the type of the !foreach.
Change-Id: Ia19dd65eecc5991dd648280ba6a15f6a20fd61de
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43555
llvm-svn: 325797
Summary:
This way, it should work even with complex operands.
Change-Id: Iaccf5bbb50bd5882a0ba5d59689e4381315fb361
Reviewers: arsenm, craig.topper, tra, MartinO
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D43554
llvm-svn: 325796
Summary:
.NAME is a bit of an odd duck, in that we should really treat it like
a template argument, but we currently don't, and so when and where
NAME is initialized and how is pretty inconsistent. Best to just avoid
using it as a field of already instantiated records, and use cast to
string instead.
Change-Id: I5a0c202401cede3d5c3827ab9c7858ea48b29108
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D43551
llvm-svn: 325794
Implement c.lui immediate constraint to [1, 31] and [0xfffe0, 0xfffff].
The RISC-V ISA describes the constraint as [1, 63], with that value
being loaded in to bits 17-12 of the destination register and sign extended
from bit 17. Therefore, this 6-bit immediate can represent values in the
ranges [1, 31] and [0xfffe0, 0xfffff].
Differential Revision: https://reviews.llvm.org/D42834
llvm-svn: 325792
- Fix for bug 36078.
- Prevent the functionattrs, function-attrs, globalopt and argpromotion passes
from changing naked functions.
- These passes can perform some alterations to the functions that should not be
applied. An example is removing parameters that are seemingly not used because
they are only referenced in the inline assembly. Another example is marking
the function as fastcc.
llvm-svn: 325788
There were no memory dependencies made between stores generated
when lowering formal arguments and loads generated when
call lowering byVal arguments which made the Post-RA scheduler
place a load before a matching store.
Make the fixed object stored to mutable so that the load
instructions can have their memory dependencies added
Set the frame object as isAliased which clears the underlying
objects vector in ScheduleDAGInstrs::buildSchedGraph().
This results in addition of all stores as dependenies for loads.
This problem appeared when passing a byVal parameter
coupled with a fastcc function call.
Differential Revision: https://reviews.llvm.org/D37515
llvm-svn: 325782
NFC intended, syndicate common code to a parametric base class. Part of the original problem is that InvokeInst is a TerminatorInst, unlike CallInst. the problem is solved by introducing a parametrized class paramtertized by its base.
Differential Revision: https://reviews.llvm.org/D40727
llvm-svn: 325778
As pointed out by @sabuasal in a comment on D23568, the logic in
RISCVMCCodeEmitter::getImmOpValue could be more defensive. Although with the
current instruction definitions it is always the case that `VK_RISCV_LO` is
always used with either an I- or S-format instruction, this may not always be
the case in the future. Add a check to ensure we will get an assertion in
debug builds if that changes.
llvm-svn: 325775
Fixup to rL325573 for large xor constants.
Thanks to Eli Friedman for the catch.
Differential revision: https://reviews.llvm.org/D43549
llvm-svn: 325761
Calling realpath is expensive but necessary to perform the uniqueing in
dsymutil. Although we already cached the results for every individual
file in the line table, we had reports of it taking 40 seconds of a 3.5
minute link.
This patch adds a second level of caching. When we do have to call
realpath, we cache its result for its parents path. We didn't replace
the existing caching, because it's fast (indexed) and saves us from
reading the line table for entries we've already seen.
For WebkitCore this results in a decrease of 11% in linking time: from
85.79 to 76.11 seconds (average over 3 runs).
Differential revision: https://reviews.llvm.org/D43511
llvm-svn: 325757
This is a follow up of r325012, that allowed half types in constant pools.
Proper alignment was enforced when a big basic block was split up, but not when
a CPE was placed before/after a block; the successor block had the wrong
alignment.
Differential Revision: https://reviews.llvm.org/D43580
llvm-svn: 325754
We looked through a BITCAST, but the bitcast might be a from a scalar type rather than a vector.
I don't have a test case. I stumbled onto it while prototyping another change that isn't ready yet.
llvm-svn: 325750
Summary:
Exposing getOffset and findFunctionSamples as members of
SampleProfile. They are intimately tied to design choices of the
sample profile format - using offsets instead of line numbers, and
traversing inlined functions stack, respectively.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43605
llvm-svn: 325747
SCEV has multiple occurences of code when we need to prove some predicate on
every iteration of a loop and do it with invocations of couple `isLoopEntryGuardedByCond`,
`isLoopBackedgeGuardedByCond`. This patch factors out these two calls into a separate
method. It is a preparation step to extend this logic: it is not the only way how we can prove
such conditions.
Differential Revision: https://reviews.llvm.org/D43373
llvm-svn: 325745
An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.
Fixes PR36292.
llvm-svn: 325739
According to the current coverage report salvageDebugInfo() is called
5.12 million times during testing and almost always returns early.
The early return depends on LocalAsMetadata::getIfExists returning null,
which involves a DenseMap lookup in an LLVMContextImpl. We can probably
speed this up by simply checking the IsUsedByMD bit in Value.
llvm-svn: 325738
The pahole does not work with BPF backend properly:
-bash-4.2$ cat test.c
struct test_t {
int a;
int b;
};
int test(struct test_t *s) {
return s->a;
}
-bash-4.2$ clang -g -O2 -target bpf -c test.c
-bash-4.2$ pahole test.o
struct clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) {
clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /* 0 4 */
clang version 7.0.0 (trunk 325446) (llvm/trunk 325464) clang version 7.0.0 (trunk 325446) (llvm/trunk 325464); /* 4 4 */
/* size: 8, cachelines: 1, members: 2 */
/* last cacheline: 8 bytes */
};
-bash-4.2$
The reason is that BPF backend is not yet implemented in elfutils backend
https://github.com/threatstack/elfutils/tree/master/backends
and pahole depends on elfutils for dwarf parsing and resolving relocation.
More specifically, the unsupported relocation in .debug_info for type/member name
against symbol table caused the incorrect result above. The following is
the raw .rel.debug_info for the above example,
Hex dump of section '.rel.debug_info':
0x00000000 06000000 00000000 0a000000 0b000000 ................
0x00000010 0c000000 00000000 0a000000 01000000 ................
0x00000020 12000000 00000000 0a000000 02000000 ................
0x00000030 16000000 00000000 0a000000 0e000000 ................
0x00000040 1a000000 00000000 0a000000 03000000 ................
----------------- -------- --------
reloc location type symtab index
Hex dump of section '.debug_info':
0x00000000 7b000000 04000000 00000801 00000000 {...............
0x00000010 0c000000 00000000 00000000 00000000 ................
0x00000020 00000000 00001000 00000200 00000000 ................
Based on "type", the proper value will be extracted from symbol table
and filled in .debug_info so later on .debug_info can be properly
resolved against debug strings.
There are two ways to fix this problem. One is to fix elfutils by adding
BPF support which is desirable. This could take a long time and won't work
with already deployed pahole. For a short term workaround, we can disable
dwarf cross-section relation which specifically avoids debug_info and
symbol table cross relocation. This should help any dwarf-related tool
which has not implement BPF specific relocations yet.
Now .rel.debug_info does not have any relocation for symbol table and
.debug_info itself contains necessary relocation information by itself.
Hex dump of section '.debug_info':
0x00000000 7b000000 04000000 00000801 00000000 {...............
0x00000010 0c003700 00000000 00003e00 00000000 ..7.......>.....
0x00000020 00000000 00001000 00000200 00000000 ................
location 0xc has 0, 0x12 has 0x37, 0x1a has 0x3e in place which
will be used in relocation resolution. Here, the values of 0, 0x37 and 0x3e
are offset in .debug_str section.
Please note the difference between two above .debug_info dumps.
With the fix, pahole works properly with BPF backend:
-bash-4.2$ clang -O2 -g -target bpf -c test.c
-bash-4.2$ pahole test.o
struct test_t {
int a; /* 0 4 */
int b; /* 4 4 */
/* size: 8, cachelines: 1, members: 2 */
/* last cacheline: 8 bytes */
};
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 325735
The issue was that the has function was generating different results depending
on the signedness of char on the host platform. This commit fixes the issue by
explicitly using an unsigned char type to prevent sign extension and
adds some extra tests.
The original commit message was:
This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").
To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.
Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.
Reviewers: JDevlieghere, aprantl, probinson, dblaikie
Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits
Differential Revision: https://reviews.llvm.org/D42740
llvm-svn: 325732
than a shared ObjectFile/MemoryBuffer pair.
There's no need to pre-parse the buffer into an ObjectFile before passing it
down to the linking layer, and moving the parsing into the linking layer allows
us remove the parsing code at each call site.
llvm-svn: 325725
This is a slight reduction of one of the benchmarks
that suffered with D43079. Cost model changes should
not cause this test to remain scalarized.
llvm-svn: 325717
This patch changes hwasan inline instrumentation:
Fixes address untagging for shadow address calculation (use 0xFF instead of 0x00 for the top byte).
Emits brk instruction instead of hlt for the kernel and user space.
Use 0x900 instead of 0x100 for brk immediate (0x100 - 0x800 are unavailable in the kernel).
Fixes and adds appropriate tests.
Patch by Andrey Konovalov.
Differential Revision: https://reviews.llvm.org/D43135
llvm-svn: 325711
Add a test that checks that kernel inline instrumentation works.
Patch by Andrey Konovalov!
Differential Revision: https://reviews.llvm.org/D42473
llvm-svn: 325710
Summary:
IMAGE_REL_AMD64_ADDR32NB relocations are currently set to zero in all cases.
This patch sets the relocation to the correct value when possible and shows an error when not.
Reviewers: enderby, lhames, compnerd
Reviewed By: compnerd
Subscribers: LepelTsmok, compnerd, martell, llvm-commits
Differential Revision: https://reviews.llvm.org/D30709
llvm-svn: 325700
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Krzysztof Parzyszek
llvm-svn: 325697
Global Dynamic and Local Dynamic call relocations only implicitly
reference __tls_get_addr; there is no connection in the ELF file between
the relocations and the symbol other than the specification for the
relocations' semantics. However, it still needs to be in the symbol
table despite the lack of explicit references to the symbol table entry,
since it needs to be bound at link time for these relocations, otherwise
any objects will fail to link.
For details, see https://sourceware.org/bugzilla/show_bug.cgi?id=22832.
Path by: James Clarke (jrtc27)
Differential revision: https://reviews.llvm.org/D43271
llvm-svn: 325688
of turning SCEVUnknowns of PHIs into AddRecExprs.
This feature is now hidden behind the -scev-version-unknown flag.
Fixes PR36032 and PR35432.
llvm-svn: 325687
This results in 15 additional unique source variables in a stage2 build
of FileCheck (at '-Os -g'), with a negligible increase in the size of
the .debug_loc section.
llvm-svn: 325660
There are too many perf regressions resulting from this, so we need to
investigate (and add tests for) targets like ARM and AArch64 before
trying to reinstate.
llvm-svn: 325658
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.
Instead, enumerate all SKX features with the exception of CLWB.
Patch by Gabor Buella
Differential Revision: https://reviews.llvm.org/D43380
llvm-svn: 325654
This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLVM part of
-mindirect-jump=hazard. It is _not_ enabled by default for the P5600.
The migitation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.
These instructions impede the execution of instruction stream until
architecturally defined hazards (changes to the instruction stream,
privileged registers which may affect execution) are cleared. These
instructions in MIPS' designs are not speculated past.
These instructions are used with the attribute +use-indirect-jump-hazard
when branching indirectly and for indirect function calls.
These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.
Performance benchmarking of this option with -fpic and lld using
-z hazardplt shows a difference of overall 10%~ time increase
for the LLVM testsuite. Certain benchmarks such as methcall show a
substantially larger increase in time due to their nature.
Reviewers: atanasyan, zoran.jovanovic
Differential Revision: https://reviews.llvm.org/D43486
llvm-svn: 325653
Summary:
We used to remove the first memmove in cases like this:
memmove(p, p+2, 8);
memmove(p, p+2, 8);
which is incorrect. Fix this by changing isPossibleSelfRead to what was most
likely the intended behavior.
Historical note: the buggy code was added in https://reviews.llvm.org/rL120974
to address PR8728.
Reviewers: rsmith
Subscribers: mcrosier, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D43425
llvm-svn: 325641
Spilling may cause previously non-empty intervals (both for the spilled vreg
and others) to become empty. Moving the pruning into initializeGraph catches
these cases and fixes PR33038.
llvm-svn: 325632
This is usually not a problem because this code's main purpose is
eliminating unused new/delete pairs. We got deletes of nullptr or
nobuiltin deletes of builtin new wrong though.
llvm-svn: 325630
This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together.
This supports things like
(setcc uge X, 0) -> true
Differential Revision: https://reviews.llvm.org/D43489
llvm-svn: 325627
Get rid of icky goto loops and make the code easier to maintain. Otherwise,
NFC.
Restore r324903 and fix PR36369.
Differentail revision: https://reviews.llvm.org/D43364
llvm-svn: 325621
Summary:
With D43396, no clients use the Path parameter anymore.
Depends on D43396.
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43400
llvm-svn: 325619
Summary:
This will avoid the race condition described in the review for D37993.
I believe that the Path parameter to AddBufferFn is no longer utilized.
I would prefer to remove that as a follow up clean up patch to reduce
the diffs in this patch.
Reviewers: pcc
Reviewed By: pcc
Subscribers: inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43396
llvm-svn: 325618
The bug was introduced here:
https://reviews.llvm.org/rL296409
...but the patch doesn't use maxnum and nothing else in
trunk has tried since then, so the bug went unnoticed.
llvm-svn: 325607
SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set.
So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses the demanded mask being used was all ones so there was no reason to create a shrunkblend.
This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplifcation. Then we convert the selects to shrunkblend, and finally replace condition.
Differential Revision: https://reviews.llvm.org/D43446
llvm-svn: 325604
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.
This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.
Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplfiySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.
Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.
Fixes PR36250.
Differential Revision: https://reviews.llvm.org/D43449
llvm-svn: 325602
This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions.
This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits.
Differential Revision: https://reviews.llvm.org/D43327
llvm-svn: 325601
This patch contains logic for handling DW_TAG_label that's present in
darwin's dsymutil implementation, but not yet upstream.
Differential revision: https://reviews.llvm.org/D43438
llvm-svn: 325600
Summary:
Currently vim syntax highlighting recognizes 'CHECK:' as a special
comment, but not CHECK-DAG, CHECK-NOT and other CHECKs. This patch
adds rules for these comments.
Reviewers: chandlerc, compnerd, rogfer01
Reviewed By: rogfer01
Subscribers: rogfer01, llvm-commits
Differential Revision: https://reviews.llvm.org/D43289
llvm-svn: 325599
These are fdiv-with-constant-divisor, so they already become
reciprocal multiplies. The last gap for vector ops should be
closed with rL325590.
It's possible that we're missing folds for some edge cases
with denormal intermediate constants after deleting these,
but there are no tests for those patterns, and it would be
better to handle denormals more consistently (and less
conservatively) as noted in TODO comments.
llvm-svn: 325595
An upcoming patch D41434, changes the ordering of the matcher table
for assembly. This patch corrects the definition of the normal MIPS
cvt.d.w not to be available in microMIPS.
llvm-svn: 325589
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.
Differential Revision: https://reviews.llvm.org/D42602
llvm-svn: 325581
ExpandUINT_TO_FLOAT can accept vXi32 or vXi64 inputs, so we need to use a uint64_t shift to generate the 2^(BW/2) constant.
No test case unfortunately as no upstream target uses this, but its affecting a downstream target.
llvm-svn: 325578