The function's new implementation from r340583 had a bug in it that
could cause an invalid scope to be generated when merging two
DILocations with no common ancestor scope.
This patch detects this situation and picks the scope of the first
location. This is not perfect, because that scope can be misleading, but
on the other hand the merged location will be a line 0 location anyway.
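For context, a minimal sketch of the merge entry point involved (the
variable names are hypothetical; this is the C++ API the fix sits behind):
  // Merging the locations of two calls being commoned; with this fix a
  // merge across unrelated scopes keeps LocA's scope on a line-0 location.
  // 'LocA', 'LocB', and 'NewCall' are hypothetical.
  const DILocation *Merged = DILocation::getMergedLocation(LocA, LocB);
  NewCall->setDebugLoc(Merged);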
rdar://problem/43687474
Differential Revision: https://reviews.llvm.org/D51238
llvm-svn: 340672
demangling process when it does.
Use this to support a "lookup" query for the mangling canonicalizer that
does not create new nodes. This could also be used to implement
demangling with a fixed-size temporary storage buffer.
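A rough usage sketch of the query shape this enables (all identifiers
below are illustrative, not the patch's actual API):
  // Hypothetical canonicalizer: 'addEquivalence' registers equivalent name
  // fragments, 'canonicalize' may create nodes, 'lookup' never does.
  MangledNameCanonicalizer C;
  C.addEquivalence("_ZN3foo1fEv", "_ZN3bar1fEv");
  auto A = C.canonicalize("_ZN3foo1fEv"); // may allocate new nodes
  auto B = C.lookup("_ZN3bar1fEv");       // read-only query, no new nodes
  bool Equivalent = A == B;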
Reviewers: erik.pilkington
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51003
llvm-svn: 340670
This node doesn't directly correspond to a mangled name fragment, so
it's useful to explicitly describe when it's created and what it's for.
llvm-svn: 340664
Summary:
Given a set of equivalent name fragments, this mechanism determines whether two
mangled names are equivalent. The intent is to use this for fuzzy matching of
profile data against the program after certain refactorings are performed.
Reviewers: erik.pilkington, dlj
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D50935
llvm-svn: 340663
Choosing to revert the change and redo it, hopefully preserving the
history of the changes by using svn copy instead of simply creating a new
file from the contents within Scheduler.
llvm-svn: 340661
Back in https://reviews.llvm.org/D19559, I tried to teach CVP about range facts implied by value/value icmps (i.e., no constants). In the meantime, we've implemented the optimization, but I couldn't find tests checked in, so I'm adding them.
llvm-svn: 340660
Otherwise, the debug info is incorrect. On its own, this is mostly
harmless, but the safe-stack also later inlines the call to
__safestack_pointer_address, which leads to debug info with the wrong
scope, which eventually causes an assertion failure (and incorrect debug
info in release mode).
Differential Revision: https://reviews.llvm.org/D51075
llvm-svn: 340651
This (partially) fixes a regression introduced by
https://reviews.llvm.org/D43945 / r327399, which parallelized
DwarfLinker. This patch avoids parsing and allocating the memory for
all input DIEs up front and instead only allocates them inside the
concurrent loop, in the analysis lambda. At the end of the loop, the
memory owned by the LinkContext is cleared again.
This reduces the peak memory needed to link the debug info of a
non-modular build of the Swift compiler by >3GB.
rdar://problem/43444464
Differential Revision: https://reviews.llvm.org/D51078
llvm-svn: 340650
Firstly, require the symbol to be used within the module. If a
symbol is unused within a module, then by definition it cannot be
address-significant within that module. This condition is useful on all
platforms because it could make symbol tables smaller -- without this
change, emitting an address-significance table could cause otherwise
unused undefined symbols to be added to the object file.
But this change is necessary with COFF specifically in order to
preserve the property that an unreferenced undefined symbol in an IR
module does not result in a link failure. This is already the case for
ELF because ELF linkers only reject links with unresolved symbols if
there is a relocation to that symbol, but COFF linkers require all
undefined symbols to be resolved regardless of relocations. So if
a module contains an unreferenced undefined symbol, we need to make
sure not to add it to the address-significance table (and thus the
symbol table) in case it doesn't end up resolved at link time.
Secondly, do not add dllimport symbols to the table. These symbols
won't be able to be resolved because their definitions live in another
module and are accessed via the IAT, and the address-significance
table has no effect on other modules anyway. It wouldn't make sense
to add the IAT entry symbol to the address-significance table either
because the IAT entry isn't address-significant -- the generated code
never takes its address.
Differential Revision: https://reviews.llvm.org/D51199
llvm-svn: 340648
My previous test case had skipped CUs from one TU out of a two-TU LTO
scenario, which meant the CU index wasn't needed (as it was unambiguous
which CU a table entry applied to) - expanding the test to use 3 TUs,
skipping one (so long as it's not the last one) shows the indexes are
miscomputed. Fix that with a little indirection for the index.
llvm-svn: 340646
This patch uses the xscpsgndp instruction to copy floating-point scalar
registers instead of the xxlor (specifically XXLORf) instruction that is
currently used. The use of xscpsgndp applies to P9, while pre-P9 will
still use xxlor.
Patch by amyk
Differential Revision: https://reviews.llvm.org/D50004
llvm-svn: 340643
Previously we allowed the store to be Custom. But without knowing for sure that the Custom handling won't split the store, we shouldn't convert a volatile store. We also probably shouldn't be creating a store that requires custom handling after LegalizeOps. This could lead to an infinite loop if the custom handling were to insert a bitcast, though I guess isStoreBitCastBeneficial could be used to block such a loop.
The test changes here are due to the volatile part of this. The stores in the test are all volatile and i32 stores are marked custom, so we are no longer converting them.
This is related to D50491, where I was trying to allow some bitcasting of volatile loads.
Differential Revision: https://reviews.llvm.org/D50578
llvm-svn: 340626
Summary:
Sometimes, when reading an output *.ll file, it is not easy to understand why some callsites are not inlined. We can read the output of inline remarks (option --pass-remarks-missed=inline) and try to correlate its messages with the callsites.
An easier way, proposed by this patch, is to add to every callsite processed by the Inliner an attribute with the latest message that describes the cause of not inlining this callsite. The attribute is called //inline-remark//. By default this feature is off. It can be switched on by the option //-inline-remark-attribute//.
For example, in the provided test the resulting method //@test1// has two callsites to //@bar//, and the inline remarks report different missed-inlining reasons:
  remark: <unknown>:0:0: bar not inlined into test1 because too costly to inline (cost=-5, threshold=-6)
  remark: <unknown>:0:0: bar not inlined into test1 because it should never be inlined (cost=never): recursive
It is not clear which remark corresponds to which callsite. With the inline remark attribute enabled we get the reasons attached to their callsites:
  define void @test1() {
    call void @bar(i1 true) #0
    call void @bar(i1 false) #2
    ret void
  }
  attributes #0 = { "inline-remark"="(cost=-5, threshold=-6)" }
  ...
  attributes #2 = { "inline-remark"="(cost=never): recursive" }
Patch by: yrouban (Yevgeny Rouban)
Reviewers: xbolva00, tejohnson, apilipenko
Reviewed By: xbolva00, tejohnson
Subscribers: eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D50435
llvm-svn: 340618
Once the invariant_start is reached, we know that no instruction *after* it can modify the memory. So, if we can prove the location isn't read *between entry into the loop and the execution of the invariant_start*, we can execute the invariant_start before entering the loop.
Differential Revision: https://reviews.llvm.org/D51181
llvm-svn: 340617
Because of the way PhiValues is integrated with BasicAA, it is possible
for a pass that uses BasicAA to pick up an instance of BasicAA that uses
PhiValues without intending to, and then delete values from a function in
a way that causes PhiValues to return dangling pointers to the deleted
values. Fix this by having a set of callback value handles that
invalidate values when they're deleted.
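A minimal sketch of the callback-value-handle approach, assuming
PhiValues' public invalidateValue() (the handle class name here is
illustrative, not necessarily the patch's):
  // Invalidate PhiValues' cached results when a tracked value is deleted.
  struct PhiValuesCallbackVH final : CallbackVH {
    PhiValues *PV;
    PhiValuesCallbackVH(Value *V, PhiValues *PV) : CallbackVH(V), PV(PV) {}
    void deleted() override {
      PV->invalidateValue(getValPtr()); // drop dangling cache entries
      setValPtr(nullptr);
    }
  };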
llvm-svn: 340613
When used in cross-DSO mode, CFI will generate calls to special functions rather than trap instructions. For example, instead of generating
  if (!InlinedFastCheck(f))
    abort();
  call *f
CFI generates
  if (!InlinedFastCheck(f))
    __cfi_slowpath(CallSiteTypeId, f);
  call *f
This patch teaches cfi-verify to recognize calls to __cfi_slowpath and abort and treat them as trap functions.
In addition to normal symbols, we also parse the dynamic relocations to handle cross-DSO calls in libraries.
We also extend cfi-verify to recognize other patterns that occur using cross-DSO. For example, some indirect calls are not guarded by a branch to a trap but instead follow a call to __cfi_slowpath. For example:
  if (!InlinedFastCheck(f))
    call *f
  else {
    __cfi_slowpath(CallSiteTypeId, f);
    call *f
  }
In this case, the second call to f is not marked as protected by the current code. We thus recognize if indirect calls directly follow a call to a function that will trap on CFI violations and treat them as protected.
We also ignore indirect calls in the PLT, since on AArch64 each entry contains an indirect call that should not be protected by CFI, and these are labeled incorrectly when debug information is not present.
Differential Revision: https://reviews.llvm.org/D49383
llvm-svn: 340612
This adds a new method to ELFObjectFileBase that returns the symbols and addresses of PLT entries.
This design was suggested by pcc and eugenis in https://reviews.llvm.org/D49383.
Differential Revision: https://reviews.llvm.org/D50203
llvm-svn: 340610
This patch makes the DoesKMove argument non-optional, to force people
to think about it. Most cases where it is false are either code hoisting
or code sinking, where we pick one instruction from a set of
equal instructions among different code paths.
Reviewers: dberlin, nlopes, efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D47475
llvm-svn: 340606
This patch splits the file trace loading function into two versions, one
that takes a filename and one that takes a `DataExtractor`.
This change is a precursor to larger changes to increase test coverage
for the trace loading implementation.
llvm-svn: 340603
Having the KnownBits as an output parameter is kind of awkward to use
and a holdover from when it was two separate APInts. Instead, just
return a KnownBits object.
I'm leaving the existing interface in place for now, since updating
the callers all at once would be thousands of lines of diff.
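A before/after sketch, assuming this is the SelectionDAG variant (the
shape is analogous for the other computeKnownBits variants):
  // New: value-returning form.
  KnownBits Known = DAG.computeKnownBits(Op);
  // Old: output-parameter form, left in place for existing callers.
  // KnownBits Known;
  // DAG.computeKnownBits(Op, Known);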
llvm-svn: 340594
Per LLVM's CommandGuide, llvm-profdata show -text is supposed to produce
textual output that can be passed as input to further llvm-profdata
invocations. This previously didn't work for two reasons:
1) -text was not sufficient to enable the machine-readable text format output;
instead, -text was effectively ignored if -counts was not also specified. (With
this patch, -counts is instead ignored if -text is specified, because the
machine-readable text format always includes counts.)
2) When the input data was an IR-level profile, the :ir marker was missing from
the output, resulting in a text format output that would not be usable as
profiling data due to function hash mismatches.
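With both fixed, a round trip along these lines should work (file names
hypothetical):
  $ llvm-profdata show -text foo.profdata > foo.proftext
  $ llvm-profdata merge foo.proftext -o foo2.profdata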
Differential Revision: https://reviews.llvm.org/D51188
llvm-svn: 340592
Fix a set of related bugs:
* Considering two locations as equivalent when their lines are the same
but their scopes are different causes erroneous debug info that
attributes a commoned call to one of the two calls it was commoned from.
* The previous code to compute a new location's scope was inaccurate and
would use the inlinedAt that was the /parent/ of the inlinedAt that is
the nearest common one, and also used that parent scope instead of the
nearest common scope.
* Not generating new locations generally seemed like a lower quality
choice.
There was some risk that generating more new locations could hurt object
size by making more fine-grained line table entries, but it looks like
that was offset by the decrease in line table (& address & ranges) size
caused by more accurately computing the scope - which likely led to
fewer range entries (more contiguous ranges) & reduced size that way.
All up, with these changes I saw minor reductions (-1.21%, -1.77%) in
.rela.debug_ranges and .rela.debug_addr (in a fission, compressed debug
info build) as well as other minor size changes (generally reductions)
across the board (-1.32% debug_info.dwo, -1.28% debug_loc.dwo), measured
in an optimized (-O2) build of the clang binary.
If you are investigating a size regression in optimized debug builds,
this is certainly a patch to look into - and I'd be happy to look into
any major regressions found & see what we can do to address them.
llvm-svn: 340583
In order for more complex updates of MSSA to happen (e.g. those in
D45299), MemoryDefs need to be actual `Use`s of what they're optimized
to. This patch makes that happen.
In addition, this patch changes our optimization behavior for Defs
slightly: we'll now consider a Def optimization invalid if the
MemoryAccess it's optimized to changes. That we weren't doing this
before was a bug, but given that we were tracking these with a WeakVH
before, it was sort of difficult for that to matter.
We already have both of these behaviors for MemoryUses. The
difference is that a MemoryUse's defining access is always its optimized
access, and defining accesses are always `Use`s (in the LLVM sense).
Nothing exploded when testing a stage3 clang+llvm locally, so...
This also includes the test-case promised in r340461.
llvm-svn: 340577
Lower integer arguments smaller than i32.
Support both register and stack arguments.
Define a setLocInfo function for setting the LocInfo field in the ArgLocs vector.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D51031
llvm-svn: 340572
Summary:
Splats are fewer bytes than v128.consts, so use them when either could
apply.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51179
llvm-svn: 340569
Summary:
Remove the use of pair inside the tuple in concat_iterator, and create separate begins and ends tuples instead.
This fixes the failure for llvm <= 3.7 and libstdc++ that broke the hexagon build.
Reviewers: timshen
Subscribers: sanjoy, jlebar, dexonsmith, kparzysz, llvm-commits
Differential Revision: https://reviews.llvm.org/D51067
llvm-svn: 340567
The variable index pattern is different than the constant index
cases as shown in D51125. We might want to splat regardless of
whether the scalar is loaded from memory or transferred from GPR.
llvm-svn: 340565
Summary:
Avoid "count" if possible -> use "find" to check for the existence of keys.
Passed llvm test suite.
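The idiom, as a small self-contained sketch (the container and key here
are hypothetical):
  #include <map>
  #include <string>
  // One lookup via find() instead of count() followed by operator[].
  bool hasPositiveWidth(const std::map<std::string, int> &M) {
    auto It = M.find("width");
    return It != M.end() && It->second > 0;
  }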
Reviewers: fhahn, dcaballe, mkuper, rengolin
Reviewed By: fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51054
llvm-svn: 340563
The previous change ignored the latter, resulting in crash dumps being
generated when LLVM_ENABLE_CRASH_DUMPS was set but coreFilesPrevented was
true.
llvm-svn: 340561
Summary:
I got "Use not jointly dominated by defs" when removePartialRedundancy
attempted to prune then re-extend a subrange whose only liveness was a
dead def at the copy being removed.
V2: Removed junk from test. Improved comment.
V3: Addressed minor review comments.
Subscribers: MatzeB, qcolombet, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D50914
Change-Id: I6f894e9f517f71e921e0c6d81d28c5f344db8dad
llvm-svn: 340549
We need to allow ConstantExpr Selects in addition to SelectInst.
I'll try to put together a test case, but I wanted to fix the issues being reported.
Fixes PR38677
llvm-svn: 340546
Thanks to @waltl for reporting this issue.
I have also added an assert to check for invalid null strategy objects, and I
have reworded a couple of code comments in Scheduler.h
llvm-svn: 340545
The commit that added this functionality:
rL322957
may be causing/exposing a miscompile in PR38648:
https://bugs.llvm.org/show_bug.cgi?id=38648
so allow enabling/disabling to make debugging easier.
llvm-svn: 340540
These changes expand the FunctionAttr logic in order to mark functions as
WriteOnly when appropriate. This is done through an additional bool variable
and extended logic.
Reviewers: hfinkel, jdoerfert
Differential Revision: https://reviews.llvm.org/D48387
llvm-svn: 340537
With this patch, users can now customize the pipeline selection strategy for
scheduler resources. The resource selection strategy can be defined at processor
resource granularity. This enables the definition of different strategies for
different hardware schedulers.
To override the strategy associated with a processor resource, users can call
method ResourceManager::setCustomStrategy(), and pass a 'ResourceStrategy'
object in input.
Class ResourceStrategy is an abstract class which declares virtual method
`ResourceStrategy::select()`. Method select() is meant to implement the actual
strategy; it is responsible for picking the next best resource from a set of
available pipeline resources. Custom strategies simply override that method.
By default, processor resources are associated with instances of
'DefaultResourceStrategy'. A 'DefaultResourceStrategy' internally implements a
simple round-robin selector. For more details, please refer to the code comments
in Scheduler.h.
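For illustration, a custom strategy might look roughly like this
(assuming the select() interface described above; the strategy name and
bitmask convention are assumptions, not the patch's code):
  // Always pick the lowest-numbered ready pipeline resource instead of
  // the default round-robin choice.
  struct PickLowest : public ResourceStrategy {
    uint64_t select(uint64_t ReadyMask) override {
      return ReadyMask & (~ReadyMask + 1); // isolate the lowest set bit
    }
  };
  // Registration, per the text:
  // RM.setCustomStrategy(llvm::make_unique<PickLowest>(), ResourceID);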
llvm-svn: 340536
When GVN sets the incoming value for a phi to undef because the incoming block
is unreachable it needs to also invalidate the cached info for that phi in
MemoryDependenceAnalysis, otherwise later queries will return stale information.
Differential Revision: https://reviews.llvm.org/D51099
llvm-svn: 340529
Both DWARFDebugLine and DWARFDebugAddr used the same callback mechanism
for handling recoverable errors, and both implemented a similar warn()
function to be used as such a callback.
In this revision we get rid of code duplication and move this warn() function
to DWARFContext as DWARFContext::dumpWarning().
Reviewers: lhames, jhenderson, aprantl, probinson, dblaikie, JDevlieghere
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D51033
llvm-svn: 340528
This version of the patch fixes cleaning up ssa_copy intrinsics, so it does not
crash for instructions in blocks that have been marked unreachable.
This patch updates IPSCCP to use PredicateInfo to propagate
facts to true branches predicated by EQ and to false branches
predicated by NE.
As a follow-up, we should be able to extend it to also propagate additional
facts about nonnull.
Reviewers: davide, mssimpso, dberlin, efriedma
Reviewed By: davide, dberlin
Differential Revision: https://reviews.llvm.org/D45330
llvm-svn: 340525
For the _WIN32 macro, it is the definedness that matters rather than
the value. Most uses of the macro already rely on the definedness.
This commit fixes the few remaining uses that relied on the value.
Differential Revision: https://reviews.llvm.org/D51105
llvm-svn: 340520
Most users won't have to worry about this as all of the
'getOrInsertFunction' functions on Module will default to the program
address space.
An overload has been added to Function::Create to abstract away the
details for most callers.
This is based on https://reviews.llvm.org/D37054 but without the changes to
make passing a Module to Function::Create() mandatory. I have also added
some more tests and fixed the LLParser to accept call instructions for
types in the program address space.
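A sketch of the convenience overload in use (assuming the shape described
above; 'Ctx' is an LLVMContext and 'M' a Module, both hypothetical):
  // The function lands in M's program address space without the caller
  // spelling it out.
  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(Ctx), /*isVarArg=*/false);
  Function *F =
      Function::Create(FTy, GlobalValue::ExternalLinkage, "example_fn", M);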
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D47541
llvm-svn: 340519
subtarget features for indirect calls and indirect branches.
This is in preparation for enabling *only* the call retpolines when
using speculative load hardening.
I've continued to use subtarget features for now as they continue to
seem the best fit given the lack of other retpoline like constructs so
far.
The LLVM side is pretty simple. I'd like to eventually get rid of the
old feature, but I'm not sure what backwards-compatibility issues that
would cause.
This does remove the "implies" from requesting an external thunk. This
always seemed somewhat questionable and is now clearly not desirable --
you specify a thunk the same way no matter which set of things are
getting retpolines.
I really want to keep this nicely isolated from end users and just an
LLVM implementation detail, so I've moved the `-mretpoline` flag in
Clang to no longer rely on a specific subtarget feature by that name and
instead to be directly handled. In some ways this is simpler, but in
order to preserve existing behavior I've had to add some fallback code
so that users who relied on merely passing -mretpoline-external-thunk
continue to get the same behavior. I suspect we should eventually remove
this (we have never tested that it works!), but I've not done that in
this patch.
Differential Revision: https://reviews.llvm.org/D51150
llvm-svn: 340515
Aligning section contents is not required, but only
recommended, by the specification. Microsoft's documentation says
(https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format#section-table-section-headers):
"For object files, the value should be aligned on a 4-byte boundary
for best performance."
However, according to my measurements, aligning section contents has
a neutral to negative effect on performance.
I measured the median run time of 100 links of Chromium's
base_unittests on Linux with lld-link and on Windows with link.exe with
both aligned and unaligned sections. On Linux I didn't see a measurable
performance difference, and on Windows the link was slightly faster
with unaligned sections (presumably because on Windows the bottleneck
is I/O).
Also, the sections created by cl.exe are unaligned, so we should expect
tools to broadly accept unaligned sections.
Differential Revision: https://reviews.llvm.org/D51149
llvm-svn: 340514
This patch's test case relies on debug prints, which isn't generally an
OK way to test things in LLVM, and fails whenever asserts aren't enabled.
I've sent a heads-up to the commit thread and detailed comments on the review.
llvm-svn: 340513
When complaining that the triple is incompatible with all targets, print out the triple itself, not just a generic error about triples not matching.
llvm-svn: 340509
In lib/CodeGen/LiveDebugVariables.cpp, std::prev(MBBI) is used to get a
DebugValue's SlotIndex. However, the previous instruction may itself be
a debug instruction, and a debug instruction cannot be used to query a
SlotIndex in mi2iMap.
Scan all debug instructions and use the first debug instruction to query
the SlotIndex for the following debug instructions. Only handle DBG_VALUE
in handleDebugValue().
Differential Revision: https://reviews.llvm.org/D50621
llvm-svn: 340508
Summary:
Reorganize WebAssemblyInstrSIMD.td to put all of the instruction
definitions together, making it easier to see which instructions have
been implemented already. Depends on D51143.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51113
llvm-svn: 340504
Summary:
WebAssemblyInstrFormats.td retains only multiclasses that are used in
multiple other tablegen files.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D51143
llvm-svn: 340503
The format is the same as in ELF: a sequence of ULEB128-encoded
symbol indexes.
Differential Revision: https://reviews.llvm.org/D51047
llvm-svn: 340499
If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
Differential Revision: https://reviews.llvm.org/D51112
llvm-svn: 340480
Previously we assumed a vector reduction add is part of a loop and that one of its inputs is a phi. But the code in SelectionDAGBuilder that sets the vector reduction flag handles more cases than that: it just requires that the use chain ends in a horizontal reduction and that there are no other uses. This means it can handle unrolled reduction loops.
If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two sad inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides.
I've created a lambda to reuse some of the code for both sides, and fixed the variable names to remove references to "phi".
Differential Revision: https://reviews.llvm.org/D50817
llvm-svn: 340478
Summary:
This CL adds support for arbitrary BUILD_VECTORS, i.e. not splats and
not consts. This is the last feature needed to properly lower v2i64
multiplies without a i64x2.mul instruction (which is not in the spec),
so i64x2.mul is removed as well.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51082
Remove unnecessary condition and fix whitespace
llvm-svn: 340472
This solves the motivating case from:
https://bugs.llvm.org/show_bug.cgi?id=38527
If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls,
but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll
the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge
that some of the scalar elements are undef.
Differential Revision: https://reviews.llvm.org/D50791
llvm-svn: 340469
We're currently getting this behavior implicitly, since we determine if
a Def's optimization is valid based on the ID of its defining access.
This is incorrect, though I wouldn't be surprised if this was masked in
part by that we're using a WeakVH to track what Defs are optimized to.
(Not to mention that we don't move Defs super often, AFAICT). I'll
submit a patch to fix this shortly.
This also includes a minor refactor to reduce duplication a bit.
No test is included since, as said, this already happens to be our
behavior. I'll add a test for this with my fix to the other bug
mentioned above.
llvm-svn: 340461
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.
Differential Revision: https://reviews.llvm.org/D47917
llvm-svn: 340458
Add support for reading and writing MessagePack, a binary object serialization
format which aims to be more compact than text formats like JSON or YAML.
The specification can be found at
https://github.com/msgpack/msgpack/blob/master/spec.md
Will be used for encoding metadata in AMDGPU code objects.
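A hedged sketch of what the writer side might look like (method names
assumed, not verified against the patch; presumably from
llvm/Support/MsgPackWriter.h):
  std::string Buf;
  llvm::raw_string_ostream OS(Buf);
  llvm::msgpack::Writer W(OS);
  W.writeMapSize(1);                          // { "amdhsa.version": [1, 0] }
  W.write(llvm::StringRef("amdhsa.version"));
  W.writeArraySize(2);
  W.write(int64_t(1));
  W.write(int64_t(0));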
Differential Revision: https://reviews.llvm.org/D44429
llvm-svn: 340457
Instead of asserting that the function doesn't have any unreachable
code, just ignore it for the purpose of computing liveness.
Differential Revision: https://reviews.llvm.org/D51070
llvm-svn: 340456
Fix bug https://bugs.llvm.org/show_bug.cgi?id=38643
In BPFAsmBackend's applyFixup(), there is an assertion that FixedValue be 0.
This may not be true, especially at optimization level 0.
For example, in the above bug, for the following two
static variables:
  @bpf_map_lookup_elem = internal global i8* (i8*, i8*)*
      inttoptr (i64 1 to i8* (i8*, i8*)*), align 8
  @bpf_map_update_elem = internal global i32 (i8*, i8*, i8*, i64)*
      inttoptr (i64 2 to i32 (i8*, i8*, i8*, i64)*), align 8
The static variable @bpf_map_update_elem will have a symbol
offset of 8 and a FK_SecRel_8 with FixupValue 8 will cause
the assertion if llvm is built with -DLLVM_ENABLE_ASSERTIONS=ON.
The above relocations will not exist if the program is compiled
with optimization level -O1 and above as the compiler optimizes
those static variables away. In the below error message, -O2
is suggested as this is the common practice.
Note that FixedValue = 0 in applyFixup() does exist and is valid,
e.g., for the global variable my_map in the above bug. The bpf
loader will process them properly for map_id's before loading
the program into the kernel.
The static variables, which are not optimized away by compiler,
may have FK_SecRel_8 relocation with non-zero FixedValue.
The patch removes the offending assertion and instead issues a hard
error, as below, if the FixedValue in applyFixup() is not 0.
  $ llc -march=bpf -filetype=obj fixup.ll
  LLVM ERROR: Unsupported relocation: try to compile with -O2 or above,
  or check your static variable usage
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 340455
Summary:
When we don't actually have stack-allocated variables but need SP only
to support EH, we don't need to write SP back in the epilog, because we
don't bump down the stack pointer.
Reviewers: dschuff
Subscribers: jgravelle-google, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51114
llvm-svn: 340454
On Windows, movw+movt pairs with relocations are handled with a single
relocation that covers them both. Therefore we can't inject anything
between these instructions, otherwise the relocation (which in LLVM
only is treated as the movw instruction's relocation, while the movt
instruction's relocation is dropped) will end up bogus.
These instructions are bundled up until right before the constant
islands pass, making this effectively the only place that can split
them apart.
Differential Revision: https://reviews.llvm.org/D51032
llvm-svn: 340451
This avoids a potential infinite loop setting and unsetting bits in the
mask.
Reduced from a failure on the polly-aosp bot.
Differential Revision: https://reviews.llvm.org/D51066
llvm-svn: 340446
Summary:
Add MemorySSA as a dependency to LoopSimplifyCFG and preserve it.
Disabled by default until all passes preserve MemorySSA.
Reviewers: bogner, chandlerc
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D50911
llvm-svn: 340445
Summary:
Add MemorySSA as a dependency to LoopInstSimplify and preserve it.
Disabled by default until all passes preserve MemorySSA.
Reviewers: chandlerc
Subscribers: sanjoy, jlebar, Prazek, george.burgess.iv, llvm-commits
Differential Revision: https://reviews.llvm.org/D50906
llvm-svn: 340444
There's no need to track a separate variable for argmemonly aliasing. This falls out naturally from the mod/ref info union. Note that we may now return earlier than we previously would have if all arguments are explicitly readnone. The overall result doesn't change, just how we get there.
llvm-svn: 340443
Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can.
This is done directly as part of isel so it's as late as possible, to avoid breaking some BZHI patterns, since those patterns need an unmasked (32-n) to be correct.
To avoid manual load folding and custom instruction selection for the negate, I've inserted the new nodes into the DAG above the shift node, in topological order.
Differential Revision: https://reviews.llvm.org/D48789
llvm-svn: 340441
Summary:
There are several functions of the form `has***` or `needs***` in
`WebAssemblyFrameLowering` whose `MachineFrameInfo` argument can be
obtained from the `MachineFunction`, so it does not have to be passed
in by the caller. This is also more in line with other overridden
functions like `hasBP` or `hasReservedCallFrame`, which take only a
`MachineFunction` argument.
Reviewers: dschuff
Subscribers: sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51116
llvm-svn: 340438
This was needed way back because we didn't properly handle the fact that the SOURCES property of a target could contain things that weren't source files to compile. Almost 2 years ago Takumi fixed that, and now CMake is throwing warnings that we should get off the old behavior.
llvm-svn: 340436
There are several places where we use CMAKE_CONFIGURATION_TYPES to determine if we are using an IDE generator and in turn decide not to generate some of the convenience targets (like all the install-* and check-llvm-* targets). This decision is made because IDEs don't always deal well with the thousands of targets LLVM can generate.
This approach does not work for Visual Studio 15's new CMake integration. Because VS15 uses a Ninja generator, it isn't a multi-configuration build, and generating all these extra targets mucks up the UI and adds little value.
With this change we still don't generate these targets by default for Visual Studio and Xcode generators, and LLVM_ENABLE_IDE becomes a switch that can be enabled on the VS15 CMake builds, to improve the IDE experience.
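For example, a VS15-style Ninja configuration can now opt out of the
extra targets explicitly (invocation illustrative):
  cmake -G Ninja -DLLVM_ENABLE_IDE=ON ../llvm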
llvm-svn: 340435
When the key is not already in the map, operator[] creates an empty value and grows the map.
Resizing a map is very slow, so this needs to be avoided.
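As a self-contained sketch of the safe pattern (names hypothetical):
  #include <unordered_map>
  // find() never inserts a default-constructed value, so it can never
  // grow (and thus rehash) the map, unlike operator[].
  int lookupOrZero(const std::unordered_map<int, int> &M, int Key) {
    auto It = M.find(Key);
    return It == M.end() ? 0 : It->second;
  }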
Found with csmith + asserts.
May help with
https://bugs.llvm.org/show_bug.cgi?id=25843
Patch by Tom Rix.
Differential Revision: https://reviews.llvm.org/D50780
llvm-svn: 340434
Summary:
The `catch` instruction certainly has rather large side effects and the
flag was missing. At the moment this does not change any of the unit
tests we currently have.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50919
llvm-svn: 340433
We're calling these functions quite a bit from outside of MemorySSA.cpp
now. Given that they're relatively simple one-liners, I think the style
preference is to have them inline.
llvm-svn: 340430
wasm-lld expects relocation entries to be sorted by offset. In most
cases llvm produces them in order, but the CODE section (which combines
many MCSections) is an exception because we order the functions in
symbol order, not in section order. What is more, it's not clear whether
`recordRelocation` is guaranteed to be called in offset order, so this
sort is most likely needed in the general case too.
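The shape of the fix, sketched (the entry type and field names are
assumptions; requires <algorithm>):
  // Sort the section's relocation entries by offset before emitting them.
  std::stable_sort(Relocations.begin(), Relocations.end(),
                   [](const WasmRelocationEntry &A,
                      const WasmRelocationEntry &B) {
                     return A.Offset < B.Offset;
                   });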
Differential Revision: https://reviews.llvm.org/D51065
llvm-svn: 340423
The 32-bit constant address space is declared as 6, so the
maximum address space number is 6, not 5.
Fixes "LLVM ERROR: Pointer address space out of range".
v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS
v4: - fix compilation issues
- fix out of bounds access
v3: use static_assert()
v2: add a very simple test for 32-bit addr space
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106630
llvm-svn: 340417
The constant and global address spaces may alias; also, one rules table
wasn't ordered correctly.
Pinpointed by Matt.
v2: add a test with swapped parameters
llvm-svn: 340416
Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16
so that they can perform a ror.
Differential Revision: https://reviews.llvm.org/D51034
llvm-svn: 340405
This adds the plumbing for the Tiny code model for the AArch64 backend. This,
instead of loading addresses through the normal ADRP;ADD pair used in the Small
model, uses a single ADR. The 21 bit range of an ADR means that the code and
its statically defined symbols need to be within 1MB of each other.
This makes it mostly interesting for embedded applications where we want to fit
as much as we can in as small a space as possible.
Differential Revision: https://reviews.llvm.org/D49673
llvm-svn: 340397
This was hackily adding in the 4-bytes reserved for the callee's
emergency stack slot. Treat it like a normal stack allocation
so we get the correct alignment padding behavior. This fixes
an inconsistency between the caller and callee.
llvm-svn: 340396
Add patterns for unhandled CondCode enumerables:
SETEQ, SETGE, SETGT, SETLE, SETLT, SETNE.
As stated in the ISD::CondCode enum declaration:
`All of these (except for the 'always folded ops')
should be handled for floating point.`
Add patterns which use these nodes, same as corresponding
'ordered' CondCode nodes.
Referring to 'Ordered means that neither operand is a QNAN', we assume
it is safe to match e.g. the SETLT node to the same instruction as
SETOLT.
Differential Revision: https://reviews.llvm.org/D50757
llvm-svn: 340392
Summary:
This patch moves out the definition of the XRay log file header from
binary logs into its own header and implementation file.
This is one part of the refactoring being done in D50441.
Reviewers: eizan
Subscribers: mgorny, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51086
llvm-svn: 340389
Guard widening should not spend effort dealing with guards with trivial true/false conditions.
Such guards can easily be eliminated by any further cleanup pass like instcombine. However, we
should not unconditionally delete them because it may be profitable to widen other conditions
into such guards.
Differential Revision: https://reviews.llvm.org/D50247
Reviewed By: fedor.sergeev
llvm-svn: 340381
Summary: This is to be consistent with lld behavior since rLLD340364.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: steven_wu, eraman, mehdi_amini, inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51060
llvm-svn: 340380
Summary:
Before this change, pruning order was based on size. This changes it
to be based on time of last use instead, preferring to keep recently
used files and prune older ones.
Reviewers: pcc, rnk, espindola
Reviewed By: rnk
Subscribers: emaste, arichardson, hiraditya, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51062
llvm-svn: 340374
Summary: We now write back not to memory but to __stack_pointer global.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51074
llvm-svn: 340372
CodeGenPrepare has a strategy for moving dbg.values so that a value's
definition always dominates its debug users. This cleanup was happening
too early (before certain CGP transforms were run), resulting in some
dbg.value use-before-def errors.
Perform this cleanup as late as possible to avoid use-before-def.
llvm-svn: 340370
This test shows that optimizeSelectInst splits a select and sinks a
`fdiv` operation to one side of the diamond. However, the dbg.value for
the operation isn't moved.
llvm-svn: 340369
In optimizeSelectInst, when scanning for candidate selects to rewrite
into branches, scan past debug intrinsics. This makes the debug-enabled
and non-debug paths through optimizeSelectInst more congruent.
NFC because every select is eventually visited either way.
llvm-svn: 340368
This is preparation for landing a use-before-def verifier for debug
intrinsics (D46100).
As a drive-by, remove `tail` from debug intrinsic calls because it
doesn't mean anything in that context.
llvm-svn: 340366
Previously if you had something like this:
  template<typename T>
  struct Foo {
    template<typename U>
    Foo(U);
  };
  Foo F(3.7);
this would mangle as ??$?0N@?$Foo@H@@QEAA@N@Z
and this would be demangled as:
  undname:      __cdecl Foo<int>::Foo<int><double>(double)
  llvm-undname: __cdecl Foo<int>::Foo<int>(double)
Note the lack of the constructor template parameter in our
demangling.
This patch makes it so we print the constructor argument list.
llvm-svn: 340356
Summary:
Computing the remaining latency can be very expensive especially
on graphs of N nodes where the number of edges approaches N^2.
This reduces the compile time of a pathological case with the
AMDGPU backend from ~7.5 seconds to ~3 seconds. This test case has
a basic block with 2655 stores, each with somewhere between 500
and 1500 successors and predecessors.
Reviewers: atrick, MatzeB, airlied, mareko
Reviewed By: mareko
Subscribers: tpr, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50486
llvm-svn: 340346
In general we can't assume flat loads are uniform, and cases where we can prove
they are should be handled through infer-address-spaces.
Differential Revision: https://reviews.llvm.org/D50991
llvm-svn: 340343
I found these by running llvm-undname over a couple hundred
megabytes of object files generated as part of building chromium.
The issues fixed in this patch are:
1) decltype-auto return types.
2) Indirect vtables (e.g. const A::`vftable'{for `B'})
3) Pointers, references, and rvalue-references to member pointers.
I have exactly one remaining symbol out of a few hundred MB of object
files that produces a name we can't demangle, and it's related to
back-referencing.
llvm-svn: 340341
Summary:
After the stack is unwound due to a thrown exception, the
`__stack_pointer` global can point to an invalid address. This inserts
instructions that restore `__stack_pointer` global.
Reviewers: jgravelle-google, dschuff
Subscribers: mgorny, sbc100, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50980
llvm-svn: 340339
Summary:
This CL implements v128.const for each vector type. New operand types
are added to ensure the vector contents can be serialized without LEB
encoding. Tests are added for instruction selection, encoding,
assembly and disassembly.
Reviewers: aheejin, dschuff, aardappel
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50873
llvm-svn: 340336
Currently CodeExtractor tries to use the next node after an invoke to
place the store for the result of the invoke, if it is an out parameter
of the region. This fails, as the invoke terminates the current BB.
In that case, we can place the store in the 'normal destination' BB, as
the result will only be available in that case.
Reviewers: davidxl, davide, efriedma
Reviewed By: davidxl
Differential Revision: https://reviews.llvm.org/D51037
llvm-svn: 340331
Summary:
Catchpads and cleanuppads are not funclet entries; they are only EH
scope entries. We already don't set `isEHFuncletEntry` for catchpads.
This patch does the same thing for cleanuppads.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50654
llvm-svn: 340330
Summary: SP is now a __stack_pointer global and not a memory address anymore.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51046
llvm-svn: 340328
Summary:
When RegisterCoalescer::reMaterializeTrivialDef is substituting
a register use in a DBG_VALUE instruction, and the old register
is a subreg, and the new register is a physical register,
then we need to use substPhysReg in order to extract the correct
subreg.
Reviewers: wmi, aprantl
Reviewed By: wmi
Subscribers: hiraditya, MatzeB, qcolombet, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D50844
llvm-svn: 340326
Summary:
So far, the `isReturn` property has been used to mean both a return
instruction from a function and the end of an EH scope, a scope that
starts with an EH scope entry BB and ends with a catchret or a
cleanupret instruction.
Because WinEH uses funclets, all EH-scope-ending instructions are also
real return instructions from a function. But for wasm, they only serve
as the end marker of an EH scope, not as a return instruction that
exits a function. This mismatch caused incorrect prolog and epilog
generation in wasm EH scopes. This patch fixes this.
This patch is in the same vein with rL333045, which splits
`MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and
`isEHScopeEntry`.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D50653
llvm-svn: 340325
I'm assuming it's easier to make sure the RHS of an XOR is all ones than it is to check for the many select patterns we have, so let's check that first. Same with the one-use check.
llvm-svn: 340321
Currently we assign the same value number to two calls reading the same
memory location if we do not have MemoryDependence info. Without MemDep
Info we cannot guarantee that there is no store between the two calls, so we
have to assign a new number to the second call.
This patch also adds a new option, EnableMemDep, to enable/disable running
MemoryDependenceAnalysis, and renames NoLoads to NoMemDepAnalysis to
be more explicit about what it does. As the option also impacts calls
that read memory, NoLoads was a bit confusing.
Reviewers: efriedma, sebpop, john.brawn, wmi
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D50893
llvm-svn: 340319
In removeCopyByCommutingDef, segments from the source live range are
copied into (and merged with) the segments of the target live range.
This is performed for all subranges of the source interval. It can
happen that there will be subranges of the target interval that had
no corresponding subranges in the source interval, and in such cases
these subranges will not be updated. Since the copy being coalesced
is about to be removed, these ranges need to be updated by removing
the segments that are started by the copy.
llvm-svn: 340318
The constructor of Scheduler now accepts a SchedulerStrategy object, which is
used internally by method Scheduler::select() to drive the instruction selection
process.
The goal of this patch is to enable the definition of custom selection
strategies while reusing the same algorithms implemented by class Scheduler.
The motivation is that, on some targets, the default strategy may not well
approximate the selection logic in the hardware schedulers.
This patch also adds the ability to pass a ResourceManager object to the
constructor of Scheduler. This gives a bit more flexibility to the design,
and potentially allows exposing processor resources to SchedulerStrategy
objects.
Differential Revision: https://reviews.llvm.org/D51051
llvm-svn: 340314
The test demonstrates over-complicated codegen for a udiv that only has one divisor that doesn't equal 1. This should have allowed the codegen to be a lot simpler (uniform shifts etc.), but only SSE2 manages to make use of this.
llvm-svn: 340313
Volatility is not an aliasing property. We used to model volatile as if it had extremely conservative aliasing implications, but that hasn't been true for several years now. So, it doesn't make sense to be in AliasSet.
It also turns out the code is entirely a noop. Outside of the AST code to update it, there was only one user: load store promotion in LICM. L/S promotion doesn't need the check since it walks all the users of the address anyway. It already checks each load or store via !isUnordered which causes us to bail for volatile accesses. (Look at the lines immediately following the two remove asserts.)
There is the possibility of some small compile time impact here, but the only case which will get noticeably slower is a loop with a large number of loads and stores to the same address where only the last one we inspect is volatile. This is sufficiently rare that it's not worth optimizing for.
llvm-svn: 340312
This reverts commit d1341152d91398e9a882ba2ee924147ea2f9b589.
This patch originally made use of nested MachineIRBuilder buildInstr
calls, and since the order of argument processing is not well defined,
the instructions were built in a slightly different order (still
correct). I've removed the nested buildInstr calls to have a defined
order now.
Patch was tested by Mikael.
llvm-svn: 340309
Most of these shifts are extended to vXi16 so we don't gain anything from forcing another round of generic shift lowering - we know these extended cases are legal constant splat shifts.
llvm-svn: 340307
DAGCombiner doesn't pay attention to whether constants are opaque before doing the div-by-constant optimization, so BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block: the constant will be hoisted, but it does not leave the block.
Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all.
This fixes the case from PR38649.
Differential Revision: https://reviews.llvm.org/D51000
llvm-svn: 340303
A future change in clang necessitates access of this information
from the driver, so move this into a common place.
Try to mimic something resembling the API the other targets are
using here.
One thing I'm uncertain about is how to split amdgcn and r600
handling. Here I've mostly duplicated the functions for each,
while keeping the same enums. I think this is a bit awkward
for the features which don't matter for amdgcn.
It's also a bit messy that this isn't a complete set of
subtarget features. This is just the minimum set needed
for the driver code. For example building the list of
subtarget feature names is still in clang.
llvm-svn: 340291
Summary: When run under llvm-mc-disassemble-fuzzer, there is no symbol lookup callback, so tryAddingSymbolicOperand() must fail gracefully instead of crashing.
Reviewers: aemerson, javed.absar
Reviewed By: aemerson
Subscribers: lhames, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D51005
llvm-svn: 340287
Remove duplicate tests from InstCombine that were added with
D50582. I left negative tests there to verify that nothing
in InstCombine tries to go overboard. If isKnownNeverNaN is
improved to handle the FP binops or other cases, we should
have coverage under InstSimplify, so we could remove more
duplicate tests from InstCombine at that time.
llvm-svn: 340279
Summary:
Follow-up change to rL339703, where we now vectorize loops with non-phi
instructions used outside the loop. Note that the cyclic dependency
identification occurs when identifying reduction/induction vars.
We also need to ensure that we do not allow users where the PSCEV
information within and outside the loop is different. This was the fix
added in rL307837 for PR33706.
Reviewers: Ayal, mkuper, fhahn
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50778
llvm-svn: 340278
This is a slight modification of the tests from D50582;
change half of the predicates to 'uno' so we have coverage
for that side too. All of the positive tests can fold to a
constant (true/false), so that should happen in instsimplify.
llvm-svn: 340276
The goal of this patch is to simplify the Scheduler's interface in preparation
for D50929.
Some methods in the Scheduler's interface should not be exposed to external
users, since their presence makes it hard to both understand, and extend the
Scheduler's interface.
This patch removes the following two methods from the public Scheduler's API:
- reclaimSimulatedResources()
- updatePendingQueue()
Their logic has been migrated to a new method named 'cycleEvent()'.
Methods 'updateIssuedSet()' and 'promoteToReadySet()' still exist. However,
they are now private members of class Scheduler.
This simplifies the interaction with the Scheduler from the ExecuteStage.
llvm-svn: 340273
Summary:
Previously the new llvm.amdgcn.raw/struct.buffer.load/store intrinsics
only allowed float types for the data to be loaded or stored, which
sometimes meant the frontend needed to generate a bitcast. In this
respect, the new intrinsics copied the old buffer intrinsics.
This commit extends the new intrinsics to allow int types as well.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D50315
Change-Id: I8202af2d036455553681dcbb3d7d32ae273f8f85
llvm-svn: 340270
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.buffer.load
llvm.amdgcn.raw.buffer.load.format
llvm.amdgcn.raw.buffer.load.format.d16
llvm.amdgcn.struct.buffer.load
llvm.amdgcn.struct.buffer.load.format
llvm.amdgcn.struct.buffer.load.format.d16
llvm.amdgcn.raw.buffer.store
llvm.amdgcn.raw.buffer.store.format
llvm.amdgcn.raw.buffer.store.format.d16
llvm.amdgcn.struct.buffer.store
llvm.amdgcn.struct.buffer.store.format
llvm.amdgcn.struct.buffer.store.format.d16
llvm.amdgcn.raw.buffer.atomic.*
llvm.amdgcn.struct.buffer.atomic.*
with the following changes from the llvm.amdgcn.buffer.*
intrinsics:
* there are separate raw and struct versions: raw does not have an
index arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.
The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D50306
Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205
llvm-svn: 340269
Summary:
This commit adds new intrinsics
llvm.amdgcn.raw.tbuffer.load
llvm.amdgcn.struct.tbuffer.load
llvm.amdgcn.raw.tbuffer.store
llvm.amdgcn.struct.tbuffer.store
with the following changes from the llvm.amdgcn.tbuffer.* intrinsics:
* there are separate raw and struct versions: raw does not have an index
arg and sets idxen=0 in the instruction, and struct always sets
idxen=1 in the instruction even if the index is 0, to allow for the
fact that gfx9 does bounds checking differently depending on whether
idxen is set;
* there is a combined format arg (dfmt+nfmt)
* there is a combined cachepolicy arg (glc+slc)
* there are now only two offset args: one for the offset that is
included in bounds checking and swizzling, to be split between the
instruction's voffset and immoffset fields, and one for the offset
that is excluded from bounds checking and swizzling, to go into the
instruction's soffset field.
The AMDISD::TBUFFER_* SD nodes always have an index operand, all three
offset operands, combined format operand, combined cachepolicy operand,
and an extra idxen operand.
The tbuffer pseudo- and real instructions now also have a combined
format operand.
The obsolescent llvm.amdgcn.tbuffer.* and llvm.SI.tbuffer.store
intrinsics continue to work.
V2: Separate raw and struct intrinsics.
V3: Moved extract_glc and extract_slc defs to a more sensible place.
V4: Rebased on D49995.
V5: Only two separate offset args instead of three.
V6: Pseudo- and real instructions have joint format operand.
V7: Restored optionality of dfmt and nfmt in assembler.
V8: Addressed minor review comments.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D49026
Change-Id: If22ad77e349fac3a5d2f72dda53c010377d470d4
llvm-svn: 340268
Summary:
Previously a BUNDLE instruction inherited the DebugLoc from the
first instruction in the bundle, even if that DebugLoc had no
DILocation. With this commit this is changed into selecting the
first DebugLoc that has a DILocation, by searching among the
bundled instructions.
The idea is to reduce amount of bundles that are lacking
debug locations.
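Roughly, the selection described above as a sketch ('BundledMIs' is a
hypothetical range over the bundle's instructions):
  DebugLoc BundleLoc;
  for (const MachineInstr &MI : BundledMIs) {
    if (MI.getDebugLoc()) { // true only when a DILocation is attached
      BundleLoc = MI.getDebugLoc();
      break;
    }
  }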
Reviewers: #debug-info, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: JDevlieghere, mattd, llvm-commits
Differential Revision: https://reviews.llvm.org/D50639
llvm-svn: 340267
During combining, ReduceLoadWidth is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.
Differential Revision: https://reviews.llvm.org/D50432
llvm-svn: 340261
This reduces most of the sdiv stages (the MULHS, shifts, etc.) to just zero/identity values and uses the numerator scale factor to multiply by +1/-1.
llvm-svn: 340260