This is a follow up for the loop predication change 313981 to support ule, sle latch predicates.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D38177
llvm-svn: 315616
WMMA = "Warp Level Matrix Multiply-Accumulate".
These are the new instructions introduced in PTX6.0 and available
on sm_70 GPUs.
Differential Revision: https://reviews.llvm.org/D38645
llvm-svn: 315601
Building with BUILD_SHARED_LIBS makes it tricky to copy around
executables at will, since they won't be able to find the LLVM
libraries any more. This makes testing a feature that's based on the
executable name problematic, so we'll just disable these two tests in
that configuration.
We could potentially fix this by symlinking the lib directory into the
test directory, but that wouldn't work on windows, and losing testing
on windows would be far worse than losing testing on a configuration
that's barely even supported.
llvm-svn: 315599
Add profitability checks for modifying counted loops to use the mtctr instruction.
The latency of mtctr is only justified if there are more than 4 comparisons that
will be removed as a result. Usually counted loops are formed relatively early
and before unrolling, so most low trip count loops often don't survive. However
we want to ensure that if they do, we do not mistakenly update them to mtctr loops.
Use CodeMetrics to ensure we are only doing this for small loops with small trip counts.
Differential Revision: https://reviews.llvm.org/D38212
llvm-svn: 315592
Summary:
The interpolation mode workaround ensures that at least one
interpolation mode is enabled in PSInputAddr. It does not also check
PSInputEna on the basis that the user might enable bits in that
depending on run-time state.
However, for amdpal os type, the user does not enable some bits after
compilation based on run-time states; the register values being
generated here are the final ones set in the hardware. Therefore, apply
the workaround to PSInputAddr and PSInputEnable together. (The case
where a bit is set in PSInputAddr but not in PSInputEnable is where the
frontend set up an input arg for a particular interpolation mode, but
nothing uses that input arg. Really we should have an earlier pass that
removes such an arg.)
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D37758
llvm-svn: 315591
Summary:
The comments in the code said
// Remove <def,read-undef> flags. This def is now a partial redef.
but the code didn't just remove read-undef, it could introduce new ones which
could cause errors.
E.g. if we have something like
%vreg1<def> = IMPLICIT_DEF
%vreg2:subreg1<def, read-undef> = op %vreg3, %vreg4
%vreg2:subreg2<def> = op %vreg6, %vreg7
and we merge %vreg1 and %vreg2 then we should not set undef on the second subreg
def, which the old code did.
Now we solve this by actually do what the code comment says. We remove
read-undef flags rather than remove or introduce them.
Reviewers: qcolombet, MatzeB
Reviewed By: MatzeB
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38616
llvm-svn: 315564
Here we add a secondary option parser to llvm-isel-fuzzer (and provide
it for use with other fuzzers). With this, you can copy the fuzzer to
a name like llvm-isel-fuzzer=aarch64-gisel for a fuzzer that fuzzer
AArch64 with GlobalISel enabled, or fuzzer=x86_64 to fuzz x86, with no
flags required. This should be useful for running these in OSS-Fuzz.
Note that this handrolls a subset of cl::opts to recognize, rather
than embedding a complete command parser for argv[0]. If we find we
really need the flexibility of handling arbitrary options at some
point we can rethink this.
This re-applies 315545 using "=" instead of ":" as a separator for
arguments.
llvm-svn: 315557
It broke some tests on Windows:
Failing Tests (4):
LLVM :: tools/llvm-isel-fuzzer/execname-options.ll
LLVM :: tools/llvm-isel-fuzzer/missing-triple.ll
LLVM :: tools/llvm-isel-fuzzer/x86-empty-bc.ll
LLVM :: tools/llvm-isel-fuzzer/x86-empty.ll
> llvm-isel-fuzzer: Handle a subset of backend flags in the executable name
>
> Here we add a secondary option parser to llvm-isel-fuzzer (and provide
> it for use with other fuzzers). With this, you can copy the fuzzer to
> a name like llvm-isel-fuzzer:aarch64-gisel for a fuzzer that fuzzer
> AArch64 with GlobalISel enabled, or fuzzer:x86_64 to fuzz x86, with no
> flags required. This should be useful for running these in OSS-Fuzz.
>
> Note that this handrolls a subset of cl::opts to recognize, rather
> than embedding a complete command parser for argv[0]. If we find we
> really need the flexibility of handling arbitrary options at some
> point we can rethink this.
llvm-svn: 315554
Here we add a secondary option parser to llvm-isel-fuzzer (and provide
it for use with other fuzzers). With this, you can copy the fuzzer to
a name like llvm-isel-fuzzer:aarch64-gisel for a fuzzer that fuzzer
AArch64 with GlobalISel enabled, or fuzzer:x86_64 to fuzz x86, with no
flags required. This should be useful for running these in OSS-Fuzz.
Note that this handrolls a subset of cl::opts to recognize, rather
than embedding a complete command parser for argv[0]. If we find we
really need the flexibility of handling arbitrary options at some
point we can rethink this.
llvm-svn: 315545
- Move PAL metadata definitions to AMDGPUMetadata
- Make naming consistent with HSA metadata
Differential Revision: https://reviews.llvm.org/D38745
llvm-svn: 315523
- Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change)
- Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer
- Introduce HSAMD namespace
- Other minor name changes in function and test names
llvm-svn: 315522
Summary:
This adds a set of new directives that describe 32-bit x86 prologues.
The directives are limited and do not expose the full complexity of
codeview FPO data. They are merely a convenience for the compiler to
generate more readable assembly so we don't need to generate tons of
labels in CodeGen. If our prologue emission changes in the future, we
can change the set of available directives to suit our needs. These are
modelled after the .seh_ directives, which use a different format that
interacts with exception handling.
The directives are:
.cv_fpo_proc _foo
.cv_fpo_pushreg ebp/ebx/etc
.cv_fpo_setframe ebp/esi/etc
.cv_fpo_stackalloc 200
.cv_fpo_endprologue
.cv_fpo_endproc
.cv_fpo_data _foo
I tried to follow the implementation of ARM EHABI CFI directives by
sinking most directives out of MCStreamer and into X86TargetStreamer.
This helps avoid polluting non-X86 code with WinCOFF specific logic.
I used cdb to confirm that this can show locals in parent CSRs in a few
cases, most importantly the one where we use ESI as a frame pointer,
i.e. the one in http://crbug.com/756153#c28
Once we have cdb integration in debuginfo-tests, we can add integration
tests there.
Reviewers: majnemer, hans
Subscribers: aemerson, mgorny, kristof.beyls, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D38776
llvm-svn: 315513
Summary:
Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled.
Patch by Paul Walker.
Reviewers: fhahn, Gerolf, efriedma, MatzeB
Reviewed By: fhahn
Subscribers: aemerson, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D38734
llvm-svn: 315502
Currently we produce a bunch of unnecessary code when emitting the
prologue/epilogue for spills/restores. Namely, if the load from stack
slot/store to stack slot instruction is an X-Form instruction, we will
always produce an LIS/ORI sequence for the stack offset.
Furthermore, we have not exploited the P9 vector D-Form loads/stores for this
purpose.
This patch address both issues.
Specifying the D-Form load as the instruction to use for stack spills/reloads
should be safe because:
1. The stack should be aligned according to the ABI
2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will
check for the offset being a multiple of 16 and will convert it to an
X-Form instruction if it isn't.
Differential Revision : https://reviews.llvm.org/D38758
llvm-svn: 315500
Previously we would only look in the current directory for a
resource, which might not be the same as the directory of the
rc file. Furthermore, MSVC rc supports a /I option, and can
also look in the system environment. This patch adds support
for this search algorithm.
Differential Revision: https://reviews.llvm.org/D38740
llvm-svn: 315499
Legalization of fp128 assumes things that we should have asserts for,
so that's another potential improvement.
Differential Revision: https://reviews.llvm.org/D38771
llvm-svn: 315485
ubsan caught an issue I made where I was converting a null pointer to a
reference.
elf utils implements a particularly extreme form of stripping that I'd
like to support. eu-strip has an option called "strip-sections" that
removes all section headers and leaves only program headers and the
segment data. I have implemented this option partly as a test but mainly
because in Fuchsia we would like to use this option to minimize the size
of our executables. The other strip options that are on my list include
--strip-all and --strip-debug. This is a preliminary implementation that
I'd like to start using in Fuchsia builds if possible. This change
implements such a stripping option for llvm-objcopy
Differential Revision: https://reviews.llvm.org/D38335
llvm-svn: 315484
The pipeliner is generating a serial sequence that causes poor
register allocation when a post-increment instruction appears
prior to the use of the post-increment register. This occurs when
there is a circular set of dependences involved with a sequence
of instructions in the same cycle. In this case, there is no
serialization of the parallel semantics that will not cause an
additional register to be allocated.
This patch fixes the problem by changing the instructions so that
the post-increment instruction is used by the subsequent
instruction, which enables the register allocator to make a
better decision and not require another register.
Patch by Brendon Cahoon.
llvm-svn: 315466
Eg:
insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7}
This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector.
We may want to abandon that one if we can't find value in squashing the more specific pattern sooner.
We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles.
There may be room for improvement in the shuffle lowering here, but that would be follow-up work.
Differential Revision: https://reviews.llvm.org/D38388
llvm-svn: 315460
This patch adds timestamp verification for swiftmodule files.
- A new flag is provided to allows us to continue testing of the code
for embedding the__swift_ast. (git doesn't maintain timestamps)
- Adds a new test for fat (arm) binaries.
Differential revision: https://reviews.llvm.org/D38686
llvm-svn: 315456
Adding this test files now so after another commit that will add a new pattern for
TESTM and TESTNM instructions will show the improvemnts that have been done.
Change-Id: If3908b7f91897d764053312365a2bc1de78b291d
llvm-svn: 315443
This patch fixes the miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
to hoist across such instructions because there is no guarantee that the load standing after such
instruction is still valid before such instruction. For example, a load from under a guard may be
invalid before the guard in the following case:
int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
Differential Revision: https://reviews.llvm.org/D37460
llvm-svn: 315440
Sinking of unordered atomic load into loop must be disallowed because it turns
a single load into multiple loads. The relevant section of the documentation
is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for
Optimizers section. Here is the full text of this section:
> Notes for optimizers
> In terms of the optimizer, this **prohibits any transformation that
> transforms a single load into multiple loads**, transforms a store into
> multiple stores, narrows a store, or stores a value which would not be
> stored otherwise. Some examples of unsafe optimizations are narrowing
> an assignment into a bitfield, rematerializing a load, and turning loads
> and stores into a memcpy call. Reordering unordered operations is safe,
> though, and optimizers should take advantage of that because unordered
> operations are common in languages that need them.
Patch by Daniil Suchkov!
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D38392
llvm-svn: 315438
IRCE should not apply when the safe iteration range is proved to be empty.
In this case we do unneeded job creating pre/post loops and then never
go to the main loop.
This patch makes IRCE not apply to empty safe ranges, adds test for this
situation and also modifies one of existing tests where it used to happen
slightly.
Reviewed By: anna
Differential Revision: https://reviews.llvm.org/D38577
llvm-svn: 315437
elf utils implements a particularly extreme form of stripping that I'd
like to support. eu-strip has an option called "strip-sections" that
removes all section headers and leaves only program headers and the
segment data. I have implemented this option partly as a test but mainly
because in Fuchsia we would like to use this option to minimize the size
of our executables. The other strip options that are on my list include
--strip-all and --strip-debug. This is a preliminary implementation that
I'd like to start using in Fuchsia builds if possible. This change
implements such a stripping option for llvm-objcopy
Differential Revision: https://reviews.llvm.org/D38335
llvm-svn: 315412
This change adds the ability to use the "-R"/"-remove-section" option
multiple times.
Differential Revision: https://reviews.llvm.org/D38332
llvm-svn: 315385
Summary: In the current implementation, we only have accurate profile count for standalone symbols. For inlined functions, we do not have entry count data because it's not available in LBR. In this patch, we use the first instruction's frequency to estimiate the function's entry count, especially for inlined functions. This may be inaccurate due to debug info in optimized code. However, this is a better estimate than the static 80/20 estimation we have in the current implementation.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D38478
llvm-svn: 315369
Rather than using the AdditionalPredicates mechanism to guard
the microMIPS instructions, use the existing predicates to properly
guard those instructions.
This also resolves a case where an instruction pattern was incorrectly
available for microMIPS32R6, which caused a register allocation failure
as the registers specified in the pattern were not available.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38451
llvm-svn: 315362
opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass.
The heuristic in the custom selector for brcond deferred the
branch uniformity check to the pattern, which would fail.
llvm-svn: 315360
This patch adds a post-linking pass which replaces the function pointer of enqueued
block kernel with a global variable (runtime handle) and adds
runtime-handle attribute to the enqueued block kernel.
In LLVM CodeGen the runtime-handle metadata will be translated to
RuntimeHandle metadata in code object. Runtime allocates a global buffer
for each kernel with RuntimeHandel metadata and saves the kernel address
required for the AQL packet into the buffer. __enqueue_kernel function
in device library knows that the invoke function pointer in the block
literal is actually runtime handle and loads the kernel address from it
and puts it into AQL packet for dispatching.
This cannot be done in FE since FE cannot create a unique global variable
with external linkage across LLVM modules. The global variable with internal
linkage does not work since optimization passes will try to replace loads
of the global variable with its initialization value.
Differential Revision: https://reviews.llvm.org/D38610
llvm-svn: 315352
This change adds support for removing sections using the -R field (as
GNU objcopy does as well). This change should let us add many helpful
tests and is a proper stepping stone for adding more general kinds of
stripping.
Differential Revision: https://reviews.llvm.org/D38260
llvm-svn: 315346
This patch ensures that the rule:
fold (zext (load x)) -> (zext (truncate (zextload x)))
propagates the SDLoc of the load to the zextload.
<rdar://problem/33755881>
llvm-svn: 315340
Summary:
This leak doesn't reproduce locally on macOS 10.12, but is causing
buildbot failures. Disable leak checking until it can be fixed.
Reviewers: sqlbyme, qcolombet, enderby, bruno
Reviewed By: bruno
Subscribers: bruno, llvm-commits
Differential Revision: https://reviews.llvm.org/D38699
llvm-svn: 315337
Summary:
The pass to fix function bitcasts generates thunks for functions that
are called directly with a mismatching signature. It was also generating
thunks in cases where the function was address-taken, causing aliasing
problems in otherwise valid cases.
This patch tightens the restrictions for when the pass runs.
Reviewers: sunfish, dschuff
Subscribers: jfb, sbc100, llvm-commits, aheejin
Differential Revision: https://reviews.llvm.org/D38640
llvm-svn: 315326
Add instruction definitions for FP32 mode for recip.d and rsqrt.d.
Previously these instructions were only defined when targeting the
full 64-bit FPU model but were not guarded properly.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38400
llvm-svn: 315318
This patch adds printing for DW_AT_type DIEs like it is already the case
for DW_AT_specification DIEs. This is a rather naive approach and only a
start. We should have pretty printers for different languages.
Recommit after being reverted in r315299.
Differential revision: https://reviews.llvm.org/D36993
llvm-svn: 315316
Previously, the parsing of the 'subu $reg, ($reg,) imm' relied on a parser
which also rendered the operand to the instruction. In some cases the
general parser could construct an MCExpr which was not a MCConstantExpr
which MipsAsmParser was expecting.
Address this by altering the special handling to cope with unexpected inputs
and fine-tune the handling of cases where an register name that is not
available in the current ABI is regarded as not a match for the custom parser
but also not as an outright error.
Also enforces the binutils restriction that only constants are accepted.
This partially resolves PR34391.
Thanks to Ed Maste for reporting the issue!
Reviewers: nitesh.jain, arichardson
Differential Revision: https://reviews.llvm.org/D37476
llvm-svn: 315310
Summary:
See https://llvm.org/PR33743 for more details
It seems that for non-power of 2 vector sizes, the algorithm can produce
non-matching sizes for input and result causing an assert.
This usually isn't a problem as the isAnyExtend check will weed these out, but
in some cases (most often with lots of undefined values for the mask indices) it
can pass this check for non power of 2 vectors.
Adding in an extra check that ensures that bit size will match for the result
and input (as required)
Subscribers: nhaehnle
Differential Revision: https://reviews.llvm.org/D35241
llvm-svn: 315307
Previously, the code that implemented the GNU assembler aliases for the
LDRD and STRD instructions (where the second register is omitted)
assumed that the input was a valid instruction. This caused assertion
failures for every example in ldrd-strd-gnu-bad-inst.s.
This improves this code so that it bails out if the instruction is not
in the expected format, the check bails out, and the asm parser is run
on the unmodified instruction.
It also relaxes the alias on thumb targets, so that unaligned pairs of
registers can be used. The restriction that Rt must be even-numbered
only applies to the ARM versions of these instructions.
Differential revision: https://reviews.llvm.org/D36732
llvm-svn: 315305
This adds diagnostic strings for the ARM floating-point register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, DPR, requires C++ code to select the correct error
message, as that class contains different registers depending on the
FPU. The rest can all have their diagnostic strings stored in the
tablegen decription of them.
Differential revision: https://reviews.llvm.org/D36693
llvm-svn: 315304
This adds diagnostic strings for the ARM general-purpose register
classes, which will be used when these classes are expected by the
assembler, but the provided operand is not valid.
One of these, rGPR, requires C++ code to select the correct error
message, as that class contains different registers in pre-v8 and v8
targets. The rest can all have their diagnostic strings stored in the
tablegen description of them.
Differential revision: https://reviews.llvm.org/D36692
llvm-svn: 315303
Summary:
Atomic buffer operations do not work (and trap on gfx9) when the
components are unaligned, even if their sum is aligned.
Previously, we generated an offset of 4156 without an SGPR by
splitting it as 4095 + 61 (immediate + inline constant). The
highest offset for which we can do this correctly is 4156 = 4092 + 64.
Fixes dEQP-GLES31.functional.ssbo.atomic.*
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37850
llvm-svn: 315302
This patch adds printing for DW_AT_type DIEs like it is already the case
for DW_AT_specification DIEs. This is a rather naive approach and only a
start. We should have pretty printers for different languages.
Differential revision: https://reviews.llvm.org/D36993
llvm-svn: 315297
This allows a DiagnosticType and/or DiagnosticString to be associated
with a RegisterClass in tablegen, so that we can emit diagnostics in the
assembler when a register operand is incorrect.
DiagnosticType creates a predictable enum value, which gets returned as
the error code when an operand does not match, and can be used by the
assembly parser to map to a user-facing diagnostic. DiagnosticString
creates an anonymous enum value (currently based on the tablegen class
name), and a function to map from enum values to strings will be
generated. Both of these work the same was as they do for AsmOperand.
This isn't used by any targets yet, but has one (positive) side-effect.
It improves the diagnostic codes returned by validateOperandClass - we
always want to emit the diagnostic that relates to the expected operand
class, but this wasn't always being done when the expected and actual
classes were completely different (token/register/custom). This causes a
few AArch64 diagnostics to be improved, as Match_InvalidOperand was
being returned instead of a specific diagnostic type.
Differential revision: https://reviews.llvm.org/D36691
llvm-svn: 315295
NFC.
Updated 6 regression tests to differentiate between HASWELL and SKYLAKE scheduling information.
The fix is in preparation of a patch to update the information of the Skylake Client scheduling to include the appropriate load and store latencies.
Reviewers: zvi, RKSimon
Differential Revision: https://reviews.llvm.org/D38685
Change-Id: Ifc6b98d9eaf266913698f24c766fd994fc977555
llvm-svn: 315291
The issue is that we assume operand zero of the input to the add instruction
is a register. In this case, the input comes from inline assembly and
operand zero is not a register thereby causing a crash.
The code will bail anyway if the input instruction doesn't have the right
opcode. So do that check first and let short-circuiting prevent the crash.
llvm-svn: 315285
Eliminate inttype phi with inttoptr/ptrtoint.
This version fixed a bug in finding the matching
phi -- the order of the incoming blocks may be
different (triggered in self build on Windows).
A new test case is added.
llvm-svn: 315272
Removes two report_fatal_errors.
Implement this by removing EmitCFICommon, and do the checking in
getCurrentDwarfFrameInfo. Have the callers check for null before
dereferencing it.
llvm-svn: 315264
This makes the .seh_ directives slightly more usable from standalone
assembly files.
This removes a large number of report_fatal_errors and recovers from the
error by ignoring the directive.
llvm-svn: 315262
Summary:
This suppresses the generation of .Lcfi labels in our textual assembler.
It was annoying that this generated cascading .Lcfi labels:
llc foo.ll -o - | llvm-mc | llvm-mc
After three trips through MCAsmStreamer, we'd have three labels in the
output when none are necessary. We should only bother creating the
labels and frame data when making a real object file.
This supercedes D38605, which moved the entire .seh_ implementation into
MCObjectStreamer.
This has the advantage that we do more checking when emitting textual
assembly, as a minor efficiency cost. Outputting textual assembly is not
performance critical, so this shouldn't matter.
Reviewers: majnemer, MatzeB
Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D38638
llvm-svn: 315259
We end up creating COPY's that are either truncating/extending and this
should be illegal.
https://reviews.llvm.org/D37640
Patch for X86 and ARM by igorb, rovka
llvm-svn: 315240
Summary:
On behalf of julia.koval@intel.com
The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints).
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)
There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987).
The example of special case code:
```
void foo(unsigned short *p, int max, int n) {
int i;
unsigned m;
for (i = 0; i < n; i++) {
m = *--p;
*p = (unsigned short)(m >= max ? m-max : 0);
}
}
```
Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero.
Here is the table of types, I try to support, special case items are bold:
| Size | 128 | 256 | 512
| ----- | ----- | ----- | -----
| i8 | v16i8 | v32i8 | v64i8
| i16 | v8i16 | v16i16 | v32i16
| i32 | | **v8i32** | **v16i32**
| i64 | | | **v8i64**
Reviewers: zvi, spatel, DavidKreitzer, RKSimon
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37534
llvm-svn: 315237
It's rare but there are a small number of patterns like this:
(set i64:$dst, (add i64:$src1, i64:$src2))
These should be equivalent to register classes except they shouldn't check for
a specific register bank.
This doesn't occur in AArch64/ARM/X86 but does occasionally come up in other
in-tree targets such as BPF.
llvm-svn: 315226
Summary:
swiftc emits symbols without flags set, which led dsymutil to ignore
them when searching for global symbols, causing dwarf location data
to be omitted. Xcode's dsymutil handles this case correctly, and emits
valid location data. Add this functionality to llvm-dsymutil by
allowing parsing of symbols with no flags set.
Reviewers: aprantl, friss, JDevlieghere
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38587
llvm-svn: 315218
This allows rc files to have comments. Eventually we should
just use clang's c preprocessor, but that's a bit larger
effort for minimal gain, and this is straightforward.
Differential Revision: https://reviews.llvm.org/D38651
llvm-svn: 315207
E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an
intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the
inverted condition code for the CSEL.
rdar://28495949
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D38160
llvm-svn: 315205
We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX.
Differential Revision: https://reviews.llvm.org/D38609
llvm-svn: 315201
Summary:
We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD.
Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary.
Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd.
This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type.
The rest of the patch is removing all the excess scalar patterns.
Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint.
Reviewers: RKSimon, zvi
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38023
llvm-svn: 315181
Adding the scheduling information for the SkylakeServer (SKX) target.
This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target.
We used the scheduling information retrieved from the Skylake architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction.
The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613.
Please expect some performance fluctuations due to code alignment effects.
Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu
Differential Revision: https://reviews.llvm.org/D38443
Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185
llvm-svn: 315175
Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs.
Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead.
Differential Revision: https://reviews.llvm.org/D38506
llvm-svn: 315150
Say you have two identical linkonceodr functions, one in M1 and one in M2.
Say that the outliner outlines A,B,C from one function, and D,E,F from another
function (where letters are instructions). Now those functions are not
identical, and cannot be deduped. Locally to M1 and M2, these outlining
choices would be good-- to the whole program, however, this might not be true!
To mitigate this, this commit makes it so that the outliner sees linkonceodr
functions as unsafe to outline from. It also adds a flag,
-enable-linkonceodr-outlining, which allows the user to specify that they
want to outline from such functions when they know what they're doing.
Changing this handles most code size regressions in the test suite caused by
competing with linker dedupe. It also doesn't have a huge impact on the code
size improvements from the outliner. There are 6 tests that regress > 5% from
outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most
tests either improve or are not impacted.
Not outlined vs outlined without linkonceodrs:
https://hastebin.com/raw/qeguxavuda
Not outlined vs outlined with linkonceodrs:
https://hastebin.com/raw/edepoqoqic
Outlined with linkonceodrs vs outlined without linkonceodrs:
https://hastebin.com/raw/awiqifiheb
Numbers generated using compare.py with -m size.__text. Tests run for AArch64
with -Oz -mllvm -enable-machine-outliner -mno-red-zone.
llvm-svn: 315136
Patch to fix ternlog instructions with a folded
broadcast. The broadcast decorator, e.g. {1toX}, was missing.
Differential Revision: https://reviews.llvm.org/D38649
llvm-svn: 315122
This patch adds two new verifiers:
- It checks that the root DIE of a CU is actually a valid unit DIE.
(based on its tag)
- For DWARF5 which contains a unit type int he CU header, it checks that
this matches the type of the unit DIE.
Differential revision: https://reviews.llvm.org/D38453
llvm-svn: 315121
This allows the escape sequences (\a, \n, \r, \t, \\, \x[0-9a-f]*,
\[0-7]*, "") to appear in .rc scripts. These are parsed and output in
the same way as it's done in original MS implementation.
The way these sequences are processed depends on the type of the
resource it resides in, and on whether the user declared the string to
be "wide" or "narrow".
I tried to maintain the maximum compatibility with the original tool
(and fail in some erroneous situations that are accepted by .rc).
However, there are some (extremely rare) cases where Microsoft tool
outputs nonsense. I found it infeasible to detect such casses.
Patch by Marek Sokolowski
Differential Revision: https://reviews.llvm.org/D38426
llvm-svn: 315118
This allows rc to serialize user-defined resources, as
documented at:
msdn.microsoft.com/en-us/library/windows/desktop/aa381054.aspx
Escape sequences are yet unavailable, and are to be added in one of
child patches.
Patch by: Marek Sokolowski
Differential Revision: https://reviews.llvm.org/D38423
llvm-svn: 315117
This allows llvm-rc to serialize STRINGTABLE resources.
These are output in an unusual way: we locate them at the end of the
file, and strings are merged into bundles of max 16 strings, depending
on their IDs, language, and characteristics.
Ref: msdn.microsoft.com/en-us/library/windows/desktop/aa381050.aspx
Patch by: Marek Sokolowski
Differential Revision: https://reviews.llvm.org/D38420
llvm-svn: 315112
This is now able to dump VERSIONINFO resources.
Ref: msdn.microsoft.com/en-us/library/windows/desktop/aa381058.aspx
Differential Revision: https://reviews.llvm.org/D38410
Patch by: Marek Sokolowski
llvm-svn: 315110
This is part 6 of llvm-rc serialization.
This adds ability to output cursors and icons as resources.
Unfortunately, we can't just copy .cur or .ico files to output - as each
file might contain multiple images, each of them needs to be unpacked
and stored as a separate resource. This forces us to parse cursor and
icon contents. (Fortunately, these formats are pretty similar and can be
processed by mostly common code).
As test files are binary, here is a short explanation of .cur and .ico
files stored:
cursor.cur, cursor-8.cur, cursor-32.cur are sample correct cursor files,
differing in their bit depth.
icon-old.ico, icon-new.ico are sample correct icon files;
icon-png.ico is a sample correct icon file in PNG format (instead of
usual BMP);
cursor-eof.cur is an incorrect cursor file - this is cursor.cur with
some of its final bytes removed.
cursor-bad-offset.cur is an incorrect cursor file - image header states
that image data begins at offset 0xFFFFFFFF.
Sample correct cursors and icons were created by Nico Weber.
Patch by Marek Sokolowski
Differential Revision: https://reviews.llvm.org/D37878
llvm-svn: 315109
This appears to be miscompiling Clang, as shown on two Windows bootstrap
bots:
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7611http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/6870
Nothing else is in the blame list. Both emit errors on this valid code
in the Windows ucrt headers:
C:\...\ucrt\malloc.h:95:32: error: invalid operands to binary expression ('char *' and 'int')
_Ptr = (char*)_Ptr + _ALLOCA_S_MARKER_SIZE;
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
I am attempting to reproduce this now.
This reverts r315044
llvm-svn: 315108
This is part 5 of llvm-rc serialization support.
This allows DIALOG and DIALOGEX to serialize if dialog-specific optional
statements are provided. These are (as of now): CAPTION, FONT, and
STYLE.
Notably, FONT statement can take more than two arguments when describing
DIALOGEX resources (as in
msdn.microsoft.com/en-us/library/windows/desktop/aa381013.aspx). I made
some changes to the parser to reflect this fact.
Patch by Marek Sokolowski
Differential Revision: https://reviews.llvm.org/D37864
llvm-svn: 315104
At the last LLVM dev meeting we had a debug info for optimized code
BoF session. In that session I presented some graphs that showed how
the quality of the debug info produced by LLVM changed over the last
couple of years. This is a cleaned up version of the patch I used to
collect the this data. It is implemented as an extension of
llvm-dwarfdump, adding a new --statistics option. The intended
use-case is to automatically run this on the debug info produced by,
e.g., our bots, to identify eyebrow-raising changes or regressions
introduced by new transformations that we could act on.
In the current form, two kinds of data are being collected:
- The number of variables that have a debug location versus the number
of variables in total (this takes into account inlined instances of
the same function, so if a variable is completely missing form only
one instance it will be found).
- The PC range covered by variable location descriptions versus the PC
range of all variables' containing lexical scopes.
The output format is versioned and extensible, so I'm looking forward
to both bug fixes and ideas for other data that would be interesting
to track.
Differential Revision: https://reviews.llvm.org/D36627
llvm-svn: 315101
In some cases an instruction is deleted from the block during combining,
however it can still exist in the legalizer worklist.
This change modifies the combiner routines to add the given MI to the
dead instruction list rather than trying to remove it from the block
themselves. The responsibility is then on the caller to delete the instruction
from the block and any worklists.
Differential Revision: https://reviews.llvm.org/D38622
llvm-svn: 315092
The bitcode reader looks specifically for `__DATA, __objc_catlist` as a
section name. However, SVN r304661 removed the spaces (the two names
are functionally equivalent but do not compare equally
lexicographically). This causes compatibility issues. Add an
auto-upgrade path for removing the spaces as well as use the new name in
the LTO plugin.
llvm-svn: 315086
This addresses two sources of inconsistency in test configuration
files.
1. Substitution boundaries. Previously you would specify a
substitution, such as 'lli', and then additionally a set
of characters that should fail to match before and after
the tool. This was used, for example, so that matches that
are parts of full paths would not be replaced. But not all
tools did this, and those that did would often re-invent
the set of characters themselves, leading to inconsistency.
Now, every tool substitution defaults to using a sane set
of reasonable defaults and you have to explicitly opt out
of it. This actually fixed a few latent bugs that were
never being surfaced, but only on accident.
2. There was no standard way for the system to decide how to
locate a tool. Sometimes you have an explicit path, sometimes
we would search for it and build up a path ourselves, and
sometimes we would build up a full command line. Furthermore,
there was no standardized way to handle missing tools. Do we
warn, fail, ignore, etc? All of this is now encapsulated in
the ToolSubst class. You either specify an exact command to
run, or an instance of FindTool('<tool-name>') and everything
else just works. Furthermore, you can specify an action to
take if the tool cannot be resolved.
Differential Revision: https://reviews.llvm.org/D38565
llvm-svn: 315085
Summary:
swiftc emits symbols without flags set, which led dsymutil to ignore
them when searching for global symbols, causing dwarf location data
to be omitted. Xcode's dsymutil handles this case correctly, and emits
valid location data. Add this functionality to llvm-dsymutil by
allowing parsing of symbols with no flags set.
Reviewers: aprantl, friss, JDevlieghere
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38587
llvm-svn: 315082
Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions.
This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions.
Passes OpenCL conformance test_integer_ops quick_[u]long_math
Differential Revision: https://reviews.llvm.org/D38607
llvm-svn: 315081
Summary: stripPointerCast is not reliably returning the value that's being type-casted. Instead it may look further at function attributes to further propagate the value. Instead of relying on stripPOintercast, the more reliable solution is to directly use the pointer to the promoted direct call.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D38603
llvm-svn: 315077
Unfortunately TableGen doesn't handle this yet:
Unable to deduce gMIR opcode to handle Src (which is a leaf).
Just add some temporary hand-written code to generate the proper MOVsr.
llvm-svn: 315071
Summary:
Xcode's dsymutil emits a __swift_ast DWARF section, which is required for debugging,
and which contains a byte-for-byte dump of the swiftmodule file.
Add this feature to llvm-dsymutil.
Tested with `gobjdump --dwarf=info -s`, by verifying that the contents of
`__DWARF.__swift_ast` match between Xcode's dsymutil and llvm-dsymutil
(Xcode's dwarfdump and llvm-dwarfdump don't currently recognize the
__swift_ast section).
Reviewers: aprantl, friss
Subscribers: llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D38504
llvm-svn: 315066
The machine scheduler (before register allocation) is enabled by default for
SystemZ.
The SelectionDAG scheduling preference now becomes source order scheduling
(was regpressure).
Review: Ulrich Weigand
https://reviews.llvm.org/D37977
llvm-svn: 315063
It is possible for two modules to define the same set of external
symbols without causing a duplicate symbol error at link time,
as long as each of the symbols is a comdat member. So we cannot
use them as part of a unique id for the module.
Differential Revision: https://reviews.llvm.org/D38602
llvm-svn: 315026
Add extending loads and constant offset patterns
A bit more refactoring of the tablegen to make the patterns fairly nice and
uniform between the regular and atomic loads.
Differential Revision: https://reviews.llvm.org/D38523
llvm-svn: 315022
Summary: In SamplePGO, when an indirect call is promoted in the profiled binary, before profile annotation, it will be promoted and inlined. For the original indirect call, the current implementation will not mark VP profile on it. This is an issue when profile becomes stale. This patch annotates VP prof on indirect calls during annotation.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D38477
llvm-svn: 315016
Summary:
Xcode's dsymutil emits a __swift_ast DWARF section, which is required for debugging,
and which contains a byte-for-byte dump of the swiftmodule file.
Add this feature to llvm-dsymutil.
Tested with `gobjdump --dwarf=info -s`, by verifying that the contents of
`__DWARF.__swift_ast` match between Xcode's dsymutil and llvm-dsymutil
(Xcode's dwarfdump and llvm-dwarfdump don't currently recognize the
__swift_ast section).
Reviewers: aprantl, friss
Subscribers: llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D38504
llvm-svn: 315014
Ensure the program_headers call will fail correctly if the program
headers are larger than the underlying buffer.
Patch by Parker Thompson!
llvm-svn: 315012
Summary:
Xcode's dsymutil emits a __swift_ast DWARF section, which is required for debugging,
and which contains a byte-for-byte dump of the swiftmodule file.
Add this feature to llvm-dsymutil.
Tested with `gobjdump --dwarf=info -s`, by verifying that the contents of
`__DWARF.__swift_ast` match between Xcode's dsymutil and llvm-dsymutil
(Xcode's dwarfdump and llvm-dwarfdump don't currently recognize the
__swift_ast section).
Reviewers: aprantl, friss
Subscribers: llvm-commits, JDevlieghere
Differential Revision: https://reviews.llvm.org/D38504
llvm-svn: 315004
This is the same exact change we did for the current pass manager
in rL314997, but the new pass manager pipeline already happened
to run GlobalOpt after the inliner, so we just insert a run of
GDCE here.
llvm-svn: 315003
Summary: Make test robust enough to not fail due to CFG changes and re-enable for ARM/AArch64.
Reviewers: rovka, fhahn
Reviewed By: fhahn
Subscribers: fhahn, aemerson, rengolin, mcrosier, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38590
llvm-svn: 315002
The inliner performs some kind of dead code elimination as it goes,
but there are cases that are not really caught by it. We might
at some point consider teaching the inliner about them, but it
is OK for now to run GlobalOpt + GlobalDCE in tandem as their
benefits generally outweight the cost, making the whole pipeline
faster.
This fixes PR34652.
Differential Revision: https://reviews.llvm.org/D38154
llvm-svn: 314997
Implement .set dspr2 directive with appropriate feature bits. This
directive is a counterpart of -mattr=dspr2 command line option with the
exception that it does not influence elf header flags.
Patch by Milos Stojanovic.
Differential Revision: https://reviews.llvm.org/D38537
llvm-svn: 314994
There is data racing to the static variable RecordIndex in index profile reader
when merging in multiple threads. Make it a member variable in
IndexedInstrProfReader to fix this.
Differential Revision: https://reviews.llvm.org/D38431
llvm-svn: 314990
The code which lowers BUILD_VECTOR of consecutive loads into a single vector
load doesn't update chains properly. As a result the vector load can be
reordered with the store to the same location.
The current code in EltsFromConsecutiveLoads only updates the chain following
the first load. The fix is to update the chains following all the loads
comprising the vector.
This is a fix for PR10114.
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D38547
llvm-svn: 314988
When ignoring a load that participates in an interleaved group, make sure to
move a cast that needs to sink after it.
Testcase derived from reproducer of PR34743.
Differential Revision: https://reviews.llvm.org/D38338
llvm-svn: 314986
Instead of trying to keep LastWidenRecipe updated after creating each recipe,
have tryToWiden() retrieve the last recipe of the current VPBasicBlock and check
if it's a VPWidenRecipe when attempting to extend its range. This ensures that
such extensions, optimized to maintain the original instruction order, do so
only when the instructions are to maintain their relative order. The latter does
not always hold, e.g., when a cast needs to sink to unravel first order
recurrence (r306884).
Testcase derived from reproducer of PR34711.
Differential Revision: https://reviews.llvm.org/D38339
llvm-svn: 314981
Previously, instructions that were defined to use the FGR64 register class
were associated with the Mips64 table which was incorrect.
Reviewers: nitesh.jain, atanasyan
Differential Revision: https://reviews.llvm.org/D38454
llvm-svn: 314976
Summary:
When reinserting debug values after register allocation, make sure to
insert debug values after each redefinition of debug value register in
the slot index range. The reason for this is that DwarfDebug will end
the range of a debug variable when the physical reg is defined. For
instructions with e.g. tied operands this result in prematurely ended
debug range.
This resolves pr34545
Patch by Karl-Johan Karlsson and Bjorn Pettersson
Reviewers: rnk, aprantl
Reviewed By: rnk
Subscribers: bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D38229
llvm-svn: 314974
Currently llvm-mc just hangs inside infinite loop
while trying to parse file which has ".section .с" inside,
where section name is non-english character.
Patch fixes the issue.
In this patch I also moved content of non-english-characters.s
to test/MC/AsmParser/Inputs folder so that non-english-characters.s
becomes a single testcase for all invalid inputs containing non-english
symbols. That is convinent because llvm-mc otherwise tries
to parse and tokenize the whole testcase file with tools invocations and
it is harder to isolate the issue.
Differential revision: https://reviews.llvm.org/D38545
llvm-svn: 314973
We were using an i1 type and then zero extending to a vector. Instead just create the 0/1 directly as a ConstantInt with the correct type. No need to ask ConstantExpr to zero extend for us.
This bug is a bit tricky to hit because it requires us to visit a zext of an icmp that would normally be simplified to true/false, but that icmp hasnt' been visited yet. In the test case this zext and icmp were created by visiting a udiv and due to worklist ordering we got to the zext first.
Fixes PR34841.
llvm-svn: 314971
Summary: This is to avoid e.g. merging two cheap icmps if the target is not going to expand to something nice later.
Reviewers: dberlin, spatel
Subscribers: davide, nemanjai
Differential Revision: https://reviews.llvm.org/D38232
llvm-svn: 314970
Summary:
The arg1 logging handler changed in compiler-rt to start writing a
different type for entries encountered when logging the first argument
of XRay-instrumented functions. This change allows the trace loader to
support reading these record types as well as prepare for when the
basic (naive) mode implementation starts writing down the argument
payloads.
Without this change, binaries with arg1 logging support enabled start
writing unreadable logs for any of the XRay tracing tools.
Reviewers: pelikan
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38550
llvm-svn: 314967
We can support ashr similar to lshr, if we know that none of the shifted in bits are used. In that case SimplifyDemandedBits would normally convert it to lshr. But that conversion doesn't happen if the shift has additional users.
Differential Revision: https://reviews.llvm.org/D38521
llvm-svn: 314945
Summary:
If the pointer width is 32 bits and the calculated GEP offset is
negative, we call APInt::getLimitedValue(), which does a
*zero*-extension of the offset. That's wrong -- we should do an sext.
Fixes a bug introduced in rL314362 and found by Evgeny Astigeevich.
Reviewers: efriedma
Subscribers: sanjoy, javed.absar, llvm-commits, eastig
Differential Revision: https://reviews.llvm.org/D38557
llvm-svn: 314935
Function isLoweredToCall can only accept non-null function pointer, but a function pointer can be null for indirect function call. So check it before calling isLoweredToCall from getInstructionLatency.
Differential Revision: https://reviews.llvm.org/D38204
llvm-svn: 314927
Recommitting r314517 with the fix for handling ConstantExpr.
Original commit message:
Currently, getGEPCost() returns TCC_FREE whenever a GEP is a legal addressing
mode in the target. However, since it doesn't check its actual users, it will
return FREE even in cases where the GEP cannot be folded away as a part of
actual addressing mode. For example, if an user of the GEP is a call
instruction taking the GEP as a parameter, then the GEP may not be folded in
isel.
llvm-svn: 314923
It broke the Chromium / SQLite build; see PR34830.
> Summary:
> 1/ Operand folding during complex pattern matching for LEAs has been
> extended, such that it promotes Scale to accommodate similar operand
> appearing in the DAG.
> e.g.
> T1 = A + B
> T2 = T1 + 10
> T3 = T2 + A
> For above DAG rooted at T3, X86AddressMode will no look like
> Base = B , Index = A , Scale = 2 , Disp = 10
>
> 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
> so that if there is an opportunity then complex LEAs (having 3 operands)
> could be factored out.
> e.g.
> leal 1(%rax,%rcx,1), %rdx
> leal 1(%rax,%rcx,2), %rcx
> will be factored as following
> leal 1(%rax,%rcx,1), %rdx
> leal (%rdx,%rcx) , %edx
>
> 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
> thus avoiding creation of any complex LEAs within a loop.
>
> Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
>
> Reviewed By: lsaba
>
> Subscribers: jmolloy, spatel, igorb, llvm-commits
>
> Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314919
Somehow a few massive errors slipped though the cracks of testing.
1. The code in Segment::finalize was left over from the old layout
algorithm. In certain situations this would cause very strange issues
with segment layout. For instance in the shift-segments.test case it
would cause the second segment to have the same offset as the first.
2. In debugging this I discovered another issue. Namely section alignment
was not being computed based on Section->Align but instead
Section->Offset which is bizarre and makes no sense. I have no clue how
it worked in the first place. This issue is also fixed
3. Fixing #2 exposed a bug where things were not being written past the end
of the file that technically should have been. This was because in
certain cases (like overlapping-segments) the end of the file wouldn't
always be bumped if the offset could be chosen relative to an existing
segment that already had it's offset chosen. For fully nested segments
this is fine but for overlapping segments this leaves the end of the
file short. So I changed how the offset is bumped when looping though
segments.
Differential Revision: https://reviews.llvm.org/D38436
llvm-svn: 314918
This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted.
This should fix PR33079
Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results.
Differential Revision: https://reviews.llvm.org/D38449
llvm-svn: 314914
r314857 changed the CFG that resulted in the flaky test MachineBranchProb.ll to
fail the bots again. Marking it as unsupported for ARM/AArch64 again until we
find the cause.
llvm-svn: 314912
We can likely remove most of these as redundant in the near future,
but I'm trying to make sure I don't introduce any regressions with D38514.
llvm-svn: 314907
Previously, on long branches (relative jumps of >4 kB), an assertion
failure was hit, as AVRInstrInfo::insertIndirectBranch was not
implemented. Despite its name, it is called by the branch relaxator
for *all* unconditional jumps.
Patch by Thomas Backman.
llvm-svn: 314891
In some cases, the code generator attempts to generate instructions such as:
lddw r24, Y+63
which expands to:
ldd r24, Y+63
ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary
This commit limits the first offset to 62, and thus the second to 63.
It also updates some asserts in AVRExpandPseudoInsts.cpp, including for
INW and OUTW, which appear to be unused.
Patch by Thomas Backman.
llvm-svn: 314890
This adds diagnostics for invalid immediate operands to the MOVW and MOVT
instructions (ARM and Thumb).
Differential revision: https://reviews.llvm.org/D31879
llvm-svn: 314888
Currently, our diagnostics for assembly operands are not consistent.
Some start with (for example) "immediate operand must be ...",
and some with "operand must be an immediate ...". I think the latter
form is preferable for a few reasons:
* It's unambiguous that it is referring to the expected type of operand, not
the type the user provided. For example, the user could provide an register
operand, and get a message taking about an operand is if it is already an
immediate, just not in the accepted range.
* It allows us to have a consistent style once we add diagnostics for operands
that could take two forms, for example a label or pc-relative memory operand.
Differential revision: https://reviews.llvm.org/D36689
llvm-svn: 314887
Summary:
1/ Operand folding during complex pattern matching for LEAs has been
extended, such that it promotes Scale to accommodate similar operand
appearing in the DAG.
e.g.
T1 = A + B
T2 = T1 + 10
T3 = T2 + A
For above DAG rooted at T3, X86AddressMode will no look like
Base = B , Index = A , Scale = 2 , Disp = 10
2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs
so that if there is an opportunity then complex LEAs (having 3 operands)
could be factored out.
e.g.
leal 1(%rax,%rcx,1), %rdx
leal 1(%rax,%rcx,2), %rcx
will be factored as following
leal 1(%rax,%rcx,1), %rdx
leal (%rdx,%rcx) , %edx
3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops,
thus avoiding creation of any complex LEAs within a loop.
Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy
Reviewed By: lsaba
Subscribers: jmolloy, spatel, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35014
llvm-svn: 314886
I found that llvm-mc does not like non-english characters even in comments,
which it tries to tokenize.
Problem happens because of functions like isdigit(), isalnum() which takes
int argument and expects it is not negative.
But at the same time MCParser uses char* to store input buffer poiner, char has signed value,
so it is possible to pass negative value to one of functions from above and
that triggers an assert.
Testcase for demonstration is provided.
To fix the issue helper functions were introduced in StringExtras.h
Differential revision: https://reviews.llvm.org/D38461
llvm-svn: 314883
This time invoking llc with "-march=x86-64" in the testcase, so we don't assume
the default target is x86.
Summary:
If we have
%vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
%vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0
then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.
Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38235
llvm-svn: 314882
We have found some corner cases connected to range intersection where IRCE makes
a bad thing when the latch condition is unsigned. The fix for that will go as a follow up.
This patch temporarily disables IRCE for unsigned latch conditions until the issue is fixed.
The unsigned latch conditions were introduced to IRCE by rL310027.
Differential Revision: https://reviews.llvm.org/D38529
llvm-svn: 314881
Summary:
If we have
%vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3
%vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0
then we can't just change %vreg0 into %vreg3, since %vreg2 is actually
undef. We would have to also copy the undef flag to be able to change the
register.
Instead we deal with this case like other cases where we can't just
replace the register: we insert a COPY. The code creating the COPY already
copied all flags from the PHI input, so the undef flag will be transferred
as it should.
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38235
llvm-svn: 314879
The previous version didn't work if the jump table base address didn't
fit in 32 bit, since it was encoded as an immediate offset. And in case
the jump table is encoded as 32 bit label differences, we need to
load and add them to the table base first.
This solves the first half of the issues mentioned in PR34720.
Also fix some of the errors pointed out by -verify-machineinstrs, by
using GR32_NOSPRegClass.
Differential Revision: https://reviews.llvm.org/D38333
llvm-svn: 314876
Test needs some slight adjustment because we no longer check the existence of
BFI but rather that the actual hotness is set on the remark. If entry_count
is not set getBlockProfileCount returns None.
llvm-svn: 314874
Summary:
After r308422 we defer optimizations that can destroy loop canonical forms to
LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations
can exploit more control-flow opportunities.
Reviewers: mcrosier, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38262
llvm-svn: 314857
Surprisingly, we have zero coverage for these patterns.
Many of these are handled in InstSimplify, but it's not obvious
what the rule for folding each case should be, so I've just
stamped out everything.
It should be possible to fold every case, but currently, we
miss these:
int ashr_slt(int x) {
return (x >> 1) < 1;
}
int ashr_sgt(int x) {
return (x >> 1) > 0;
}
https://godbolt.org/g/aB2hLE
llvm-svn: 314837
This commit does two things. Firstly, it cleans up some of the benefit
calculation wrt outlined functions and candidates. Secondly, it fixes an
off-by-one bug in the cost model which was caused by the benefit value of
an OutlinedFunction and Candidate differing by 1. It updates the remarks test
to reflect this change.
llvm-svn: 314836
Summary:
For the amdpal OS type:
We write an AMDGPU_PAL_METADATA record in the .note section in the ELF
(or as an assembler directive). It contains key=value pairs of 32 bit
ints. It is a merge of metadata from codegen of the shaders, and
metadata provided by the frontend as _amdgpu_pal_metadata IR metadata.
Where both sources have a key=value with the same key, the two values
are ORed together.
This .note record is part of the amdpal ABI and will be documented in
docs/AMDGPUUsage.rst in a future commit.
Eventually the amdpal OS type will stop generating the .AMDGPU.config
section once the frontend has safely moved over to using the .note
records above instead of .AMDGPU.config.
Reviewers: arsenm, nhaehnle, dstuttard
Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D37753
llvm-svn: 314829
The test fails on Linux; see follow-up email on the llvm-commits list.
> Add the option to lookup an address in the debug information and print
> out the file, function, block and line table details.
>
> Differential revision: https://reviews.llvm.org/D38409
This also reverts the follow-up r314818:
> [test] Fix llvm-dwarfdump/cmdline.test
>
> Fixes test/tools/llvm-dwarfdump/cmdline.test
llvm-svn: 314825
All the buildbots are red, e.g.
http://lab.llvm.org:8011/builders/clang-cmake-aarch64-lld/builds/2436/
> Summary:
> This patch tries to vectorize loads of consecutive memory accesses, accessed
> in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
> which was reverted back due to some basic issue with representing the 'use mask' of
> jumbled accesses.
>
> This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
>
> Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
>
> Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
>
> Reviewed By: Ayal
>
> Subscribers: hans, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314824
The list of register ids was previously written out in a couple of dirrent
places. This puts it in a .def file and also adds a few more registers (e.g.
the x87 regs) which should lead to more readable dumps, but I didn't include
the whole list since that seems unnecessary.
X86_MC::initLLVMToSEHAndCVRegMapping is pretty ugly, but at least it's not
relying on magic constants anymore. The TODO of using tablegen still stands.
Differential revision: https://reviews.llvm.org/D38480
llvm-svn: 314821
Summary:
This should fix a regression introduced by r313786, which switched from
MachineInstr::isIndirectDebugValue() to checking if operand 1 is an
immediate. I didn't have a test case for it until now.
A single UserValue, which approximates a user variable, may have many
DBG_VALUE instructions that disagree about whether the variable is in
memory or in a virtual register. This will become much more common once
we have llvm.dbg.addr, but you can construct such a test case manually
today with llvm.dbg.value.
Before this change, we would get two UserValues: one for direct and one
for indirect DBG_VALUE instructions describing the same variable. If we
build separate interval maps for direct and indirect locations, we will
end up accidentally coalescing identical DBG_VALUE intervals that need
to remain separate because they are broken up by intervals of the
opposite direct-ness.
Reviewers: aprantl
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D37932
llvm-svn: 314819
Add the option to lookup an address in the debug information and print
out the file, function, block and line table details.
Differential revision: https://reviews.llvm.org/D38409
llvm-svn: 314817
Summary:
This patch tries to vectorize loads of consecutive memory accesses, accessed
in non-consecutive or jumbled way. An earlier attempt was made with patch D26905
which was reverted back due to some basic issue with representing the 'use mask' of
jumbled accesses.
This patch fixes the mask representation by recording the 'use mask' in the usertree entry.
Change-Id: I9fe7f5045f065d84c126fa307ef6ebe0787296df
Reviewers: mkuper, loladiro, Ayal, zvi, danielcdh
Reviewed By: Ayal
Subscribers: hans, mzolotukhin
Differential Revision: https://reviews.llvm.org/D36130
llvm-svn: 314806
This switches the ARM AsmParser to use assembly operand diagnostics from
tablegen, rather than a switch statement on the ARMMatchResultTy. It
moves the existing diagnostic strings to tablegen, but adds no new ones,
so this is NFC except for one diagnostic string that had an off-by-1 error
in the hand-written switch statement.
Differential revision: https://reviews.llvm.org/D31607
llvm-svn: 314804
tryParseRegister advances the lexer, so we need to take copies of the start and
end locations of the register operand before calling it.
Previously, the caret in the diagnostic pointer to the comma after the r0
operand in the test, rather than the start of the operand.
Differential revision: https://reviews.llvm.org/D31537
llvm-svn: 314799
The dsp register class is an alias of the gpr register class, so
we have to define instructions for spilling and reloading.
Reviewers: atanasyan
Differential Revision: https://reviews.llvm.org/D38038
llvm-svn: 314798
This lets us optimize away selects that perform the same address computation in
two different ways and is also the first step towards being able to handle
selects between two different, but compatible, address computations.
Differential Revision: https://reviews.llvm.org/D38242
llvm-svn: 314794
If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles.
Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773
Differential Revision: https://reviews.llvm.org/D38472
llvm-svn: 314788
Summary:
Take the target's endianness into account when splitting the
debug information in DAGTypeLegalizer::SetExpandedInteger.
This patch fixes so that, for big-endian targets, the fragment
expression corresponding to the high part of a split integer
value is placed at offset 0, in order to correctly represent
the memory address order.
I have attached a PPC32 reproducer where the resulting DWARF
pieces for a 64-bit integer were incorrectly reversed.
Original patch was reverted due to using -stop-after=isel in
the test case (but that is only working when AMDGPU target
is included in the llc build). The test case has now been
updated to use -stop-before=expand-isel-pseudos instead.
Patch by: dstenb
Reviewers: JDevlieghere, aprantl, dblaikie
Reviewed By: JDevlieghere, aprantl, dblaikie
Subscribers: nemanjai
Differential Revision: https://reviews.llvm.org/D38172
llvm-svn: 314781
This converts the ARM AsmParser to use the new assembly matcher error
reporting mechanism, which allows errors to be reported for multiple
instruction encodings when it is ambiguous which one the user intended
to use.
By itself this doesn't improve many error messages, because we don't have
diagnostic text for most operand types, but as we add that then this will allow
more of those diagnostic strings to be used when they are relevant.
Differential revision: https://reviews.llvm.org/D31530
llvm-svn: 314779
This was previously being silently dropped by obj2yaml and caused
parsing errors with yaml2obj.
Differential Revision: https://reviews.llvm.org/D38490
llvm-svn: 314768
This makes sure the LSDA pointer isn't truncated to 32 bit.
Make LowerINTRINSIC_WO_CHAIN a member function instead of a static
function, so that it can use the getGlobalWrapperKind method.
This solves the second half of the issues mentioned in PR34720.
Differential Revision: https://reviews.llvm.org/D38343
llvm-svn: 314767
Summary:
When checking if a constant expression is a noop cast we fetched the
IntPtrType by doing DL->getIntPtrType(V->getType())). However, there can
be cases where V doesn't return a pointer, and then getIntPtrType()
triggers an assertion.
Now we pass DataLayout to isNoopCast so the method itself can determine
what the IntPtrType is.
Reviewers: arsenm
Reviewed By: arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D37894
llvm-svn: 314763
Call ConstantFoldSelectInstruction() to fold cases like below
select <2 x i1><i1 true, i1 false>, <2 x i8> <i8 0, i8 1>, <2 x i8> <i8 2, i8 3>
All operands are constants and the condition has mixed true and false conditions.
Differential Revision: https://reviews.llvm.org/D38369
llvm-svn: 314741