Third time's the charm. The previous commit was reverted as a
reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--'
on an iterator at the beginning of a vector, causing asserts
when using debugging iterators. This commit fixes that.
llvm-svn: 235608
Summary:
Make sure the abbrev operands are valid and that we can read/skip them
afterwards.
Bug found with AFL fuzz.
Reviewers: rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9030
llvm-svn: 235595
Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1)
Differential Revision: http://reviews.llvm.org/D9097
llvm-svn: 235578
This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp.
This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON
llvm-svn: 235577
This is a re-commit of r235101, which also fixes the problems with the previous patch:
- Switches with only a default case and non-fallthrough were handled incorrectly
- The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here.
> This is a major rewrite of the SelectionDAG switch lowering. The previous code
> would lower switches as a binary tre, discovering clusters of cases
> suitable for lowering by jump tables or bit tests as it went along. To increase
> the likelihood of finding jump tables, the binary tree pivot was selected to
> maximize case density on both sides of the pivot.
>
> By not selecting the pivot in the middle, the binary trees would not always
> be balanced, leading to performance problems in the generated code.
>
> This patch rewrites the lowering to search for clusters of cases
> suitable for jump tables or bit tests first, and then builds the binary
> tree around those clusters. This way, the binary tree will always be balanced.
>
> This has the added benefit of decoupling the different aspects of the lowering:
> tree building and jump table or bit tests finding are now easier to tweak
> separately.
>
> For example, this will enable us to balance the tree based on profile info
> in the future.
>
> The algorithm for finding jump tables is quadratic, whereas the previous algorithm
> was O(n log n) for common cases, and quadratic only in the worst-case. This
> doesn't seem to be major problem in practice, e.g. compiling a file consisting
> of a 10k-case switch was only 30% slower, and such large switches should be rare
> in practice. Compiling e.g. gcc.c showed no compile-time difference. If this
> does turn out to be a problem, we could limit the search space of the algorithm.
>
> This commit also disables all optimizations during switch lowering in -O0.
>
> Differential Revision: http://reviews.llvm.org/D8649
llvm-svn: 235560
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage that the sign bit is not set. We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.
llvm-svn: 235558
This removes the -sehprepare flag and makes __C_specific_handler
functions always to use WinEHPrepare.
This was tested by building all of chromium_builder_tests and running a
few tests that use SEH, but if something breaks, we can revert this.
llvm-svn: 235557
In particular, this handles SSA values that are live *out* of a handler.
The existing code only handles values that are live *in* to a handler.
It also handles phi nodes in the block where normal control should
resume after the end of a catch handler. When EH return points have phi
nodes, we need to split the return edge. It is impossible for phi
elimination to emit copies in the previous block if that block gets
outlined. The indirectbr that we leave in the function is only notional,
and is eliminated from the MachineFunction CFG early on.
Reviewers: majnemer, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D9158
llvm-svn: 235545
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced. This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.
This fixes PR23309.
llvm-svn: 235544
I added the poisoning back in r76891 (2009) because of some bugs in
Unladen Swallow, and then Evan Cheng added the setRangeWritable() call
in r81308. Profiling a Release+Asserts build on Windows shows that this
memory protection call is actually very expensive. 4 seconds of a 70
second Clang compilation are spent in VirtualQuery. These days we have
more reliable tools like ASan to find these kinds of bugs, so we can go
ahead and retire these checks.
llvm-svn: 235542
This reverts commit r235458.
It looks like this might be breaking something LTO-ish. Looking into it
& will recommit with a fix/test case/etc once I've got more to go on.
llvm-svn: 235533
The CondOpt pass currently uses LiveIntervals to set the dead flag on a def. This patch uses MachineRegisterInfo::use_empty instead as that is equivalent to the def being dead.
This removes an instance of LiveIntervals in the pass manager pipeline and saves 3.8% of compile time on llc conpiled for AArch64.
Reviewed by Chad Rosier and Zhaoshi.
llvm-svn: 235532
Summary:
Remove the CHECK-DAG calls introduced in r235341, and add a comment that
this test may break due to scheduling variations.
This patch completes the fix discussed in http://reviews.llvm.org/D8804
Reviewers: dsanders, srhines
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9178
llvm-svn: 235530
This causes OpKind and all the bitfields after it to use 32-bit load/stores instead of i24's for the existing bitfields.
Bugs will be filed to track whether clang and llvm could have generated the 32-bit operations in the front-end or optimizer.
Reviewed by Rafael.
llvm-svn: 235528
ARM32 ELF R_ARM_V4BX relocation format is a special relocation type
that records the location of an ARMv4t BX instruction to enable a
static linker to generate ARMv4 compatible instructions. This
relocation does not contain a reference symbol.
This patch enabled its creation by removing the requeriment of a
relocation symbol target in ELFState<ELFT>::writeSectionContent.
llvm-svn: 235513
An assert was triggered when attempting to create a new SCEV
with operands of different types in the visitAddRecExpr. In this
test case, the operand types of the numerator and denominator
are different. The SCEV division code should generate a
conservative answer when this happens.
Differential Revision: http://reviews.llvm.org/D9021
llvm-svn: 235511
This fixes a regression introduced at revision 218263.
On AVX, if we optimize for size, a splat build_vector of a load
is lowered into a VBROADCAST node. This is done even if the value type of the
splat build_vector node is v2i64.
Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two
extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast
where the scalar element comes from a loadi64/loadf64.
However, revision 218263 forgot to add an extra fallback pattern for the case
where we have a X86VBroadcast of a loadi64 with multiple uses.
This patch adds the missing tablegen pattern in X86InstrSSE.td.
This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel
doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern
to select v2i64 X86ISD::BROADCAST nodes.
llvm-svn: 235509
This turned up after r235333, but was a pre-existing bug. The optimization
which transforms select(c, load, load) into a load of a select of the addresses
does not handle indexed loads (pre/post inc/dec). However, it did not check for
them either, leading to a crash if it tried to transform one of them.
llvm-svn: 235497
Enough concerns were raised that this optimization is pessimising some code patterns.
The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher.
llvm-svn: 235491
X86 backend.
The code generated for symbolic targets is identical to the code generated for
constant targets, except that a relocation is emitted to fix up the actual
target address at link-time. This allows IR and object files containing
patchpoints to be cached across JIT-invocations where the target address may
change.
llvm-svn: 235483
Previously the code was accidentally checking if 'this' was an IntRecTy which it can't be since 'this' is a BitRecTy. Looking back at the history it appears it was intended to check RHS.
llvm-svn: 235477
Without pointee types the space optimization of storing only the pointer
type and not the value type won't be viable - so add the extra type
information that would be missing.
llvm-svn: 235475
Without pointee types the space optimization of storing only the pointer
type and not the value type won't be viable - so add the extra type
information that would be missing.
Storeatomic coming soon.
llvm-svn: 235474
Add a flag to lib/Linker (and `llvm-link`) to override linkage rules.
When set, the functions in the source module *always* replace those in
the destination module.
The `llvm-link` option is `-override=abc.ll`. All the "regular" modules
are loaded and linked first, followed by the `-override` modules. This
is useful for debugging workflows where some subset of the module (e.g.,
a single function) is extracted into a separate file where it's
optimized differently, before being merged back in.
Patch by Luqman Aden!
llvm-svn: 235473
Factor the loop for linking input files together into a combined module
into a separate function. This is in preparation for an upcoming patch
that runs the logic twice.
Patch by Luqman Aden!
llvm-svn: 235472
Turns out I misread the parentheses. Though I'm pretty sure its always a RecordRecTy and non of the callers really seem to expect null. But until I'm completely sure I'm going to revert this.
llvm-svn: 235469
With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system
even though 64-bit integers are not legal types.
So instead of producing this:
pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3]
movd %xmm0, (%eax)
movd %xmm1, 4(%eax)
We can do:
movq %xmm0, (%eax)
This is a fix for the problem noted in D7296.
Differential Revision: http://reviews.llvm.org/D9134
llvm-svn: 235460
We should also teach the inliner to collapse framerecover of
frameaddress of the current frame down to an alloca, but that can happen
later.
llvm-svn: 235459
Calls to llvm::Value::mutateType are becoming extra-sensitive now that
instructions have extra type information that will not be derived from
operands or result type (alloca, gep, load, call/invoke, etc... ). The
special-handling for mutateType will get more complicated as this work
continues - it might be worth making mutateType virtual & pushing the
complexity down into the classes that need special handling. But with
only two significant uses of mutateType (vectorization and linking) this
seems OK for now.
Totally open to ideas/suggestions/improvements, of course.
With this, and a bunch of exceptions, we can roundtrip an indirect call
site through bitcode and IR. (a direct call site is actually trickier...
I haven't figured out how to deal with the IR deserializer's lazy
construction of Function/GlobalVariable decl's based on the type of the
entity which means looking through the "pointer to T" type referring to
the global)
llvm-svn: 235458
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D8911
llvm-svn: 235455
https://llvm.org/bugs/show_bug.cgi?id=23163.
Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.
The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.
Differential Revision: http://reviews.llvm.org/D9007
llvm-svn: 235451
MemIntrinsic::getDest() looks through pointer casts, and using it
directly when building the new GEP+memset results in stuff like:
%0 = getelementptr i64* %p, i32 16
%1 = bitcast i64* %0 to i8*
call ..memset(i8* %1, ...)
instead of the correct:
%0 = bitcast i64* %p to i8*
%1 = getelementptr i8* %0, i32 16
call ..memset(i8* %1, ...)
Instead, use getRawDest, which just gives you the i8* value.
While there, use the memcpy's dest, as it's live anyway.
In most cases, when the optimization triggers, the memset and memcpy
sizes are the same, so the built memset is 0-sized and eliminated.
The problem occurs when they're different.
Fixes a regression caused by r235232: PR23300.
llvm-svn: 235419
Summary:
This lets us use range based for loops.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9169
llvm-svn: 235416
Summary:
With D9096 and D9101, there's no need to run DCE after SLSR and
SeparateConstOffsetFromGEP.
Test Plan: no regression
Reviewers: jholewinski, meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9172
llvm-svn: 235415
Remove the `DIArray` and `DITypeArray` typedefs, preferring the
underlying types (`DebugNodeArray` and `MDTypeRefArray`, respectively).
llvm-svn: 235413
Summary:
After we rewrite a candidate, the instructions used by the old form may
become unused. This patch cleans up these unused instructions so that we
needn't run DCE after SLSR.
Test Plan: removed -dce in all the SLSR tests
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9101
llvm-svn: 235410
Summary: so that we needn't run DCE after this pass.
Test Plan: removed -dce from the commandline in split-gep.ll and split-gep-and-gvn.ll
Reviewers: meheff
Subscribers: llvm-commits, HaoLiu, hfinkel, jholewinski
Differential Revision: http://reviews.llvm.org/D9096
llvm-svn: 235409
Summary:
MemorySSA uses this algorithm as well, and this enables us to reuse the code in both places.
There are no actual algorithm or datastructure changes in here, just code movement.
Reviewers: qcolombet, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9118
llvm-svn: 235406
Remove early returns for when `getVariable()` is null, and just assert
that it never happens. The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken. I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.
llvm-svn: 235400
Keep the old SEH fan-in lowering on by default for now, since projects
rely on it. This will make it easy to test this change with a simple
flag flip.
llvm-svn: 235399
There doesn't seem to be a reason to perform this target ISD node matching
in an DAGCombine, moving it to lowering fixes PR23296.
Differential Revision: http://reviews.llvm.org/D9137
llvm-svn: 235394
Summary:
This directive is exactly the same as .asciz, except it's only used by MIPS.
It is used to store null terminated strings in object files.
Reviewers: rafael, dsanders, echristo
Reviewed By: dsanders, echristo
Subscribers: echristo, llvm-commits
Differential Revision: http://reviews.llvm.org/D7530
llvm-svn: 235382
Summary:
The 64-bit version of the variable shift instructions uses the
shift_rotate_reg class which uses a GPR32Opnd to specify the variable
shift amount. With this patch we avoid the generation of a redundant
SLL instruction for the variable shift instructions in 64-bit targets.
Reviewers: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D7413
llvm-svn: 235376
This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since.
I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests.
Differential Revision: http://reviews.llvm.org/D9095
llvm-svn: 235372
Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation.
Test case spotted by Patrik Hagglund
llvm-svn: 235371
Summary:
This fixes http://llvm.org/bugs/show_bug.cgi?id=16439.
This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type.
Patch by Keno Fischer!
Test Plan: regression test from http://reviews.llvm.org/D7752
Reviewers: chfast, resistor
Reviewed By: chfast, resistor
Subscribers: sanjoy, resistor, chfast, llvm-commits
Differential Revision: http://reviews.llvm.org/D4978
llvm-svn: 235370
This brings the utils/vim folder into a more vim-like format by moving
the syntax hightlighting files into a syntax subdirectory. It adds
some minimal settings that everyone should agree on to ftdetect/ftplugin and
features a new indentation plugin for .ll files.
llvm-svn: 235369
X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected
if the operand types do not match the result type because vector type
legalization cannot deal with this for custom nodes.
Testcase X86ISD::ADDSUB is attached. I could not create a testcase for
the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296
Differential Revision: http://reviews.llvm.org/D9120
llvm-svn: 235367
Summary:
Bundle aligment requires that the functions always start at an aligned address.
Usually this is ensured by the compiler, but assembly code does not always
begin with a .align directive.
This change ensures that sections get the correct alignment if they contain
any instructions and bundling is enabled. (It also makes LLVM match the
behavior of GNU as).
Differential Revision: http://reviews.llvm.org/D9131
llvm-svn: 235365
Summary:
In the f16-promote test, make the checks for native conversion instructions
similar to the libcall checks:
- Remove hard coded register names
- Do not check exact instruction sequences.
This fixes test flakiness due to non-determinism in instruction
scheduling and register allocation. I also fixed a few minor things in
the CHECK-LIBCALL checks.
I'll try to find a way to check that unnecessary loads, stores, or
conversions don't happen.
Reviewers: mzolotukhin, srhines, ab
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9112
llvm-svn: 235363
Summary:
This patch adds two flags to `bugpoint`: "-replace-funcs-with-null" and "-disable-pass-list-reduction".
When "-replace-funcs-with-null" is specified, bugpoint will, instead of simply deleting function bodies, replace all uses of functions and then will delete functions completely from the test module, correctly handling aliasing and @llvm.used && @llvm.compiler.used. This part was conceived while trying to debug the PNaCl IR simplification passes, which don't allow undefined functions (ie no declarations).
With "-disable-pass-list-reduction", bugpoint won't try to reduce the set of passes causing the "crash". This is needed in cases where one is trying to debug an issue inside the PNaCl IR simplification passes which is causing an PNaCl ABI verification error, for example.
Reviewers: jfb
Reviewed By: jfb
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D8555
llvm-svn: 235362
Delete subclasses of (the already deleted) `DIType` in favour of
directly using pointers from the `Metadata` hierarchy.
While `DICompositeType` wraps `MDCompositeTypeBase` and `DIDerivedType`
wraps `MDDerivedTypeBase`, most uses of each really meant the more
specific `MDCompositeType` and `MDDerivedType`.
llvm-svn: 235351
The version of `constructTypeDIE()` for `MDSubroutineType` is unrelated
to (and has different callers than) the `MDCompositeType`. Split the
two in half.
This simplifies an upcoming patch to delete `DICompositeType`. There
shouldn't be any real functionality change here. `createTypeDIE()` is
`cast<>`'ing where it didn't need to before, but that function in turn
is only called for true `MDCompositeType`s.
llvm-svn: 235349
the function body.
This is necessary for correctness when lazily compiling.
Also, flesh out the Orc unit test infrastructure slightly, and add a unit test
for this.
llvm-svn: 235347
Update comment style in `DwarfUnit`.
- Drop duplicated comments at definition, and update the comments at
the declaration where the definition comments looked newer or more
complete.
- Drop the `functionName -` prefix.
- Add `\brief` in a few places.
- Remove a few comments entirely that weren't adding value (just
turned the function name and arguments into a sentence).
llvm-svn: 235345
Summary:
Set operation action for FP16 conversion opcodes, so the Op legalizer
can choose the gnu_* libcalls for Mips.
Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to
prevent (fpext (load )) and (store (fptrunc)) from getting combined into
unsupported operations.
Added test cases to test that these operations are handled correctly
for f16 scalars and vectors. This patch depends on
http://reviews.llvm.org/D8755.
Reviewers: srhines
Subscribers: llvm-commits, ab
Differential Revision: http://reviews.llvm.org/D8804
llvm-svn: 235341
This is the last major parent class, so I'll probably start deleting
classes in batches now. Looks like many of the references to the DI*
hierarchy were updated organically along the way.
llvm-svn: 235331
Replace uses of `DIScope` with `MDScope*`. There was one spot where
I've left an `MDScope*` uninitialized (where `DIScope` would have been
default-initialized to `nullptr`) -- this is intentional, since the
if/else that follows should unconditional assign it to a value.
llvm-svn: 235327
This adds the following targets to cmake. These can be used to build and link only specific parts of a backend, instead of having to link the whole backend.
- AllTargetsAsmPrinters, AllTargetsAsmParsers, AllTargetsDescs, AllTargetsDisassemblers, AllTargetsInfos
A typical use for these is instead of linking ${LLVM_TARGETS_TO_BUILD}. This commit changes llvm-mc to show how to use the new targets.
Reviewed by Chris Bieneman.
llvm-svn: 235324
This commit fixes the code which adds lifetime markers in InlineFunction to skip
zero-sized allocas instead of asserting on them.
rdar://problem/20531155
llvm-svn: 235312
n/1 generates a quotient equal to n and a remainder of 0.
If this case is not recognized, then the SCEV divide() function
can return a remainder that is greater than or equal to the
denominator, which means the delinearized subscripts for the
test case will be incorrect.
Differential Revision: http://reviews.llvm.org/D9003
llvm-svn: 235311
The current implementations could exhibit some behavior differences:
raw_fd_ostream: Whatever the underlying fd does with seek+write. In a normal
file, the write position would be back to the old offset.
raw_svector_ostream: The write position is always the end of the stream, so
after pwrite the write position would be the new end. This matches what OS_X
(all BSD?) do with a pwrite in a O_APPEND fd.
Given that we don't need that feature and don't use O_APPEND a lot in LLVM,
just disallow it.
I am open to suggestions on renaming pwrite to something else, but this fixes
the issue for now.
Thanks to Yaron Keren for reporting it.
llvm-svn: 235303
We have to avoid converting a reference to a global into a reference to a local,
but it is fine to look past a local.
Patch by Vasileios Kalintiris.
I just moved the comment and added thet test.
llvm-svn: 235300
This fixes a regression introduced at revision 231243.
The target-independent selection algorithm in FastISel knows how to select
a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the
tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr.
Method X86FastISel::X86SelectSIToFP was therefore working under the
wrong assumption that the target was AVX. That assumption was incorrect since
we can have a target that is neither AVX nor SSE.
So, rather than asserting for the presence of AVX, we should have had an
early exit from 'X86SelectSIToFP' if the target was not AVX.
This patch fixes the issue replacing the invalid assertion with an early exit.
Thanks to Dimitry Andric for reporting this problem and for providing a small
reproducible testcase. Added test pr23273.ll.
llvm-svn: 235295