Commit Graph

390967 Commits

Author SHA1 Message Date
Saleem Abdulrasool 527a1821e6 DirectoryWatcher: also wait for the notifier thread
Ultimately the DirectoryWatcher is not ready until the notifier thread
is also active.  Failure to wait for the notifier thread may result in
loss of events.  While this is not catastrophic in practice, the tests
are sensitive to this as depending on the thread scheduler, the thread
may fail to being execution before the operations are completed by the
fixture.  Running this in a tight loop shows no regressions locally as
previously, but this failure mode was been sighted once on a builder.
2021-06-13 10:58:55 -07:00
Nico Weber 7d4c8a2b8f [lld/mac] clarify comment
This is a "we should do X in the future" fixme, not an "X might go wrong"
fixme.
2021-06-13 13:30:07 -04:00
Nikita Popov 6ecc99210c [LoopUnroll] Test multi-exit runtime unrolling with predictable exit (NFC)
The (prior to prologue insertion) predictable exit shouldn't get
folded here. Make sure it isn't...
2021-06-13 18:48:38 +02:00
Simon Pilgrim 4089e0bbfa RawError.h - remove unused <string> include. NFCI. 2021-06-13 17:32:57 +01:00
Simon Pilgrim 9efe89d82f BoundsChecking.cpp - tidy implicit header dependencies. NFCI.
We don't use <vector> but we do use std::pair (<utility>)
2021-06-13 17:08:15 +01:00
Simon Pilgrim 033e594c59 DIPrinter.h - tidy implicit header dependencies. NFCI.
We don't use <string> but we do use std::unique_ptr (<memory>) and llvm::Optional<>
2021-06-13 17:00:15 +01:00
Simon Pilgrim d1b57086d5 DetailedRecordsBackend.cpp - printSectionHeading - avoid std::string creation/copies.
Don't create std::string from constant c-strings or pass std::string by value - we can use StringRef instead.
2021-06-13 16:49:40 +01:00
Simon Pilgrim a03d09f423 DetailedRecordsBackend.cpp - tidy implicit header dependencies. NFCI.
We don't use <algorithm>, <set> or <vector>, but we do use std::pair (<utility>).
2021-06-13 16:27:17 +01:00
Simon Pilgrim 3dc727e81b ProfiledCallGraph.h - remove unused <string> include. NFCI. 2021-06-13 15:19:25 +01:00
Simon Pilgrim 2c4ee1e112 RegUsageInfoPropagate.cpp - remove unused <string> and <map> includes. NFCI. 2021-06-13 15:19:24 +01:00
Simon Pilgrim dbfa3d289b MachOObjectFile.cpp - remove unused <string> include. NFCI. 2021-06-13 15:19:24 +01:00
Simon Pilgrim 35a12023f3 DWARFDebugFrame.cpp - remove unused <string> include. NFCI. 2021-06-13 15:19:24 +01:00
Nico Weber 5f9bc580d8 fix comment typos to cycle bots 2021-06-13 10:18:51 -04:00
Simon Pilgrim 56541d1377 GVN.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
Simon Pilgrim c14fd171fe LoopUnrollAndJamPass.cpp - remove unused <vector> include. NFCI. 2021-06-13 14:06:32 +01:00
David Green bee2f618d5 [ARM] Introduce t2WhileLoopStartTP
This adds t2WhileLoopStartTP, similar to the t2DoLoopStartTP added in
D90591. It keeps a reference to both the tripcount register and the
element count register, so that the ARMLowOverheadLoops pass in the
backend can pick the correct one without having to search for it from
the operand of a VCTP.

Differential Revision: https://reviews.llvm.org/D103236
2021-06-13 13:55:34 +01:00
Markus Böck 7ff3a89a7b [clang][NFC] Add IsAnyDestructorNoReturn field to CXXRecord instead of calculating it on demand
This patch addresses a performance issue I noticed when using clang-12 to compile projects of mine. Even though the files weren't too large (around 1k cpp), the compiler was taking more than a minute to compile the source file, much longer than either GCC or MSVC.

Using a profiler it turned out the issue was the isAnyDestructorNoReturn function in CXXRecordDecl. In particular it being recursive, recalculating the property for every invocation, for every field and base class. This showed up in tracebacks in the profiler.

This patch instead adds IsAnyDestructorNoReturn as a Field to the data inside of CXXRecord and updates when a new base class, destructor, or record field member is added.

After this patch the problematic file of mine went from a compile time of 81s, down to 12s.

The patch itself should not change any functionality, just improve performance.

Differential Revision: https://reviews.llvm.org/D104182
2021-06-13 14:48:27 +02:00
Sanjay Patel afd44bb6f2 [InstCombine] fold ctlz/cttz of bool types
https://alive2.llvm.org/ce/z/tX4pUT
2021-06-13 08:26:40 -04:00
Simon Pilgrim 7d7e913e09 SValExplainer.h - get APSInt values by const reference instead of value. NFCI.
Avoid unnecessary copies.
2021-06-13 13:05:17 +01:00
Simon Pilgrim 2477b498f2 ArgumentPromotion.cpp - remove unused <string> include. NFCI. 2021-06-13 13:03:47 +01:00
Simon Pilgrim b013c58e82 VPlanSLP.cpp - tidy implicit header dependencies. NFCI.
We don't use std::string and std::vector, but we do use std::pair and std::max.
2021-06-13 12:37:17 +01:00
Lang Hames a7c3105adb [ORC-RT] Remove unused header in unit test. 2021-06-13 20:45:20 +10:00
Lang Hames fc3ca2cc08 [JITLink][MachO] Add missing testcase.
This test was accidentally left out of f9649d123d.
2021-06-13 20:43:49 +10:00
Lang Hames e405db075b [ORC-RT] Fix a comment. 2021-06-13 20:26:51 +10:00
Matheus Izvekov bf20631782 [clang] Implement P2266 Simpler implicit move
This Implements [[http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2266r1.html|P2266 Simpler implicit move]].

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99005
2021-06-13 12:10:56 +02:00
Kristina Bessonova f6b9836b09 [ARM][NEON] Combine base address updates for vld1Ndup intrinsics
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D103836
2021-06-13 11:18:32 +02:00
Luo, Yuanke 5be314f79b [X86] Check immediate before get it.
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.

Differential Revision: https://reviews.llvm.org/D104037
2021-06-13 15:40:52 +08:00
Luo, Yuanke 1e72b9d52f Revert "[X86] Check immediate before get it."
This reverts commit 9eb2f723c2.
2021-06-13 13:55:38 +08:00
Shoaib Meenai aa93603ff6 [runtimes] Fix umbrella component targets
When we're building the runtimes for multiple platform targets, we
create umbrella build targets for each distribution component, but those
targets didn't have any dependencies and were just no-ops. Make the
umbrella target depend on the sub-targets for each platform to fix this,
which is consistent with the behavior of the umbrella targets for each
runtime, and also consistent with the behavior when we've only specified
the default target.
2021-06-12 19:49:44 -07:00
David Blaikie 02c718301b llvm-objcopy: fix section size truncation/extension when dumping sections
Since this only comes up with inputs containing sections at least 4GB
large (I guess I could use a bzero section or something, so the input
file doesn't have to be 4GB, but even then the output file would have to
be 4GB, right?) I've skipped testing this. If there's a nice way to test
this without needing 4GB inputs or output files.

The subtlety here is demonstrated by this code:

struct t { operator uint64_t(); };
static_assert(std::is_same_v<int, decltype(std::declval<bool>() ? 0 : std::declval<t>())>);
static_assert(std::is_same_v<uint64_t, decltype(std::declval<bool>() ? 0 : std::declval<uint64_t>())>);

Because of this difference, the original source code was getting an int
type (truncating the actual size) and then extending it again, resulting
in bogus values (I haven't thought through this hard enough to explain
why the resulting value was 0xffff... - sign extension, possible UB, but
in any case it's the wrong answer - in this particular case I was
looking at that resulted in a size so large that we couldn't open a file
large enough to write to and ended up with a rather vague:

error: 'file_name.o': Invalid argument
2021-06-12 19:00:10 -07:00
Luo, Yuanke 9eb2f723c2 [X86] Check immediate before get it.
For CMP imm instruction, when the operand 1 is symbol address we should
check if it is immediate first. Here is the example code.
`CMP64mi32 $noreg, 8, killed renamable $rcx, @d, $noreg, @a, implicit-def
$eflags`
Many thanks to Craig, Topper for the test case to reproduce this issue.

Differential Revision: https://reviews.llvm.org/D104037
2021-06-13 09:08:40 +08:00
Lang Hames 49f4a58d53 [ORC-RT] Split Simple-Packed-Serialization code into its own header.
This will simplify integration of this code into LLVM -- The
Simple-Packed-Serialization code can be copied near-verbatim, but
WrapperFunctionResult will require more adaptation.
2021-06-13 10:17:13 +10:00
Mehdi Amini 152c9871e6 Simplify getArgAttrDict/getResultAttrDict by removing unnecessary checks
There is a slight change in behavior: if the arg dictionnary is empty
then we return this empty dictionnary instead of a null attribute.
This is more consistent with accessing it through:

  ArrayAttr args_attr = func_op.getAllArgAttrs();
  args_attr[num].cast<DictionnaryAttr>() ...

Differential Revision: https://reviews.llvm.org/D104189
2021-06-12 22:55:31 +00:00
Roman Lebedev 2db64e199a
[NFC][X86][Codegen] Add shuffle test that would benefit from sorting in reduceBuildVecToShuffle() 2021-06-13 00:07:48 +03:00
Mehdi Amini 8bc1ce0f61 Use dyn_cast_or_null instead of dyn_cast in FunctionLike::verifyTrait (NFC)
This is making the verifier more tolerant to cases where a "null"
Attribute would be inserted in the array of func arguments/results
attributes.
2021-06-12 20:08:37 +00:00
Ian McIntyre 5899278758 [llvm-objcopy] Exclude empty sections in IHexWriter output
IHexWriter was evaluating a section's physical address when deciding if
that section should be written to an output. This approach does not
account for a zero-sized section that has the same physical address as a
sized section. The behavior varies from GNU objcopy, and may result in a
HEX file that does not include all program sections.

The IHexWriter now excludes zero-sized sections when deciding what
should be written to the output. This affects the contents of the
writer's `Sections` collection; we will not try to insert multiple
sections that could have the same physical address. The behavior seems
consistent with GNU objcopy, which always excludes empty sections,
no matter the address.

The new test case evaluates the IHexWriter behavior when provided a
variety of empty sections that overlap or append a filled section. See
the input file's comments for more information. Given that test input,
and the change to the IHexWriter, GNU objcopy and llvm-objcopy produce
the same output.

Reviewed By: jhenderson, MaskRay, evgeny777

Differential Revision: https://reviews.llvm.org/D101332
2021-06-12 12:23:07 -07:00
Xun Li fae7debadc [CHR] Don't run ControlHeightReduction if any BB has address taken
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable.
CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list.
However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken.
Added a few test cases.

Reviewed By: aeubanks, wenlei, hoy

Differential Revision: https://reviews.llvm.org/D103867
2021-06-12 10:29:53 -07:00
Craig Topper c997867dc0 [X86] Add ISD::FREEZE and ISD::AssertAlign to the list of opcodes that don't guarantee upper 32 bits are zero.
The freeze issue was reported here
https://llvm.discourse.group/t/bug-or-feature-freeze-instruction/3639

I don't have a test for AssertAlign. I just noticed it was missing
and assume it should be similar to the other two Asserts.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104178
2021-06-12 09:52:29 -07:00
Saleem Abdulrasool 76f1baa787 Revert "Revert "DirectoryWatcher: add an implementation for Windows""
This reverts commit 0ec1cf13f2.

Restore the implementation with some minor tweaks:
- Use std::unique_ptr for the path instead of std::vector
  * Stylistic improvement as the buffer is already heap allocated, this
    just makes it clearer.
- Correct the notification buffer allocation size
  * Memory usage fix: we were allocating 4x the computed size
- Correct the passing of the buffer size to RDC
  * Memory usage fix: we were reporting 1/4th of the size
- Convert the operation event to auto-reset
  * Bug Fix: we never reset the event
- Remove `FILE_NOTIFY_CHANGE_LAST_ACCESS` from RDC events
  * Memory usage fix: we never needed this notification
- Fold events for the notification action
  * Stylistic improvement to be clear how the events map
- Update comment
  * Stylistic improvement to be clear what the RAII controls
- Fix the race condition that was uncovered previously
  * We would return from the construction before the watcher thread
    began execution.  The test would then proceed to begin execution,
    and we would miss the initial notifications.  We now ensure that the
    watcher thread is initialized before we return.  This ensures that
    we do not miss the initial notifications.

Running the test on a SSD was able to uncover the access pattern.  This
now seems to pass reliably where it was previously flaky locally.
2021-06-12 09:27:44 -07:00
Matheus Izvekov 1e50c3d785 [clang] NRVO: Improvements and handling of more cases.
This expands NRVO propagation for more cases:

Parse analysis improvement:
* Lambdas and Blocks with dependent return type can have their variables
  marked as NRVO Candidates.

Variable instantiation improvements:
* Fixes crash when instantiating NRVO variables in Blocks.
* Functions, Lambdas, and Blocks which have auto return type have their
  variables' NRVO status propagated. For Blocks with non-auto return type,
  as a limitation, this propagation does not consider the actual return
  type.

This also implements exclusion of VarDecls which are references to
dependent types.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D99696
2021-06-12 16:43:32 +02:00
Florian Hahn 0d9e8f5f4b
[VPlan] Add more sinking/merging tests with predicated loads/stores. 2021-06-12 15:36:51 +01:00
Shashij gupta 466e5aba64 [MLIR] Simplify affine.if ops with trivial conditions
The commit simplifies affine.if ops :
The affine if operation gets removed if the condition is universally true or false and then/else block is merged with the parent block.

Signed-off-by: Shashij Gupta shashij.gupta@polymagelabs.com

Reviewed By: bondhugula, pr4tgpt

Differential Revision: https://reviews.llvm.org/D104015
2021-06-12 19:29:10 +05:30
Florian Hahn b4583a5ad7
Revert "Allow signposts to take advantage of deferred string substitution"
This reverts commit 4fc93a3a1f because it
breaks LLDB builds on certain macOS platform & SDK combinations, e.g.
http://green.lab.llvm.org/green/job/lldb-cmake-standalone/3288/consoleFull#-195476041949ba4694-19c4-4d7e-bec5-911270d8a58c
2021-06-12 12:08:25 +01:00
Kristina Bessonova 8e62797963 [lit] Attempt for fix tests failing because of 'warning: non-portable path to file'
This is an attempt to fix clang test failures due to 'nonportable-include-path'
warnings on Windows when a path to llvm-project's base directory contains some
uppercase letters (excluding a drive letter).

The issue originates from 2 problems:
* discovery.py loads site config in lower case causing all the paths
based on __file__ and requested within the config file to be in lowercase as well,
* neither os.path.abspath() nor os.path.realpath() (both used to obtain paths of
config files, sources, object directories, etc) do not return paths in the correct
case for Windows (at least consistently for all python versions).

As os.path library doesn't seem to provide any relaible way to restore
the case for paths on Windows, this patch proposes to use pathlib.resolve().
pathlib is a part of Python 3.4 while llvm lit requires Python 3.6.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D103014
2021-06-12 12:49:03 +02:00
Florian Hahn 5cd66420cc
Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
This reverts commit 1b748faf2b because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstr on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
2021-06-12 11:41:38 +01:00
Florian Hahn e087b4f149
Revert "[X86FixupLEAs] Sub register usage of LEA dest should block LEA/SUB optimization"
This reverts commit f35bcea1d4 because it
depends on 1b748faf2b, which breaks
building the llvm-test-suite with -verify-machineinstrs on X86.

See 154adc0f135cff3f8a8861c335d2b88c8049d098 for more details.
2021-06-12 11:40:47 +01:00
madhur13490 c27e8141b3 [AMDGPU][IndirectCalls] Fix register usage propagation for indirect/external calls
This patch computes max SGPRs and VGPRs used by module
in presence of indirect calls and makes that
as register requirement for functions/kernels
which makes indirect calls.

This patch also refactors code AMDGPUSubTarget.cpp
which add a "base" variants of getMaxNumSGPRs which
is used by MachineFunction and new Function version.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103636
2021-06-12 11:59:34 +05:30
spupyrev 0a0800c4d1 A post-processing for BFI inference
The current implementation for computing relative block frequencies does
not handle correctly control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose perf can be up to 5%
worse than optimal.

To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predesessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method aka "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.

It is turned on by passing option `use-iterative-bfi-inference`
and applied only for functions containing profile data and irreducible loops.

Tested on SPEC06/17, where it is helping to get correct profile counts for one of
the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5%
for binaries containing functions with hot irreducible loops.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D103289
2021-06-11 21:46:04 -07:00
Michael Kruse dbc262968f [Flang][test] Fix Windows buildbot.
Commit 1b241b9b40 /
patch https://reviews.llvm.org/D104130 introduced an new test which
calls a UNIX shell script. Add
REQUIRES: shell
to not run it on Windows.
2021-06-11 23:25:33 -05:00
Stephen Neuendorffer 984e270a9a [mlir] make normalizeAffineFor public
Previously this was just a static method.
2021-06-11 20:12:37 -07:00