Commit Graph

152882 Commits

Author SHA1 Message Date
Phoebe Wang f13b43d570 [X86][FP16] Only generate approximate rsqrt when Reciprocal is true for half type
We have reasonably fast sqrt and accurate rsqrt for half type due to the
limited fractions. So we need neither multi-step refinement for
rsqrt nor replacement of sqrt by rsqrt.
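
For readers unfamiliar with the terms, here is a rough sketch of what a refinement step and the sqrt-via-rsqrt substitution look like in general. It uses `float` rather than half, the function names are hypothetical, and it is not the X86 lowering code itself.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~ 1/sqrt(X):  y' = y * (1.5 - 0.5 * X * y * y).
// "Multi-step refinement" means applying this repeatedly to an estimate.
float refineRsqrt(float X, float Estimate) {
  assert(X > 0.0f && "sketch only handles positive inputs");
  return Estimate * (1.5f - 0.5f * X * Estimate * Estimate);
}

// sqrt(X) recovered from a reciprocal-sqrt estimate with a single multiply,
// since X * (1 / sqrt(X)) == sqrt(X).
float sqrtFromRsqrt(float X, float Estimate) {
  assert(X > 0.0f && "sketch only handles positive inputs");
  return X * Estimate;
}
```

Each step roughly doubles the number of correct bits, which is why a type with few fraction bits has little to gain from several steps.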

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D114844
2021-12-02 13:52:45 +08:00
Phoebe Wang 4756a2f157 [X86] Insert FMUL for estimated non reciprocal SQRT when `RefinementSteps` = 0
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D114843
2021-12-02 13:52:45 +08:00
Christudasan Devadasan 399b7de0ea [AMDGPU] Add a regclass flag for scalar registers
Along with vector RC flags, this scalar flag will
make various regclass queries like `isVGPR` more
accurate.

Regclasses other than vectors are currently set
with the new flag even though certain unallocatable
classes aren't truly scalars. It would be ok as long
as they remain unallocatable.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110053
2021-12-01 23:31:07 -05:00
Daniel Sanders 54e21df973 [unroll] Fix a functional change in an NFC patch
5c77aa2b91 [unroll] Use early return in shouldFullUnroll [nfc]
wasn't quite NFC since !(x <= y) is x > y rather than x >= y
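
A minimal sketch of the boundary case, with hypothetical function names (the real condition lives in shouldFullUnroll and is not reproduced here):

```cpp
#include <cassert>

// The condition as originally written.
bool originalGuard(int X, int Y) { return !(X <= Y); }

// Logically equivalent rewrite.
bool faithfulRewrite(int X, int Y) { return X > Y; }

// The rewrite that changes behaviour when X == Y.
bool offByOneRewrite(int X, int Y) { return X >= Y; }

// The two rewrites only disagree on the boundary X == Y.
void checkBoundary() {
  assert(originalGuard(3, 3) == faithfulRewrite(3, 3));
  assert(originalGuard(3, 3) != offByOneRewrite(3, 3));
}
```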

Credit to Justin Bogner for spotting the bug

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D114894
2021-12-01 17:28:12 -08:00
Noah Shutty 170783f991 [llvm] [Support] Add HTTP Client Support library.
This patch implements a small HTTP client library consisting primarily of the `HTTPRequest`, `HTTPResponseHandler`, and `BufferedHTTPResponseHandler` classes. Unit tests of the `HTTPResponseHandler` and `BufferedHTTPResponseHandler` are included.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D112751
2021-12-01 23:54:38 +00:00
Arthur Eubanks 512534bc16 [Cloning] Clone metadata on function declarations
Previously we missed cloning metadata on function declarations because
we don't call CloneFunctionInto() on declarations in CloneModule().

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D113812
2021-12-01 15:40:05 -08:00
spupyrev 7cc2493daa profi - a flow-based profile inference algorithm: Part I (out of 3)
The benefits of sampling-based PGO crucially depend on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-case time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However, careful engineering and extensive evaluation show that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi` to use type `cl::opt<bool>`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860
2021-12-01 15:30:38 -08:00
Florian Hahn ad88a37cea [TLI] Add memset_pattern4, memset_pattern8 lib functions.
Similar to memset_pattern16, memset_pattern4, memset_pattern8 are
available on Darwin platforms.
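
As a rough idea of what these functions do, here is a portable stand-in for memset_pattern4. The name `fillPattern4` is hypothetical, and the handling of a trailing partial pattern is an assumption of this sketch rather than something stated in the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Portable stand-in (not the Darwin implementation): fill Len bytes at Dst by
// repeating the 4-byte pattern at Pattern4. A trailing chunk shorter than
// 4 bytes receives a partial copy of the pattern (assumption of this sketch).
void fillPattern4(void *Dst, const void *Pattern4, std::size_t Len) {
  assert((Dst != nullptr && Pattern4 != nullptr) || Len == 0);
  unsigned char *Out = static_cast<unsigned char *>(Dst);
  std::size_t Done = 0;
  while (Done < Len) {
    std::size_t Chunk = (Len - Done < 4) ? (Len - Done) : 4;
    std::memcpy(Out + Done, Pattern4, Chunk);
    Done += Chunk;
  }
}

// Usage: fill six bytes with the pattern {1, 2, 3, 4} and read one byte back.
unsigned char exampleByte(std::size_t Index) {
  unsigned char Buf[6] = {0};
  const unsigned char Pattern[4] = {1, 2, 3, 4};
  assert(Index < sizeof(Buf));
  fillPattern4(Buf, Pattern, sizeof(Buf));
  return Buf[Index];
}
```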

https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/memset_pattern4.3.html

Reviewed By: ab

Differential Revision: https://reviews.llvm.org/D114881
2021-12-01 21:18:19 +00:00
Petar Avramovic 641906da8d AMDGPU/GlobalISel: Fix constant bus restriction errors for med3
Detected on targets older than gfx10 (e.g. gfx9) for constants that are
too large to be inlined (constants are sgpr by default).
In med3 combine it is expected that regbankselect maps all operands of
min/max we try to match to vgpr. However constants are mapped to sgpr
and there will be a sgpr-to-vgpr copy. Matchers look through sgpr-to-vgpr
copies and return sgpr and these break constant bus restriction.
Build med3 with all vgpr operands. Use existing sgpr-to-vgpr copies for
matched sgprs. If there is no such copy (not expected) build one.

Differential Revision: https://reviews.llvm.org/D114700
2021-12-01 21:36:37 +01:00
Nikita Popov 8d1759c404 [GlobalOpt] Simplify CleanupConstantGlobalUsers()
This bases the CleanupConstantGlobalUsers() implementation around
the ConstantFoldLoadFromConst() API. The general approach is that
we discover all users while looking through casts, and then
constant fold loads and drop stores and memintrinsics.

This avoids special cases and limitations in the previous
implementation, which is also incompatible with opaque pointers.
The result is a bit more powerful than before, because we now use
more general load folding logic which can for example look through
pointer bitcasts between different sizes. This is where the test
changes come from, as we now fold more loads and can thus remove
more globals.

Differential Revision: https://reviews.llvm.org/D114889
2021-12-01 21:06:25 +01:00
Ellis Hoag 9e647806f3 [InstrProf][NFC] Refactor ProfileDataMap usage
Instead of using `DenseMap::find()` and `DenseMap::insert()`, use
`DenseMap::operator[]` to get a reference to the profile data and update
the reference. This simplifies the changes in D114565.
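
The two styles can be sketched as follows. `std::unordered_map` stands in for `DenseMap` so the example compiles without LLVM headers, and the `ProfileData` struct and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical payload; the real per-function profile data is richer.
struct ProfileData { unsigned NumCounters = 0; };
using ProfileDataMap = std::unordered_map<std::string, ProfileData>;

// Before: explicit find() followed by insert-if-missing.
void addCounterFindInsert(ProfileDataMap &Map, const std::string &Name) {
  auto It = Map.find(Name);
  if (It == Map.end())
    It = Map.emplace(Name, ProfileData{}).first;
  assert(It != Map.end());
  ++It->second.NumCounters;
}

// After: operator[] default-constructs a missing entry and returns a
// reference that is updated in place.
void addCounterSubscript(ProfileDataMap &Map, const std::string &Name) {
  ProfileData &PD = Map[Name];
  ++PD.NumCounters;
}

// Usage: both styles end up with the same count.
unsigned countAfter(bool UseSubscript, unsigned Times) {
  ProfileDataMap Map;
  for (unsigned I = 0; I < Times; ++I) {
    if (UseSubscript)
      addCounterSubscript(Map, "foo");
    else
      addCounterFindInsert(Map, "foo");
  }
  return Map["foo"].NumCounters;
}
```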

Reviewed By: kyulee

Differential Revision: https://reviews.llvm.org/D114828
2021-12-01 11:47:14 -08:00
Craig Topper 2f6beb7b0e [RISCV] Add inline expansion for vector ftrunc/fceil/ffloor.
This prevents scalarization of fixed vector operations or crashes
on scalable vectors.

We don't have direct support for these operations. To emulate
ftrunc we can convert to the same sized integer and back to fp using
round to zero. We don't need to do a convert if the value is large
enough to have no fractional bits or is a nan.

The ceil and floor lowering would be better if we changed FRM, but
we don't model FRM correctly yet. So I've used the trunc lowering
with a conditional add or subtract with 1.0 if the truncate rounded
in the wrong direction.
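
A scalar sketch of the same idea, assuming 32-bit floats. The function names are hypothetical, the real lowering operates on vectors in SelectionDAG, and the explicit sign handling for -0.0 is an addition of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Truncate by converting to a same-sized integer and back (round toward zero).
float truncViaInt(float V) {
  // Floats with magnitude >= 2^23 have no fractional bits, and the comparison
  // is false for NaN, so both cases skip the conversion.
  if (!(std::fabs(V) < 8388608.0f))
    return V;
  float T = static_cast<float>(static_cast<std::int32_t>(V));
  assert(std::fabs(T) <= std::fabs(V)); // truncation never grows the magnitude
  return std::copysign(T, V);           // keep the sign so -0.5 becomes -0.0
}

// Ceil: truncate, then add 1.0 if truncation rounded down.
float ceilViaTrunc(float V) {
  float T = truncViaInt(V);
  return T < V ? T + 1.0f : T;
}

// Floor: truncate, then subtract 1.0 if truncation rounded up.
float floorViaTrunc(float V) {
  float T = truncViaInt(V);
  return T > V ? T - 1.0f : T;
}
```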

There are also missed opportunities to use masked instructions.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113543
2021-12-01 11:25:28 -08:00
Sanjay Patel aea6b9dcee [Support] replace check with assert in known bits of mul calculation; NFC
2021-12-01 13:41:12 -05:00
Simon Moll 435d44bf8a [VE][NFC] Fix use-after-free in VEInstrInfo
First call getOperand, then erase the MachineInstr. Not the other way
round.

Expected to fix test/CodeGen/VE/VELIntrinsics/lvm.ll

Detected by asan buildbot:

  sanitizer-x86_64-linux-fast
  (https://lab.llvm.org/buildbot/#/builders/5/builds/15384)
2021-12-01 19:30:27 +01:00
Reid Kleckner c6fa4c481a [AArch64] Fix unused variable warning with NDEBUG, NFC
2021-12-01 09:28:22 -08:00
Omer Aviram 617ad14060 [SelectionDAG] Add pattern to haveNoCommonBitsSet
Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M).
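
A small sketch of why the pattern has no common bits, and one reason the fact is useful (that framing is mine, not the commit's): when two values share no set bits, add, or and xor all give the same result. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when no bit position is set in both values.
bool haveNoCommonBits(std::uint32_t A, std::uint32_t B) { return (A & B) == 0; }

// (X & ~M) only keeps bits outside M and (Y & M) only keeps bits inside M,
// so the two operands can never overlap. With disjoint operands, +, | and ^
// are interchangeable.
bool maskedOpsAgree(std::uint32_t X, std::uint32_t Y, std::uint32_t M) {
  std::uint32_t A = X & ~M;
  std::uint32_t B = Y & M;
  assert(haveNoCommonBits(A, B));
  return (A + B) == (A | B) && (A ^ B) == (A | B);
}
```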

Differential Revision: https://reviews.llvm.org/D113970
2021-12-01 12:04:04 -05:00
Joseph Huber 058c312a44 [OpenMP][FIX] SPMDzation guarding needs to account for all reaching kernels
If two reaching kernels disagree on the execution mode we cannot guard a
function right now. Ensure we do not, as we would otherwise cause a
deadlock.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D114866
2021-12-01 11:44:32 -05:00
Simon Pilgrim 19d34f6e95 [X86] combinePMULH - recognise 'cheap' truncations via PACKS/PACKUS as well as SEXT/ZEXT
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.
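
A scalar sketch of the unsigned case, assuming operands whose upper 16 bits are already zero. The function names are hypothetical and this models only the arithmetic, not the PACKUS/PMULHUW instructions.

```cpp
#include <cassert>
#include <cstdint>

// Wide form: 32-bit multiply, then take bits 16..31 of the product.
std::uint16_t mulhuWide(std::uint32_t A, std::uint32_t B) {
  assert(A <= 0xFFFFu && B <= 0xFFFFu && "needs 16 leading zero bits");
  return static_cast<std::uint16_t>((A * B) >> 16);
}

// Narrow-first form: truncate both operands to 16 bits, then compute the
// high half of the 16x16 multiply. The truncation is lossless under the
// precondition above, so the result matches the wide form.
std::uint16_t mulhuNarrowFirst(std::uint32_t A, std::uint32_t B) {
  assert(A <= 0xFFFFu && B <= 0xFFFFu && "needs 16 leading zero bits");
  std::uint16_t NarrowA = static_cast<std::uint16_t>(A);
  std::uint16_t NarrowB = static_cast<std::uint16_t>(B);
  std::uint32_t Product =
      static_cast<std::uint32_t>(NarrowA) * static_cast<std::uint32_t>(NarrowB);
  return static_cast<std::uint16_t>(Product >> 16);
}
```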

Differential Revision: https://reviews.llvm.org/D113371
2021-12-01 16:37:49 +00:00
Simon Pilgrim 1bd01defff [VE] Remove switch with only default case statement to fix MSVC warning. NFC.
2021-12-01 16:37:48 +00:00
Alexey Bataev afc9e7517a [SLP]Improve cost model for the shuffled extracts.
Improved the cost calculation for shuffled extracts where possible. The
cost of the extracted scalars must be calculated if some users are not
insertelements. Also improved the total estimate for shuffled scalars
used in insertelement build vectors.

Differential Revision: https://reviews.llvm.org/D113782
2021-12-01 08:10:57 -08:00
Yousuf Ali 415e821a50 [PowerPC][AIX] Add toc-data support for 64-bit AIX small code model.
The patch expands the existing 32-bit toc-data attribute support to 64-bit.
In both 32-bit and 64-bit it is supported for small code model only.

Differential Revision: https://reviews.llvm.org/D114654
2021-12-01 10:56:21 -05:00
Alexey Bataev cc30fbf242 [SLP]Introduce isUndefVector function to check for undef vectors.
An undefined vector is not necessarily an UndefValue; it can also be
a constant vector with undef or poison elements, so we need to check for
this kind of undef too.

Differential Revision: https://reviews.llvm.org/D114873
2021-12-01 07:46:10 -08:00
Bradley Smith fd9069ffce [AArch64][SVE] Duplicate FP_EXTEND/FP_TRUNC -> LOAD/STORE dag combines
By duplicating these dag combines we can bypass the legality checks that
they do. This allows us to perform these combines on larger than legal
fixed types, which in turn brings the benefits of D114580 to larger
than legal fixed types.

Depends on D114580

Differential Revision: https://reviews.llvm.org/D114628
2021-12-01 15:33:53 +00:00
Alexey Bataev ddce6e0561 [SLP]Improve vectorization of cmp instructions sequences.
Final attempt to vectorize bundles of compatible cmp instructions after
all other instructions processing.

Metric: SLP.NumVectorInstructions

Program                                                                             results results0 diff
        test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    1.00    5.00  400.0%
                              test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    8.00   11.00   37.5%
                    test-suite :: MultiSource/Benchmarks/Olden/voronoi/voronoi.test   20.00   26.00   30.0%
                test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 1344.00 1648.00   22.6%
               test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 1344.00 1648.00   22.6%
                              test-suite :: MultiSource/Benchmarks/Olden/bh/bh.test  102.00  124.00   21.6%
                test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/CoMD/CoMD.test  118.00  133.00   12.7%
          test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 3233.00 3554.00    9.9%
           test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 3233.00 3554.00    9.9%
                        test-suite :: MultiSource/Benchmarks/Olden/power/power.test   64.00   70.00    9.4%
           test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 7879.00 8604.00    9.2%
           test-suite :: MultiSource/Benchmarks/Prolangs-C/simulator/simulator.test   50.00   54.00    8.0%
                        test-suite :: MultiSource/Applications/sqlite3/sqlite3.test   27.00   29.00    7.4%
             test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 8345.00 8955.00    7.3%
     test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test  694.00  738.00    6.3%
                        test-suite :: MultiSource/Benchmarks/MallocBench/gs/gs.test  361.00  382.00    5.8%
                      test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  409.00  430.00    5.1%
     test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  140.00  147.00    5.0%
      test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  140.00  147.00    5.0%
             test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 4013.00 4206.00    4.8%
                       test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  966.00 1011.00    4.7%
                           test-suite :: SingleSource/Benchmarks/Misc/oourafft.test   65.00   68.00    4.6%
                            test-suite :: MultiSource/Benchmarks/Bullet/bullet.test 4219.00 4381.00    3.8%
                    test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1911.00 1973.00    3.2%
      test-suite :: External/SPEC/CINT2017rate/531.deepsjeng_r/531.deepsjeng_r.test   62.00   64.00    3.2%
     test-suite :: External/SPEC/CINT2017speed/631.deepsjeng_s/631.deepsjeng_s.test   62.00   64.00    3.2%
                 test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  852.00  877.00    2.9%
                  test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  852.00  877.00    2.9%
                       test-suite :: MultiSource/Applications/JM/lencod/lencod.test 1624.00 1668.00    2.7%
                         test-suite :: MultiSource/Benchmarks/McCat/18-imp/imp.test   39.00   40.00    2.6%
test-suite :: MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset.test  613.00  624.00    1.8%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  378.00  383.00    1.3%
      test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test  293.00  295.00    0.7%
            test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test  297.00  299.00    0.7%
      test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test 5522.00 5534.00    0.2%
     test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test 5522.00 5534.00    0.2%

Differential Revision: https://reviews.llvm.org/D114799
2021-12-01 07:26:29 -08:00
Nikita Popov 9687c13174 [Verifier] Make matrix intrinsic verification compatible with opaque pointers
Don't check the pointer element type for opaque pointers.
2021-12-01 16:26:05 +01:00
David Green 13e66c070b Revert "[ARM] Teach getIntImmCostInst about the cost of saturating fp converts"
This reverts commit 6d41de380f as the
windows bots are not happy, in a way I do not understand. Revert whilst
we figure out what is wrong.
2021-12-01 15:25:19 +00:00
Florian Hahn e44298a8f8 [LV] Move code from vectorizeMemoryInstruction to recipe's execute().
The code in widenMemoryInstruction has already been transitioned
to only rely on information provided by VPWidenMemoryInstructionRecipe
directly.

Moving the code directly to VPWidenMemoryInstructionRecipe::execute
completes the transition for the recipe.

It provides the following advantages:

1. Less indirection, easier to see what's going on.
2. Removes accesses to fields of ILV.

2) in particular ensures that no dependencies on
fields in ILV for vector code generation are re-introduced.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D114324
2021-12-01 14:56:51 +00:00
Ties Stuij f5f28d5b0c [ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
  linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instruction is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost model of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need to be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible to add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426
2021-12-01 12:54:05 +00:00
Djordje Todorovic 72f9f066df Revert "[LICM] Hoist LOAD without sinking the STORE"
This reverts commit ecb9d8e4e3.

I'll reland this as soon as the failing tests are fixed/updated.
2021-12-01 04:39:26 -08:00
Djordje Todorovic ecb9d8e4e3 [LICM] Hoist LOAD without sinking the STORE
When doing load/store promotion within LICM, if we
cannot prove that it is safe to sink the store we won't
hoist the load, even though we can prove the load could
be dereferenced and moved outside the loop. This patch
implements the load promotion by moving it in the loop
preheader by inserting proper PHI in the loop. The store
is kept as is in the loop. By doing this, we avoid doing
the load from a memory location in each iteration.

Please consider this small example:

loop {
  var = *ptr;
  if (var) break;
  *ptr = var + 1;
}

After this patch, it will be:

var0 = *ptr;
loop {
  var1 = phi (var0, var2);
  if (var1) break;
  var2 = var1 + 1;
  *ptr = var2;
}

This addresses some problems from [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=51193

Differential revision: https://reviews.llvm.org/D113289
2021-12-01 04:27:50 -08:00
Nikita Popov 1e1a8be21f [LICM] Support opaque pointers in scalar promotion
Make sure that all pointers have the same load/store access type,
rather than comparing pointer element types.
2021-12-01 12:56:24 +01:00
Bradley Smith 0eb1efb92c [DAGCombiner] When combining REM ensure optimized div nodes are unique
The REM DAG combine uses the visitDivLike functions to try and get an
optimized DIV node to provide better codegen, however in some cases this
visitDivLike call ends up in the BuildSDIVPow2 target hook, which in
turn sometimes will return the same node passed in to indicate not to
change it. The REM DAG combine does not anticipate this and creates a
cycle in the DAG because of it.

Fix this by ensuring any such optimized div node returned is distinct
from the node being combined.

Differential Revision: https://reviews.llvm.org/D114716
2021-12-01 11:24:26 +00:00
Simon Pilgrim 9981dd142f [DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC.
Reduce diff in D114582
2021-12-01 11:23:47 +00:00
Ties Stuij b430782be3 [ARM] emit PACBTI-M build attributes
This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Victor Campos
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112425
2021-12-01 11:05:29 +00:00
Ties Stuij c12c7a84b0 [ARM] add common parts for PACBTI-M support in the backend
This patch encapsulates decision logic about when and how to generate
PAC/BTI related code. It's a part shared by PAC-RET, BTI placement,
build attribute emission, etc, so it makes sense to commit it
separately in order to unblock the aforementioned parts, which can
proceed concurrently.

This patch adds a few member functions to `ARMFunctionInfo`, which are currently
unused, therefore there is no testing for them at the moment. This code is
tested in follow-up PAC/BTI code gen patches.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Ties Stuij

Reviewed By: danielkiss

Differential Revision: https://reviews.llvm.org/D112423
2021-12-01 10:48:30 +00:00
Ties Stuij e3b2f0226b [clang][ARM] PACBTI-M frontend support
Handle branch protection option on the commandline as well as a function
attribute. One patch for both mechanisms, as they use the same underlying
parsing mechanism.

These are recorded in a set of LLVM IR module-level attributes like we do for
AArch64 PAC/BTI (see https://reviews.llvm.org/D85649):

- command-line options are "translated" to module-level LLVM IR
  attributes (metadata).

- functions have PAC/BTI specific attributes iff the
  __attribute__((target("branch-protection=..."))) was used in the function
  declaration.

- command-line option -mbranch-protection to armclang targeting Arm,
following this grammar:

branch-protection ::= "-mbranch-protection=" <protection>
protection ::=  "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
                | <pac-ret-clause> [ "+" "bti"]
pac-ret-clause ::= "pac-ret" [ "+" <pac-ret-option> ]
pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
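
The grammar above can be read as a small recursive-descent check. The sketch below validates only the <protection> part (the text after "-mbranch-protection="). It is an illustration with hypothetical function names, not the parser clang actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on '+', keeping empty tokens so that "bti+" is rejected later.
static std::vector<std::string> splitOnPlus(const std::string &S) {
  std::vector<std::string> Tokens;
  std::size_t Start = 0;
  while (true) {
    std::size_t Plus = S.find('+', Start);
    if (Plus == std::string::npos) {
      Tokens.push_back(S.substr(Start));
      break;
    }
    Tokens.push_back(S.substr(Start, Plus - Start));
    Start = Plus + 1;
  }
  return Tokens;
}

// pac-ret-clause ::= "pac-ret" [ "+" pac-ret-option ]
// pac-ret-option ::= "leaf" ["+" "b-key"] | "b-key" ["+" "leaf"]
static bool parsePacRetClause(const std::vector<std::string> &T,
                              std::size_t &I) {
  if (I >= T.size() || T[I] != "pac-ret")
    return false;
  ++I;
  if (I < T.size() && (T[I] == "leaf" || T[I] == "b-key")) {
    const std::string Other = (T[I] == "leaf") ? "b-key" : "leaf";
    ++I;
    if (I < T.size() && T[I] == Other)
      ++I;
  }
  return true;
}

// protection ::= "none" | "standard" | "bti" [ "+" <pac-ret-clause> ]
//              | <pac-ret-clause> [ "+" "bti" ]
bool isValidProtection(const std::string &S) {
  std::vector<std::string> T = splitOnPlus(S);
  assert(!T.empty());
  if (T.size() == 1 && (T[0] == "none" || T[0] == "standard"))
    return true;
  std::size_t I = 0;
  if (T[0] == "bti") {
    I = 1;
    if (I < T.size() && !parsePacRetClause(T, I))
      return false;
  } else {
    if (!parsePacRetClause(T, I))
      return false;
    if (I < T.size() && T[I] == "bti")
      ++I;
  }
  return I == T.size(); // every token must have been consumed
}
```

For example, "pac-ret+leaf+bti" and "bti+pac-ret+b-key+leaf" are accepted, while "bti+", "none+bti" and "pac-ret+leaf+leaf" are rejected.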

b-key is simply a placeholder to make it consistent with AArch64's
version. In Arm, however, it triggers a warning informing that b-key is
unsupported and a-key will be selected instead.

- Handle __attribute__((target("branch-protection=..."))) for AArch32 with the
same grammar as the command-line options.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Momchil Velikov
- Victor Campos
- Ties Stuij

Reviewed By: vhscampos

Differential Revision: https://reviews.llvm.org/D112421
2021-12-01 10:37:16 +00:00
David Green 6d41de380f [ARM] Teach getIntImmCostInst about the cost of saturating fp converts
Given a min(max(fptosi, INT_MIN), INT_MAX) with the correct constants,
we can now generate a fptosi.sat. But in the arm backend, the constant
can be treated as high cost, pulling it out of the basic block in a way
that the DAG combine can no longer see it. This teaches it again that it
is a low cost constant, not worth hoisting out.
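
To make the pattern concrete, here is a scalar sketch that assumes a clamp to the signed 16-bit range (the width is my assumption; the commit only says "the correct constants"). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// The clamp pattern: convert to a wide integer, then min/max against the
// bounds of the narrower type. Only meaningful when V fits in 32 bits.
std::int32_t clampAfterConvert(float V) {
  assert(V > -2147483648.0f && V < 2147483648.0f);
  std::int32_t Wide = static_cast<std::int32_t>(V);
  return std::min(std::max(Wide, std::int32_t{-32768}), std::int32_t{32767});
}

// Saturating-convert semantics: NaN maps to 0, out-of-range values clamp to
// the nearest bound, everything else truncates toward zero.
std::int32_t saturatingConvert(float V) {
  if (std::isnan(V))
    return 0;
  if (V <= -32768.0f)
    return -32768;
  if (V >= 32767.0f)
    return 32767;
  return static_cast<std::int32_t>(V);
}
```

The two functions agree wherever the plain conversion is defined, which is what lets the clamp be folded into a single saturating convert as long as the constants stay visible next to it.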

Differential Revision: https://reviews.llvm.org/D114380
2021-12-01 10:25:52 +00:00
Florian Hahn 6a5e29d13f [BuildLibCalls] Add argmemonly, writeonly, nounwind to memset_chk.
The memset_chk library function should match memset's attributes with
respect to memory effects (argmemonly, writeonly). It also does not
raise exceptions. It may not return, in case it aborts the program.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D114793
2021-12-01 10:09:52 +00:00
David Green 388bfc5408 [ARM] Fix some indenting in ARMAsmPrinter::emitInstruction, NFC
2021-12-01 10:08:37 +00:00
Shraiysh Vaishay ec97e1206a [OpenMP][IRBuilder] Fix createSections
Fix for the case when there are no instructions in the entry basic block before the call
to `createSections`

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D114143
2021-12-01 15:11:54 +05:30
Nikita Popov 84b978da3b [LoopUnrollRuntime] Remove unnecessary pointer BECount check (NFC)
BECounts are guaranteed to be integers nowadays.
2021-12-01 10:32:37 +01:00
Nikita Popov 67704801c6 [SCEV] Track backedge taken count users (NFCI)
Track which SCEVs are used as ExactNotTaken counts in
BackedgeTakenInfo structures, so we can directly determine which
loops need to be invalidated, rather than iterating over all BECounts.

This gives a small compile-time improvement on average, but the
motivation here is more to ensure there are no degenerate cases,
if the number of backedge taken counts is large.

Differential Revision: https://reviews.llvm.org/D114784
2021-12-01 10:16:47 +01:00
Florian Hahn 7de410440d [DSE] Allow DSE to optimize MemorySSA by default.
This allows for better optimization of 'stores-of-existing-values' and
possibly helps passes further down the pipeline.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113712
2021-12-01 08:29:23 +00:00
Markus Lavin ce22b7f17b [NPM] Fix LoopNestPasses in -print-pipeline-passes
Fix printing of LoopNestPasses when using the opt pipeline printer
option -print-pipeline-passes.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D114771
2021-12-01 07:57:17 +01:00
Qiu Chaofan 15826eb437 [Legalizer] Avoid expansion to BR_CC if illegal
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110616
2021-12-01 12:22:21 +08:00
Snehasish Kumar 3a4d373ec2 [memprof] Align each rawprofile section to 8b.
The first 8b of each raw profile section need to be aligned to 8b since
the first item in each section is a u64 count of the number of items in
the section.
Summary of changes:
* Assert alignment when reading counts.
* Update test to check alignment, relax some size checks to allow padding.
* Update raw binary inputs for llvm-profdata tests.
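
The 8-byte alignment requirement described above comes down to rounding each section offset up to the next multiple of 8. A minimal sketch with hypothetical helper names (the memprof writer itself is not shown here):

```cpp
#include <cassert>
#include <cstdint>

// Round Offset up to the next multiple of 8 so a u64 count can be read there.
std::uint64_t alignTo8(std::uint64_t Offset) {
  std::uint64_t Aligned = (Offset + 7) & ~std::uint64_t{7};
  assert(Aligned % 8 == 0 && Aligned >= Offset);
  return Aligned;
}

// Number of padding bytes to append after a section of the given size.
std::uint64_t paddingFor(std::uint64_t SectionSize) {
  return alignTo8(SectionSize) - SectionSize;
}
```

This is also why the size checks in the test had to be relaxed: a section may now be followed by up to 7 bytes of padding.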

Differential Revision: https://reviews.llvm.org/D114826
2021-11-30 20:12:43 -08:00
Craig Topper d8f9eaad89 [RISCV] Teach RISCVTargetLowering::shouldSinkOperands to handle udiv/sdiv/urem/srem.
The V extension supports .vx instructions for integer division and
remainder so we should sink splats for that operand.
2021-11-30 18:47:51 -08:00
Vincent Lee b83a4222b1 [ObjectYAML/obj2yaml/yaml2obj][MachO] Support indirect symbol table
Tools such as `llvm-objdump` or `llvm-readobj` support indirect symbol
tables. Here, support it for `obj2yaml` and `yaml2obj`.

Reviewed By: jhenderson, drodriguez

Differential Revision: https://reviews.llvm.org/D114410
2021-11-30 16:15:33 -08:00
Mircea Trofin a503cb00d1 [NFC][regalloc] Factor accesses to ExtraRegInfo
We'll move ExtraRegInfo to the RegAllocEvictionAdvisor subsequently.
This change prepares for that by factoring all accesses.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D114759
2021-11-30 15:10:49 -08:00
Tarique Islam 0850655da6 Big-endian version of vpermxor
A big-endian version of vpermxor, named vpermxor_be, is added to LLVM
and Clang. vpermxor_be can be called directly on both the little-endian
and the big-endian platforms.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D114540
2021-11-30 22:49:55 +00:00