If the LowerTypeTests pass decides to add a function to a jump
table for CFI, it will add its name to the set cfiFunctionDefs,
which among other things will cause the function to be renamed in
the ThinLTO backend.
One other thing that we must do with such functions is to not
internalize them, because the jump table in the full LTO object will
contain a reference to the actual function body in the ThinLTO object.
This patch handles that by ensuring that we export any functions
whose names appear in the cfiFunctionDefs set.
Fixes PR33831.
Differential Revision: https://reviews.llvm.org/D35605
llvm-svn: 308504
This patch cleans up and fixes issues in the M-Class system register handling:
1. It defines the system registers and the encoding (SYSm values) in one place:
a new ARMSystemRegister.td using SearchableTable, thereby removing the
hand-coded values which existed in multiple places.
2. Some system registers e.g. BASEPRI_MAX_NS which do not exist were being allowed!
Ref: ARMv6/7/8M architecture reference manual.
Reviewed by: @t.p.northover, @olist01, @john.brawn
Differential Revision: https://reviews.llvm.org/D35209
llvm-svn: 308456
Summary:
When simplifying unconditional branches from empty blocks, we pre-test if the
BB belongs to a set of loop headers and keep the block to prevent passes from
destroying canonical loop structure. However, the current algorithm fails if
the destination of the branch is a loop header. Especially when such a loop's
latch block is folded into loop header it results in additional backedges and
LoopSimplify turns it into a nested loop which prevent later optimizations
from being applied (e.g., loop unrolling and loop interleaving).
This patch augments the existing algorithm by further checking if the
destination of the branch belongs to a set of loop headers and defer
eliminating it if yes to LateSimplifyCFG.
Fixes PR33605: https://bugs.llvm.org/show_bug.cgi?id=33605
Reviewers: efriedma, mcrosier, pacxx, hsung, davidxl
Reviewed By: efriedma
Subscribers: ashutosh.nema, gberry, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D35411
llvm-svn: 308422
Generate a single test to decide if there are enough iterations to jump to the
vectorized loop, or else go to the scalar remainder loop. This test compares the
Scalar Trip Count: if STC < VF * UF go to the scalar loop. If
requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the
rest can be used to form vector iterations. So in this case the test checks
instead if (STC - 1) < VF * UF by comparing STC <= VF * UF, and going to the
scalar loop if so. Otherwise the vector loop is entered for at-least one vector
iteration.
This test covers the case where incrementing the backedge-taken count will
overflow leading to an incorrect trip count of zero. In this (rare) case we will
also avoid the vector loop and jump to the scalar loop.
This patch simplifies the existing tests and effectively removes the basic-block
originally named "min.iters.checked", leaving the single test in block
"vector.ph".
Original observation and initial patch by Evgeny Stupachenko.
Differential Revision: https://reviews.llvm.org/D34150
llvm-svn: 308421
Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.
The added test shows an example of enabling optimization inside a loop.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294
llvm-svn: 308419
That part was reverted because the underlying change necessitating it
(r308025) was reverted in r308271.
Nirav re-landed r308025 again in r308350, so re-landing this fix.
llvm-svn: 308418
functions.
In the prior commit, we provide ordering to the LCG between functions
and library function definitions that they might begin to call through
transformations. But we still would delete these library functions from
the call graph if they became dead during inlining.
While this immediately crashed, it also exposed a loss of information.
We shouldn't remove definitions of library functions that can still
usefully participate in the LCG-powered CGSCC optimization process. If
new call edges are formed, we want to have definitions to be called.
We can still remove these functions if truly dead using global-dce, etc,
but removing them during the CGSCC walk is premature.
This fixes a crash in the new PM when optimizing some unusual libraries
that end up with "internal" lib functions such as the code in the "R"
language's libraries.
llvm-svn: 308417
Summary:
This patch adds the following
1. Adds a skeleton scheduler model for AMD Znver1.
2. Introduces the znver1 execution units and pipes.
3. Caters the instructions based on the generic scheduler classes.
4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on
a. Instructions types
b. Registers used
5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added.
6. Scheduler testcases are modified accordingly to suit the new model.
Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me.
Reviewers: craig.topper, RKSimon
Subscribers: javed.absar, shivaram, ddibyend, vprasad
Differential Revision: https://reviews.llvm.org/D35293
llvm-svn: 308411
Install an llvm-readelf symlink to llvm-readobj.
When invoked as *readelf*, default to -elf-output-style=GNU.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33869
llvm-svn: 308408
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
DIImportedEntity has a line number, but not a file field. To determine
the decl_line/decl_file we combine the line number from the
DIImportedEntity with the file from the DIImportedEntity's scope. This
does not work correctly when the parent scope is a DINamespace or a
DIModule, both of which do not have a source file.
This patch adds a file field to DIImportedEntity to unambiguously
identify the source location of the using/import declaration. Most
testcase updates are mechanical, the interesting one is the removal of
the FIXME in test/DebugInfo/Generic/namespace.ll.
This fixes PR33822. See https://bugs.llvm.org/show_bug.cgi?id=33822
for more context.
<rdar://problem/33357889>
https://bugs.llvm.org/show_bug.cgi?id=33822
Differential Revision: https://reviews.llvm.org/D35583
llvm-svn: 308398
Accept and ignore --wide/-W. In GNU readelf this switch is
necessary to get the output format that's consistent between
32-bit and 64-bit targets. llvm-readobj always produces that
output format.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33873
llvm-svn: 308396
In GNU readelf, the short option for --sections is upper-case -S.
Note that GNU uses lower-case -s to mean --symbols, while LLVM
uses -s to mean --sections and -t to mean --symbols (-t has yet a
different meaning in GNU). So command-line uses with -S can now
be compatible, but uses with -s or -t are still incompatible.
Patch by Roland McGrath
Differential Revision: https://reviews.llvm.org/D33872
llvm-svn: 308392
Summary:
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Submitted on behalf of @morehouse (Matt Morehouse)
Reviewers: eugenis, vitalybuka
Reviewed By: eugenis
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D34789
llvm-svn: 308387
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
This reapplies rL308329, which was reverted in rL308374
llvm-svn: 308379
Re-recommiting after landing DAG extension-crash fix.
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308350
Added a feature to the Sparc back-end that replaces the integer multiply and
divide instructions with calls to .mul/.sdiv/.udiv. This is a step towards
having full v7 support.
Patch by: Eric Kedaigle
Differential Revision: https://reviews.llvm.org/D35500
llvm-svn: 308343
When replacing a node and it's operand, replacing the operand node may
cause the deletion of the original node leading to an assertion
failure. Case around these replacements to avoid this without relying
on inspecting the DELETED_NODE opcode in various extend
dagcombiner cases.
Fixes PR32515.
Reviewers: dbabokin, RKSimon, davide, chandlerc
Subscribers: chandlerc, llvm-commits
Differential Revision: https://reviews.llvm.org/D34095
llvm-svn: 308330
A PE COFF spec compliant import library generator.
Intended to be used with mingw-w64.
Supports:
PE COFF spec (section 8, Import Library Format)
PE COFF spec (Aux Format 3: Weak Externals)
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D29892
llvm-svn: 308329
As an approximation of the existing handling to avoid
regressions. Fixes using too many registers with calls
on subtargets with the SGPR allocation bug.
llvm-svn: 308326
Introduce pseudo-registers for registers needed for stack
access, which are replaced during finalizeLowering.
Note these pseudo-registers are currently only used for the
used register location, and not for determining their
input argument register.
This is better because it avoids the need to try to predict
whether a call will be emitted from the IR, and also
detects stack objects introduced by legalization.
Test changes are from the HasStackObjects check being more
accurate since stack objects introduced during legalization
are now known.
llvm-svn: 308325
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
The flag "-hexagon-emit-lut-text" (defaulted to false) is added to decide
on where to keep the switch generated lookup table.
Differential Revision: https://reviews.llvm.org/D34818
llvm-svn: 308316
Summary:
When an immediate is folded by constant folding, we re-scan the entire
use list for two reasons:
1. The constant folding may have created a new use of the same reg.
2. The constant folding may have removed an additional use in the list
we're currently traversing (e.g., constant folding an S_ADD_I32 c, c).
However, this could previously lead to a crash when an unrelated use was
added twice into the FoldList. Since we re-scan the whole list anyway, we
might as well just clear the FoldList again before we do so.
Using a MIR test to show this because real code seems to trigger the issue
only in connection with some really subtle control flow structures.
Fixes GL45-CTS.shading_language_420pack.binding_images on gfx9.
Reviewers: arsenm
Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D35416
llvm-svn: 308314
As discussed by @spatel on D35067:
"I added the cmov attribute to the 32-bit codegen test because it removes some noise for that file. I think the intent for the SSE vs no-SSE runs is to show the potential difference for the 16 and 32 byte cases rather than the lack of cmov (which has been available for all CPUs since SSE1, so that's why it shows up automatically with -mattr=sse2)."
llvm-svn: 308309
Summary:
G_FMA was recently added to GlobalISel which enables the import of rules
involving fma. Add the mapping to allow it.
Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D35130
llvm-svn: 308308
This change introduces additional machine instructions in functions
dealing with the expansion of msa pseudo f16 instructions due to
register classes being inappropriate when checked with machine
verifier.
Differential Revision: https://reviews.llvm.org/D34276
llvm-svn: 308301
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Coverage hooks that take less-than-64-bit-integers as parameters need the
zeroext parameter attribute (http://llvm.org/docs/LangRef.html#paramattrs)
to make sure they are properly extended by the x86_64 ABI.
llvm-svn: 308296
Summary:
Currently most tests for the loop interchange pass are in
test/Transforms/LoopInterchange/interchange.ll. This patch splits up the
large test file in smaller pieces, which makes debugging test failures
easier.
Reviewers: karthikthecool, blitz.opensource, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, mcrosier, mkuper, mzolotukhin, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D35488
llvm-svn: 308284
The commit r308100 updated WebAssembly tests for r308025. In one case it
merely made the test more resilient but in another case it made
a substantive update. Because r308025 was reverted in r308271, these
changes to the test also need to be reverted. They should be folded into
the recommit of r308025 when it is ready.
llvm-svn: 308273
This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure.
Fixes PR33772.
llvm-svn: 308267
In some particular cases eq/ne conditions can be turned into equivalent
slt/sgt conditions. This patch teaches parseLoopStructure to handle some
of these cases.
Differential Revision: https://reviews.llvm.org/D35010
llvm-svn: 308264
Summary:
Add canary tests to verify that MSAN currently does nothing with the element atomic memory intrinsics for memcpy, memmove, and memset.
Placeholder tests that will fail once element atomic @llvm.mem[cpy|move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that MSAN handles these intrinsics properly once they have been added to that class hierarchy.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35510
llvm-svn: 308251
Summary:
Add canary tests to verify that ESAN currently does nothing with the element atomic memory intrinsics for memcpy, memmove, and memset.
Placeholder tests that will fail once element atomic @llvm.mem[cpy|move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that ESAN handles these intrinsics properly once they have been added to that class hierarchy.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35508
llvm-svn: 308250
Summary:
Add canary tests to verify that DFSAN currently does nothing with the element atomic memory intrinsics for memcpy, memmove, and memset.
Placeholder tests that will fail once @llvm.mem[cpy|move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that DFSAN handles these intrinsics properly once they have been added to that class hierarchy.
Note that there could be some trickiness with these element-atomic intrinsics for the dataflow sanitizer in racy multithreaded programs. The data flow sanitizer inserts additional lib calls to mirror the memory intrinsic's action, so it is possible (very likely, even) that the dfsan buffers will not be in sync with the original buffers. Furthermore, implementation of the dfsan buffer updates for the element atomic intrinsics will have to also use unordered atomic instructions. If we can assume that dfsan is never run on racy multithreaded programs, then the element atomic memory intrinsics can pretty much be treated the same as the regular memory intrinsics.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35507
llvm-svn: 308249
Summary:
Add canary tests to verify that ASAN currently does nothing with the element atomic memory intrinsics for memcpy, memmove, and memset.
Placeholder tests that will fail once element atomic @llvm.mem[cpy|move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that ASAN handles these intrinsics properly once they have been added to that class hierarchy.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35505
llvm-svn: 308248
Summary:
Add canary tests to verify that InstCombine currently does nothing with the element atomic memory intrinsics for memmove and memset.
Placeholder tests that will fail once element atomic @llvm.mem[move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that inst combine handles these intrinsics properly once they have been added to that class hierarchy.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35502
llvm-svn: 308247
Summary:
This patch modifies the handleDebugInfo() function so that we verify the contents of each unit
in the .debug_info section only if its header has been successfully verified.
This change will allow for more/different verification checks depending on the type of the unit since from
dwarf5, the .debug_info section may consist of different types of units.
Subscribers: aprantl
Differential Revision: https://reviews.llvm.org/D35521
llvm-svn: 308245
Summary:
This removes the CVTypeVisitor updater and verifier classes. They were
made dead by the minimal type dumping refactoring. Replace them with a
single function that takes a type record and produces a hash. Call this
from the minimal type dumper and compare the hash.
I also noticed that the microsoft-pdb reference repository uses a basic
CRC32 for records that aren't special. We already have an implementation
of that CRC ready to use, because it's used in COFF for ICF.
I'll make LLD call this hashing utility in a follow-up change. We might
also consider using this same hash in type stream merging, so that we
don't have to hash our records twice.
Reviewers: inglorion, ruiu
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35515
llvm-svn: 308240
Summary:
We were treating the GUIDs in TypeServer2Record as strings, and the
non-ASCII bytes in the GUID would not round-trip through YAML.
We already had the PDB_UniqueId type portably represent a Windows GUID,
but we need to hoist that up to the DebugInfo/CodeView library so that
we can use it in the TypeServer2Record as well as in PDB parsing code.
Reviewers: inglorion, amccarth
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D35495
llvm-svn: 308234
This wasn't necessary before since they are always enabled
for kernels, but this is necessary if they need to be
forwarded to a callable function.
llvm-svn: 308226
Summary:
This is the first patch towards creating the llvm-mt tool for merging
Windows manifests. This is a reimplementation of mt.exe.
Reviewers: zturner, ruiu, rnk
Subscribers: llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D35333
llvm-svn: 308224
Rename the enum value from X86_64_Win64 to plain Win64.
The symbol exposed in the textual IR is changed from 'x86_64_win64cc'
to 'win64cc', but the numeric value is kept, keeping support for
old bitcode.
Differential Revision: https://reviews.llvm.org/D34474
llvm-svn: 308208
This reverts commit r308114 (and follow on fixes to test).
There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/
(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.
llvm-svn: 308206
This adds support for the new 128-bit vector float instructions of z14.
Note that these instructions actually only operate on the f128 type,
since only each 128-bit vector register can hold only one 128-bit
float value. However, this is still preferable to the legacy 128-bit
float instructions, since those operate on pairs of floating-point
registers (so we can hold at most 8 values in registers), while the
new instructions use single vector registers (so we hold up to 32
value in registers).
Adding support includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions. This includes allocating the f128
type now to the VR128BitRegClass instead of FP128BitRegClass.
- Scheduler description support for the instructions.
Note that for a small number of operations, we have no new vector
instructions (like integer <-> 128-bit float conversions), and so
we use the legacy instruction and then reformat the operand
(i.e. copy between a pair of floating-point registers and a
vector register).
llvm-svn: 308196
This adds support for the new 32-bit vector float instructions of z14.
This includes:
- Enabling the instructions for the assembler/disassembler.
- CodeGen for the instructions, including new LLVM intrinsics.
- Scheduler description support for the instructions.
- Update to the vector cost function calculations.
In general, CodeGen support for the new v4f32 instructions closely
matches support for the existing v2f64 instructions.
llvm-svn: 308195
This patch series adds support for the IBM z14 processor. This part includes:
- Basic support for the new processor and its features.
- Support for new instructions (except vector 32-bit float and 128-bit float).
- CodeGen for new instructions, including new LLVM intrinsics.
- Scheduler description for the new processor.
- Detection of z14 as host processor.
Support for the new 32-bit vector float and 128-bit vector float
instructions is provided by separate patches.
llvm-svn: 308194
The target-independent lowering works fine, except concatenating 32-bit
words. Add a pattern to generate A2_combinew instead of 64-bit asl/or.
llvm-svn: 308186
Summary:
Previously, CodeGen checked first src operand type to determine if omod is supported by instruction. This isn't correct for some instructions: e.g. V_CMP_EQ_F32 has floating-point src operands but desn't support omod.
Changed .td files to check if dst operand instead of src operand.
Reviewers: arsenm, vpykhtin
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D35350
llvm-svn: 308179
LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s) - later it will be lowered into X86::CMOV instruction, assuming no other optimization eliminated the SelectInst.
However, it is not always profitable to emit X86::CMOV instruction. For example, branch is preferable over an X86::CMOV instruction when:
1. Branch is well predicted
2. Condition operand is expensive, compared to True-value and the False-value operands
In CodeGenPrepare pass there is a shallow optimization that tries to convert SelectInst into branch, but it is not enough.
This commit, implements machine optimization pass that converts X86::CMOV instruction(s) into branch, based on a conservative heuristic.
Differential Revision: https://reviews.llvm.org/D34769
llvm-svn: 308142
Finally figured out that some bots were failing from r308114
with the message:
llvm-lto2: LTO::run failed: No available targets are compatible with this triple.
after adding in some other checking that finally caused this to show up
in the FileCheck output.
Added "REQUIRES: x86-registered-target" which should fix it.
llvm-svn: 308119
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).
llvm-svn: 308114
Summary:
If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified.
I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed.
Reviewers: spatel, majnemer, hfinkel, dberlin
Reviewed By: spatel
Subscribers: grandinj, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D35451
llvm-svn: 308111
This fold hit the trifecta:
1. It was untested.
2. It oversteps (multiuse is not checked, so increases instruction count).
3. It is incomplete (doesn't work for vectors).
llvm-svn: 308102
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.
r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
function to every defined function known to LLVM as a library function.
LLVM can introduce calls to these functions either by replacing other
library calls or by recognizing patterns (such as memset_pattern or
vector math patterns) and replacing those with calls. When these library
functions are actually defined in the module, we need to have reference
edges to them initially so that we visit them during the CGSCC walk in
the right order and can effectively rebuild the call graph afterward.
This was discovered when building code with Fortify enabled as that is
a common case of both inline definitions of library calls and
simplifications of code into calling them.
This can in extreme cases of LTO-ing with libc introduce *many* more
reference edges. I discussed a bunch of different options with folks but
all of them are unsatisfying. They either make the graph operations
substantially more complex even when there are *no* defined libfuncs, or
they introduce some other complexity into the callgraph. So this patch
goes with the simplest possible solution of actual synthetic reference
edges. If this proves to be a memory problem, I'm happy to implement one
of the clever techniques to save memory here.
llvm-svn: 308088
If the `long-calls` feature flags is enabled, disable use of the `jal`
instruction. Instead of that call a function by by first loading its
address into a register, and then using the contents of that register.
Differential revision: https://reviews.llvm.org/D35168
llvm-svn: 308087
The type needs to be casted back to the original argument type.
Fixes an assert that for some reason is only run when
using -debug.
Includes an additional combine to avoid test regressions
from having conversions mixed with multiple Assert[SZ]ext
nodes. On subtargets where i16 is legal, this was producing an i32
register with an i16 AssertZExt, truncated to i16 with another i8
AssertZExt.
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t3: i16 = truncate t2
t5: i16 = AssertZext t3, ValueType:ch:i8
t6: i8 = truncate t5
t7: i32 = zero_extend t6
llvm-svn: 308082
Currently, for code like below,
===
inner_map = bpf_map_lookup_elem(outer_map, &port_key);
if (!inner_map) {
inner_map = &fallback_map;
}
===
the compiler generates (pseudo) code like the below:
===
I1: r1 = bpf_map_lookup_elem(outer_map, &port_key);
I2: r2 = 0
I3: if (r1 == r2)
I4: r6 = &fallback_map
I5: ...
===
During kernel verification process, After I1, r1 holds a state
map_ptr_or_null. If I3 condition is not taken
(path [I1, I2, I3, I5]), supposedly r1 should become map_ptr.
Unfortunately, kernel does not recognize this pattern
and r1 remains map_ptr_or_null at insn I5. This will cause
verificaiton failure later on.
Kernel, however, is able to recognize pattern "if (r1 == 0)"
properly and give a map_ptr state to r1 in the above case.
LLVM here generates suboptimal code which causes kernel verification
failure. This patch fixes the issue by changing BPF insn pattern
matching and lowering to generate proper codes if the righthand
parameter of the above condition is a constant. A test case
is also added.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 308080
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.
Reviewers: mehdi_amini
Subscribers: inglorion, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35436
llvm-svn: 308078
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
Restricting register class to PointerRegClass for memory operands.
Also fix the PointerRegClass for AArch64 from GPR64 to GPR64sp, since
XZR cannot hold a memory pointer while SP is.
Fixes PR33134.
Differential Revision: https://reviews.llvm.org/D34999
llvm-svn: 308060
Summary:
This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.
A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.
Reviewers: mcrosier, t.p.northover
Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34963
llvm-svn: 308059
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.
Fixes PR33756
Reviewers: dberlin, davide
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D35317
llvm-svn: 308051
Add the following pattern to TryToUnfoldSelectInCurrBB()
bb:
%p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ...
%c = cmp %p, 0
%s = select %c, trueval, falseval
The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.
Differential Revision: https://reviews.llvm.org/D34762
llvm-svn: 308050
In moveToVALU(), move to vector ALU is performed, all instrs in
the use chain will be visited. We do not want the same node to be
pushed to the visit worklist more than once.
Differential Revision: https://reviews.llvm.org/D34726
llvm-svn: 308039
Nothing special here, output format is similar to the format
used by binutils readelf and ELF Tool Chain readelf.
Differential revision: https://reviews.llvm.org/D35351
llvm-svn: 308033
This is the LLVM part, adding definitions for
void @llvm.hexagon.Y2.dccleana(i8*)
void @llvm.hexagon.Y2.dccleaninva(i8*)
void @llvm.hexagon.Y2.dcinva(i8*)
void @llvm.hexagon.Y2.dczeroa(i8*)
void @llvm.hexagon.Y4.l2fetch(i8*, i32)
void @llvm.hexagon.Y5.l2fetch(i8*, i64)
The clang part will follow.
llvm-svn: 308032
Recommiting after adding check to avoid miscomputing alias information
on addresses of the same base but different subindices.
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 308025
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35253
The last version of this patch broke one of the expensive checks buildbots,
this version changes the failing test/MC/Mips/mt/invalid.s and other invalid
tests to write the errors to a file and run FileCheck on that, rather than
relying on the 'not llvm-mc ... <%s 2>&1 | Filecheck %s' idiom.
Hopefully this will sarisfy the buildbot.
llvm-svn: 308023
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
The following instructions are examined and transformed, if possible:
ADDIU instruction is transformed into 16-bit instruction ADDIUSP
ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP
Function InRange is changed to avoid left shifting of negative values, since
that caused some sanitizer tests to fail (so the previous patch
Differential Revision: https://reviews.llvm.org/D34511
llvm-svn: 308011
Insert a TSTri to set the flags and a Bcc to branch based on their
values. This is a bit inefficient in the (common) cases where the
condition for the branch comes from a compare right before the branch,
since we set the flags both as part of the compare lowering and as part
of the branch lowering. We're going to live with that until we settle on
a principled way to handle this kind of situation, which occurs with
other patterns as well (combines might be the way forward here).
llvm-svn: 308009
Constants are crucial for code size in the ARM Thumb-1 instruction
set. The 16 bit instruction size often does not offer enough space
for immediate arguments. This means that additional instructions are
frequently used to load constants into registers. Since constants are
hoisted, this can lead to significant register spillage if they are
used multiple times in a single function. This can be avoided by
rematerialization, i.e. recomputing a constant instead of reloading
it from the stack. This patch fixes the rematerialization of literal
pool loads in the ARM Thumb instruction set.
Patch by Philip Ginsbach
Differential Revision: https://reviews.llvm.org/D33936
llvm-svn: 308004
When iterating through loop
for (int i = INT_MAX; i > 0; i--)
We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.
In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.
Also added a test that ensures that a pre-loop is generated for such loops.
Differential Revision: https://reviews.llvm.org/D35347
llvm-svn: 308001
Since GFX9 supports denorm modes for v_min_f32/v_max_f32 that
is possible to further optimize fcanonicalize and remove it
if applied to min/max given their operands are known not to be
an sNaN or that sNaNs are not supported.
Additionally we can remove fcanonicalize if denorms are supported
for the VT and we know that its argument is never a NaN.
Differential Revision: https://reviews.llvm.org/D35335
llvm-svn: 307976
This patch adds verification checks for the unit header chain in the .debug_info section.
Specifically, for each unit in the .debug_info section, the verifier checks that:
The unit length is valid (i.e. the unit can actually fit in the .debug_info section)
The dwarf version of the unit is valid
The address size is valid (4 or 8)
The unit type (if the unit is in dwarf5) is valid
The debug_abbrev_offset is valid
llvm-svn: 307975
As outlined in the PR, we didn't ensure that displacements for DQ-Form
instructions are multiples of 16. Since the instruction encoding encodes
a quad-word displacement, a sub-16 byte displacement is meaningless and
ends up being encoded incorrectly.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33671.
Differential Revision: https://reviews.llvm.org/D35007
llvm-svn: 307934
Pass parameters properly in calls to such functions (pass all
floats in integer registers), and handle va_start properly (allocate
stack immediately below the arguments on the stack, to save the
register arguments into a single continuous array).
Differential Revision: https://reviews.llvm.org/D35006
llvm-svn: 307928
Previously such relocations fell into the last case for local
symbols, using the relocation addend as symbol index, leading to
a crash.
Differential Revision: https://reviews.llvm.org/D35239
llvm-svn: 307927
The AsmParser mnemonic spell checker was introduced in r307148 and enabled only
for ARM. This patch enables it for AArch64.
Differential Revision: https://reviews.llvm.org/D35357
llvm-svn: 307918
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.
Reviewers: reames, mkuper, mzolotukhin, apilipenko
Reviewed by: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35304
llvm-svn: 307909
For multiprecision arithmetic on MIPS, rather than using ISD::ADDE / ISD::ADDC,
get SelectionDAG to break down the operation into ISD::ADDs and ISD::SETCCs.
For MIPS, only the DSP ASE has a carry flag, so in the general case it is not
useful to directly support ISD::{ADDE, ADDC, SUBE, SUBC} nodes.
Also improve the generation code in such cases for targets with
TargetLoweringBase::ZeroOrOneBooleanContent by directly using the result of the
comparison node rather than using it in selects. Similarly for ISD::SUBE /
ISD::SUBC.
Address optimization breakage by moving the generation of MIPS specific integer
multiply-accumulate nodes to before legalization.
This revolves PR32713 and PR33424.
Thanks to Simonas Kazlauskas and Pirama Arumuga Nainar for reporting the issue!
Reviewers: slthakur
Differential Revision: https://reviews.llvm.org/D33494
The previous version of this patch was too aggressive in producing fused
integer multiple-addition instructions.
llvm-svn: 307906
This boils down to not crashing in reg bank select due to the lack of
register operands on this instruction, and adding some tests. The
instruction selection is already covered by the TableGen'erated code.
llvm-svn: 307904
Summary:
Similar to X86, it should be safe to inline callees if their
target-features are a subset of the caller. As some subtarget features
provide different instructions depending on whether they are set or
unset (e.g. ThumbMode and ModeSoftFloat), we use a whitelist of
target-features describing hardware capabilities only.
Reviewers: kristof.beyls, rengolin, t.p.northover, SjoerdMeijer, peter.smith, silviu.baranga, efriedma
Reviewed By: SjoerdMeijer, efriedma
Subscribers: dschuff, efriedma, aemerson, sdardis, javed.absar, arichardson, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34697
llvm-svn: 307889
All other code in MachODump.cpp uses the same comparison,
((r_length & 0x1) == 1), for distinguishing between the two,
while the code in llvm-objdump.cpp seemed to be incorrect.
Differential Revision: https://reviews.llvm.org/D35240
llvm-svn: 307882
Summary: Add target hooks for printing and parsing target MMO flags.
Targets may override getSerializableMachineMemOperandTargetFlags() to
return a mapping from string to flag value for target MMO values that
should be serialized/parsed in MIR output.
Add implementation of this hook for AArch64 SuppressPair MMO flag.
Reviewers: bogner, hfinkel, qcolombet, MatzeB
Subscribers: mcrosier, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D34962
llvm-svn: 307877
I don't know a reliable way of crafting a test for this case,
but I'll try a little harder. In the meanwhile, let's get the
bots green again. Please note this will be tested by `check-cfi`
once r307215 relands.
llvm-svn: 307874
Code to convert MachO - specific section debug section names to standard DWARF v5
section names was in the wrong place.
Differential Revision: https://reviews.llvm.org/D35321
llvm-svn: 307872
The instrumentation tracks the return address and not that of the
call so we remove one to compensate. Thanks for Peter Collingbourne
for confirming the analysis of the problem.
llvm-svn: 307871
When we fail to sink an instruction, we must make sure not to modify
the function; otherwise, we end up in an infinite loop because
CodeGenPrepare iterates until it doesn't make any changes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 .
llvm-svn: 307866
This is an incremental change to the promotion feature.
There are two problems with the current behavior:
1) loops with multiple exiting blocks are totally disabled
2) a counter update can only be promoted one level up in
the loop nest -- which does help much for short trip
count inner loops inside a high trip-count outer loops.
Due to this limitation, we still saw very large profile
count fluctuations from run to run for the affected loops
which are usually very hot.
This patch adds the support for promotion counters iteratively
across the loop nest. It also turns on the promotion for
loops with multiple exiting blocks (with a limit).
For single-threaded applications, the performance impact is flat
on average. For instance, dealII improves, but povray regresses.
llvm-svn: 307863
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memset intrinsic. This intrinsic is essentially memset with the implementation requirement that all stores used for the assignment are done with unordered-atomic stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: jfb, dschuff, sbc100, jgravelle-google, aheejin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D34885
llvm-svn: 307854
We are using multiplication by 1.0 to flush denormals and quiet sNaNs.
That is possible to omit this multiplication if source of the
fcanonicalize instruction is known to be flushed/quieted, i.e.
if it comes from another instruction known to do the normalization
and we are using IEEE mode to quiet sNaNs.
Differential Revision: https://reviews.llvm.org/D35218
llvm-svn: 307848
Refactored the code and separated out a function
`canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it
easier to add profitability heuristics later.
Added tests to runtime unrolling to make sure that unrolling for
multi-exit loops is not done unless the option
-unroll-runtime-multi-exit is true.
llvm-svn: 307843
Unlike many other instructions, these instructions have aliases which
take coprocessor registers, gpr register, accumulator (and dsp accumulator)
registers, floating point registers, floating point control registers and
coprocessor 2 data and control operands.
For the moment, these aliases are treated as pseudo instructions which are
expanded into the underlying instruction. As a result, disassembling these
instructions shows the underlying instruction and not the alias.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35253
llvm-svn: 307836
Summary:
LoopRotate manually updates the DoomTree by iterating over all predecessors of a basic block and computing the Nearest Common Dominator.
When a predecessor happens to be unreachable, `DT.findNearestCommonDominator` returns nullptr.
This patch teaches LoopRotate to handle this case and fixes [[ https://bugs.llvm.org/show_bug.cgi?id=33701 | PR33701 ]].
In the future, LoopRotate should be taught to use the new incremental API for updating the DomTree.
Reviewers: dberlin, davide, uabelho, grosser
Subscribers: efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35074
llvm-svn: 307828
A generic variant of IMPLICIT_DEF was added in r306875, but this
survives to selection and hits a `Cannot Select`. Add handling that
converts the note to a regular IMPLICIT_DEF.
llvm-svn: 307817
As promised in D35003.
Uses -codegenprepare instead of -instcombine since we hit the same
buggy path anyway, and CGP lets us keep this test really simple
(instcombine likes turning the alloca T, N into alloca [N x T], which
hides the bug this is testing for).
llvm-svn: 307811
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.
Differential Revision: https://reviews.llvm.org/D35272
llvm-svn: 307797
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34884
llvm-svn: 307796
Summary:
NetBSD shell sh(1) does not support ">& /dev/null" construct.
This is bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".
This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, dim, rnk
Reviewed By: joerg, rnk
Subscribers: rnk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35277
llvm-svn: 307789
When we have a diamond ifcvt the fallthough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.
Differential Revision: https://reviews.llvm.org/D34952
llvm-svn: 307788
Summary:
By prepending `.text .thumb .balign 2` to the module-level inline
assembly from a Thumb module, the assembler will generate the assembly
from that module as Thumb, even if the destination module uses an ARM
triple. Similar directives are used for module-level inline assembly in
ARM modules.
The alignment and instruction set are reset based on the target triple
before emitting the first function label.
Reviewers: olista01, tejohnson, echristo, t.p.northover, rafael
Reviewed By: echristo
Subscribers: aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34622
llvm-svn: 307772
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.
Part of PR33752.
Differential Revision: https://reviews.llvm.org/D35281
llvm-svn: 307744
Summary:
This allows tools like lld that process relocations
to apply data relocation correctly. This information
is required because relocation are stored as section
offset.
Subscribers: jfb, dschuff, jgravelle-google, aheejin
Differential Revision: https://reviews.llvm.org/D35234
llvm-svn: 307741
Avoid duplicating DictScope with hand-written names everywhere. Print
the S_-prefixed symbol kind for every record. This should make it easier
to search for certain kinds of records when debugging PDB linking.
llvm-svn: 307732
The issue is not if the value is pcrel. It is whether we have a
relocation or not.
If we have a relocation, the static linker will select the upper
bits. If we don't have a relocation, we have to do it.
llvm-svn: 307730
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
llvm-svn: 307729
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
This patch implements the .module and .set directives for the MT ASE,
notably that .module sets the relevant flags in .MIPS.abiflags and .set
doesn't.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35249
llvm-svn: 307716
For ELF, a movw+movt pair is handled as two separate relocations.
If an offset should be applied to the symbol address, this offset is
stored as an immediate in the instruction (as opposed to stored as an
offset in the relocation itself).
Even though the actual value stored in the movt immediate after linking
is the top half of the value, we need to store the unshifted offset
prior to linking. When the relocation is made during linking, the offset
gets added to the target symbol value, and the upper half of the value
is stored in the instruction.
This makes sure that movw+movt with offset symbols get properly
handled, in case the offset addition in the lower half should be
carried over to the upper half.
This makes the output from the additions to the test case match
the output from GNU binutils.
For COFF and MachO, the movw/movt relocations are handled as a pair,
and the overflow from the lower half gets carried over to the movt,
so they should keep the shifted offset just as before.
Differential Revision: https://reviews.llvm.org/D35242
llvm-svn: 307713
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.
Fixes PR33720.
llvm-svn: 307699
Preparatory work for adding the MIPS MT (multi-threading) ASE instructions.
Reviewers: slthakur, atanasyan
Differential Revision: https://reviews.llvm.org/D35247
llvm-svn: 307679
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
llvm-svn: 307676
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Differential Revision: https://reviews.llvm.org/D34337
llvm-svn: 307672
Use CHECK-NEXT for the comparison sequence, to make sure we don't get
any unexpected instructions in the middle of our flag manipulation
efforts.
llvm-svn: 307656
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).
llvm-svn: 307647
In each rule, each use of ComplexPattern is assigned an element in the Renderers
array. The matcher then collects renderer functions in this array and they are
used to render instructions. This works well for a single instruction but a
bug in the allocation mechanism causes the elements to be assigned on a
per-instruction basis rather than a per-rule basis.
So in the case of:
(set GPR32:$dst, (Op complex:$src1, complex:$src2))
tablegen currently assigns elements 0 and 1 to $src1 and $src2 respectively,
but for:
(set GPR32:$dst, (Op complex:$src1, (Op complex:$src2)))
it currently assigned both $src1 and $src2 the same element (0). This results in
one complex operand being rendered twice and the other being forgotten.
This patch corrects the allocation such that $src1 and $src2 are still allocated
different elements in this case.
llvm-svn: 307646
This change allows the pc to be used as a destination register for the
pseudo instruction LDR pc,=expression . The pseudo instruction must not be
transformed into a MOV, but it can use the Thumb2 LDR (literal) instruction
to a constant pool entry. See (A7.7.43 from ARMv7M ARM ARM).
Differential Revision: https://reviews.llvm.org/D34751
llvm-svn: 307640
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.
llvm-svn: 307639
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
llvm-svn: 307634
This is a second attempt to land this patch.
The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
Best Exit has next element in chain as successor.
Last element in chain is not a predecessor of first element of chain.
Reviewers: iteratee, xur, sammccall, chandlerc
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745
llvm-svn: 307631
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.
However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.
The added test case shows an example.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862
llvm-svn: 307628
querying for analysis results on a function declaration rather than
a definition.
The only reason this worked previously is by chance -- because the way
we got alias analysis results with the legacy PM, we happened to not
compute a dominator tree and so we happened to not hit an assert even
though it didn't make any real sense. Now we bail out before trying to
compute alias analysis so that we don't hit these asserts.
llvm-svn: 307625
PR30735 reports an issue where llvm-cov hangs with a worker thread
waiting on a condition, and the main thread waiting to join() the
workers. While this doesn't appear to be a bug in llvm-cov or the
ThreadPool implementation, it would be helpful to disable the use of
threading in the llvm-cov tests where no test coverage is added.
More context: https://bugs.llvm.org/show_bug.cgi?id=30735
llvm-svn: 307610
When an output directory is specified, llvm-cov spawns some threads to
speed up the process of writing out file reports. Add an option which
allows users to control how many threads llvm-cov uses.
A CommandGuide.rst update + test is included.
llvm-svn: 307609
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.
This reverts commit r307546.
llvm-svn: 307589
Summary:
As metioned in https://reviews.llvm.org/D34576, checkings in
`collectConstantCandidates` can be replaced by using
`llvm::canReplaceOperandWithVariable`.
The only special case is that `collectConstantCandidates` return false for
all `IntrinsicInst` but it is safe for us to collect constant candidates from
`IntrinsicInst`.
Reviewers: pirama, efriedma, srhines
Reviewed By: efriedma
Subscribers: llvm-commits, javed.absar
Differential Revision: https://reviews.llvm.org/D34921
llvm-svn: 307587
For each checked-in wasm file, make sure the there is
corresponding .ll file that can be used to regenerate it
if needed.
Add test/Object/Inputs/trivial-object-test.wasm to match other
formats and add some new wasm tests in test/Object.
Differential Revision: https://reviews.llvm.org/D35213
llvm-svn: 307585
Summary: When implementing MCFillFragment, use the size of the fragment,
rather than the size of the section.
Patch by Dan Gohman
Differential Revision: https://reviews.llvm.org/D35090
llvm-svn: 307565
For this example:
float test (int *arr) {
return arr[2];
}
We currently generate the following code:
li r4, 8
lxsiwax f0, r3, r4
xscvsxdsp f1, f0
With this patch, we will now generate:
addi r3, r3, 8
lxsiwax f0, 0, r3
xscvsxdsp f1, f0
Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027
llvm-svn: 307553
Summary: White spaces in file names are causing Phabricator/SVN to crash.
Reviewers: bkramer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35206
llvm-svn: 307550
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 307546
When unrolling under multiple exits which is under off-by-default option,
the assert that checks for VMap entry in loop exit values is too strong.
(assert if VMap entry did not exist, the value should be a
constant). However, values derived from
constants or from values outside loop, does not have a VMap entry too.
Removed the assert and added a testcase showcasing the property for
non-constant values.
llvm-svn: 307542
Summary:
This patch adds a callback registration API to the PassBuilder,
enabling registering out-of-tree passes with it.
Through the Callback API, callers may register callbacks with the
various stages at which passes are added into pass managers, including
parsing of a pass pipeline as well as at extension points within the
default -O pipelines.
Registering utilities like `require<>` and `invalidate<>` needs to be
handled manually by the caller, but a helper is provided.
Additionally, adding passes at pipeline extension points is exposed
through the opt tool. This patch adds a `-passes-ep-X` commandline
option for every extension point X, which opt parses into pipelines
inserted into that extension point.
Reviewers: chandlerc
Reviewed By: chandlerc
Subscribers: lksbhm, grosser, davide, mehdi_amini, llvm-commits, mgorny
Differential Revision: https://reviews.llvm.org/D33464
llvm-svn: 307532
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.
Please note that the patch extensively affects the X86 MC instr scheduling for SNB.
Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.
The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs
For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:
def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;
Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.
Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D35019#inline-304691
llvm-svn: 307529
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35177
llvm-svn: 307526
Summary:
This solves PR33641.
When removing a dead argument we must also handle possibly existing calls
to llvm.dbg.value that use the removed argument. Now we change the use
of the otherwise dead argument to an undef for some other pass to cleanup
later.
If the calls are left untouched, they will later on cause errors:
"function-local metadata used in wrong function"
since the ArgumentPromotion rewrites the code by creating a new function
with the wanted signature, but the metadata is not recreated so the new
function may then erroneously use metadata from the old function.
Reviewers: mstorsjo, rnk, arsenm
Reviewed By: rnk
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34874
llvm-svn: 307521
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.
Fixes PR33715.
llvm-svn: 307508
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.
Fixes PR#33094.
Submitted on behalf of @Praetonus (Benoit Vey)
Differential Revision: https://reviews.llvm.org/D34538
llvm-svn: 307502
invalidation of analyses when merging SCCs.
While I've added a bunch of testing of this, it takes something much
more like the inliner to really trigger this as you need to have
partially-analyzed SCCs with updates at just the right time. So I've
added a direct test for this using the inliner and verifying the
domtree. Without the changes here, this test ends up finding a stale
dominator tree.
However, to handle this properly, we need to invalidate analyses
*before* merging the SCCs. After talking to Philip and Sanjoy about this
they convinced me this was the right approach. To do this, we need
a callback mechanism when merging SCCs so we can observe the cycle that
will be merged before the merge happens. This API update ended up being
surprisingly easy.
With this commit, the new PM passes the test-suite again. It hadn't
since MemorySSA was enabled for EarlyCSE as that also will find this bug
very quickly.
llvm-svn: 307498
the invalidation propagation logic from an SCC to a Function.
I wrote the infrastructure to test this but didn't actually use it in
the unit test where it was designed to be used. =[ My bad. Once
I actually added it to the test case I discovered that it also hadn't
been properly implemented, so I've implemented it. The logic in the FAM
proxy for an SCC pass to propagate invalidation follows the same ideas
as the FAM proxy for a Module pass, but the implementation is a bit
different to reflect the fact that it is forwarding just for an SCC.
However, implementing this correctly uncovered a surprising "bug" (it
was conservatively correct but relatively very expensive) in how we
handle invalidation when splitting one SCC into multiple SCCs. We did an
eager invalidation when in reality we should be deferring invaliadtion
for the *current* SCC to the CGSCC pass manager and just invaliating the
newly constructed SCCs. Otherwise we end up invalidating too much too
soon. This was exposed by the inliner test case that I've updated. Now,
we invalidate *just* the split off '(test1_f)' SCC when doing the CG
update, and then the inliner finishes and invalidates the '(test1_g,
test1_h)' SCC's analyses. The first few attempts at fixing this hit
still more bugs, but all of those are covered by existing tests. For
example, the inliner should also preserve the FAM proxy to avoid
unnecesasry invalidation, and this is safe because the CG update
routines it uses handle any necessary adjustments to the FAM proxy.
Finally, the unittests for the CGSCC pass manager needed a bunch of
updates where we weren't correctly preserving the FAM proxy because it
hadn't been fully implemented and failing to preserve it didn't matter.
Note that this doesn't yet fix the current crasher due to MemSSA finding
a stale dominator tree, but without this the fix to that crasher doesn't
really make any sense when testing because it relies on the proxy
behavior.
llvm-svn: 307487
The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp
instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate.
In this patch we use the original predicate to do the transformation.
Also added a test case that exercises this situation.
Differentian Revision: https://reviews.llvm.org/D35107
llvm-svn: 307477
I'm looking at a cmp transform in InstCombine that would affect these tests,
but it's hard to know if it makes things better or worse without seeing the
full IR. OTOH, maybe these tests shouldn't be running a bunch of transform
passes in the first place?
llvm-svn: 307475
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. Selecting 0 or -1
needs extra attention to produce the optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)
As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.
llvm-svn: 307471
Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D35096
llvm-svn: 307439
With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic
is in place to support multiple exit/exiting blocks when prolog
remainder is generated.
This patch removed the assert that multiple exit blocks unrolling is only
supported when epilog remainder is generated.
Also, added test runs and checks with PROLOG prefix in
runtime-loop-multiple-exits.ll test cases.
llvm-svn: 307435
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partially redef would have trigger a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given we don't track which lanes are still live, we cannot set the
kill flag in such case.
Note: This bug has been latent for about 7 years (r104056).
llvmg.org/PR33677
llvm-svn: 307428
r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having
pointer types. The bug also exists in ARM, so this patch copies over the fix.
llvm-svn: 307409
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. DAGCombiner already
has the foundation to allow the transforms, so we just need to fill in the holes
for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the
optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
Differential Revision: https://reviews.llvm.org/D34652
llvm-svn: 307404
Prior to this commit both of the added test cases were passing. However, in the
latter case (test7) we were doing a lot more work to arrive at the same answer
(i.e., we were using isImpliedCondMatchingOperands() to determine the
implication.).
llvm-svn: 307400
Today the safepoint IR verifier catches some unrelocated uses of base
pointers that are actually valid.
With this change, we narrow down the set of false positives.
Specifically, the verifier knows about compares to null and compares
between 2 unrelocated pointers.
Reviewed by: skatkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35057
llvm-svn: 307392
Summary:
This change gives a 0.89% speed on execution time, a 0.94% improvement
in benchmark scores and a 0.62% increase in binary size on a Cortex-A57.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A57 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga
Reviewed By: kristof.beyls
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D34954
llvm-svn: 307389
Summary:
This change gives a 0.34% speed on execution time, a 0.61% improvement
in benchmark scores and a 0.57% increase in binary size on a Cortex-A72.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites.
The software optimization guide for the Cortex-A72 recommends 16 byte
branch alignment.
Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar
Reviewed By: kristof.beyls
Subscribers: llvm-commits, aemerson
Differential Revision: https://reviews.llvm.org/D34961
llvm-svn: 307380
We lower to a sequence consisting of:
- MOVi 0 into a register
- VCMPS to do the actual comparison and set the VFP flags
- FMSTAT to move the flags out of the VFP unit
- MOVCCi to either use the "zero register" that we have previously set
with the MOVi, or move 1 into the result register, based on the values
of the flags
As was the case with soft-float, for some predicates (one, ueq) we
actually need two comparisons instead of just one. When that happens, we
generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of
using the result of the first MOVCCi as the "zero register" for the
second one. This is a bit overkill, since one comparison followed by
two non-flag-setting conditional moves should be enough. In any case,
the backend manages to CSE one of the comparisons away so it doesn't
matter much.
Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not
VCMPES. This makes the code a lot simpler, and it also seems correct
since the LLVM Lang Ref defines simple true/false returns if the
operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand
exception, so they won't be slipping through unnoticed.
Implementation-wise, this introduces a template so we can share the same
code that we use for handling integer comparisons, since the only
differences are in the details (exact opcodes to be used etc). Hopefully
this will be easy to extend to s64 G_FCMP.
llvm-svn: 307365
Based strictly on the name, this seems to have something to do
width edit & continue. The goal of this patch has nothing to do
with supporting edit and continue though. msvc link.exe writes
very basic information into this area even when *not* compiling
with support for E&C, and so the goal here is to bring lld-link
to parity. Since we cannot know what assumptions standard tools
make about the content of PDB files, we need to be as close as
possible.
This ECNames data structure is a standard PDB string hash table.
link.exe puts a single string into this hash table, which is the
full path to the PDB file on disk. It then references this string
from the module descriptor for the compiler generated `* Linker *`
module.
With this patch, lld-link will generate the exact same sequence of
bytes as MSVC link for this subsection for a given object file
input (as reported by `llvm-pdbutil bytes -ec`).
llvm-svn: 307356
When scavenging for a use in instruction MI, we will reload after
that instruction and hence cannot spill uses/defs of this instruction.
This fixes http://llvm.org/PR33687
llvm-svn: 307352
InferAddressSpaces does not check address space in collectFlatAddressExpressions,
which causes values with non flat address space put into Postorder and causes
assertion in cloneValueWithNewAddressSpace.
This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space
subtest for amdgcn target.
Differential Revision: https://reviews.llvm.org/D34991
llvm-svn: 307349
Model weakly defined symbols as symbols that are both
exports and imported and marked as weak. Local references
to the symbols refer to the import but the linker can
resolve this to the weak export if not strong symbol
is found at link time.
Differential Revision: https://reviews.llvm.org/D35029
llvm-svn: 307348
Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the
target to provide the operand types through TTI callbacks. The default values
for the TTI callbacks use int8 operand types and matches the existing behaviour
if they aren't overridden by the target.
Differential revision: https://reviews.llvm.org/D32536
llvm-svn: 307346
Revert "Copy arguments passed by value into explicit allocas for ASan."
Revert "[asan] Add end-to-end tests for overflows of byval arguments."
Build failure on lldb-x86_64-ubuntu-14.04-buildserver.
Test failure on clang-cmake-aarch64-42vma and sanitizer-x86_64-linux-android.
llvm-svn: 307345
ASan determines the stack layout from alloca instructions. Since
arguments marked as "byval" do not have an explicit alloca instruction, ASan
does not produce red zones for them. This commit produces an explicit alloca
instruction and copies the byval argument into the allocated memory so that red
zones are produced.
Patch by Matt Morehouse.
Differential revision: https://reviews.llvm.org/D34789
llvm-svn: 307342
Using profile information to guide consthoisting is generally helpful for
performance, so the patch turns it on by default. No compile time or perf
regression were found using spec2000 and spec2006 on x86. Some significant
improvement (>20%) was seen on internal benchmarks.
Differential Revision: https://reviews.llvm.org/D35063
llvm-svn: 307338