Add FastISel support for SSE4A scalar float / double non-temporal stores
Follow up to D13698
Differential Revision: http://reviews.llvm.org/D13773
llvm-svn: 250610
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:
1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling
Differential Revision: http://reviews.llvm.org/D13348
llvm-svn: 250609
This property was already used in the code path when no liveness
intervals are present. Unfortunately the code path that uses liveness
intervals tried to query a cached live interval for an allocatable
physreg, those are usually not computed so a conservative default was
used.
This doesn't affect any of the lit testcases. This is a foreclosure to
upcoming changes which should be NFC but without this patch this tidbit
wouldn't be NFC.
llvm-svn: 250596
This should not change behaviour because as far as I can see all code
reading the pressure changes has no effect if the PressureInc is 0.
Removing these entries however does avoid unnecessary computation, and
results in a more stable debug output. I want the stable debug output to
check that some upcoming changes are indeed NFC and identical even at
the debug output level.
llvm-svn: 250595
Summary:
This is a temporary hack until we get around to remapping the vreg
numbers to local numbers. Dead vregs cause bad numbering and make
consumers sad.
We could also just look at debug info an use named locals instead, but
vregs have to work properly anyways so there!
Reviewers: binji, sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13839
llvm-svn: 250594
Summary:
Some shared code for handling eh.exceptionpointer and eh.exceptioncode
needs to not share the part that truncates to 32 bits, which is intended
just for exception codes.
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13747
llvm-svn: 250588
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.
The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.
llvm-svn: 250583
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134
void foo(int *a, int i) {
a[i] = a[i+1] + a[i+2];
}
It seems better to produce this (14 bytes):
movslq %esi, %rsi
movl 0x4(%rdi,%rsi,4), %eax
addl 0x8(%rdi,%rsi,4), %eax
movl %eax, (%rdi,%rsi,4)
Rather than this (22 bytes):
leal 0x1(%rsi), %eax
cltq
leal 0x2(%rsi), %ecx
movslq %ecx, %rcx
movl (%rdi,%rcx,4), %ecx
addl (%rdi,%rax,4), %ecx
movslq %esi, %rax
movl %ecx, (%rdi,%rax,4)
The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine,
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that:
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".
I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.
A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.
Differential Revision: http://reviews.llvm.org/D13757
llvm-svn: 250560
Crashing is bad, m'kay? Fixing a 4 year old bug of my own creation.
Adding the testcase now which I should have added then which would have
long since caught this.
The problem is that printMessage() will display the diagnostic but not
set HadError to true, resulting in the assembler continuing on its way
and trying to create relocations for things that may not allow them or
otherwise get itself into trouble. Using the Error() helper function
here rather than calling printMessage() directly resolves this.
rdar://23133240
llvm-svn: 250557
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.
Reviewers: majnemer, rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D13798
llvm-svn: 250552
The number of samples collected at the head of a function only make
sense for top-level functions (i.e., those actually called as opposed to
being inlined inside another).
Head samples essentially count the time spent inside the function's
prologue. This clearly doesn't make sense for inlined functions, so we
were always emitting 0 in those.
llvm-svn: 250539
Summary: The syntax has changed a bit recently.
Reviewers: binji
Subscribers: llvm-commits, jfb, sunfish, dschuff
Differential Revision: http://reviews.llvm.org/D13821
llvm-svn: 250535
Summary:
When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop
trying to propagate the cleanup's parent's color to the catchendpad, since
what's needed is the cleanup's grandparent's color and the catchendpad
will get that color from the catchpad linkage already. We already had
this exclusion for invokes, but were missing it for
cleanupendpad/cleanupret.
Also add a missing line that tags cleanupendpads' states in the
EHPadStateMap, without with lowering invokes that target cleanupendpads
which unwind to other handlers (and so don't have the -1 state) will fail.
This fixes the reduced IR repro in PR25163.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13797
llvm-svn: 250534
Ideally, we would like SimplifyCFG to be able to form select instructions even when the operands
are expensive (as defined by the TTI cost model) because that may expose further optimizations.
However, we would then like a later pass like CodeGenPrepare to undo that transformation if the
target would likely benefit from not speculatively executing an expensive op (this patch).
Once we have this safety mechanism in place, we can adjust SimplifyCFG to restore its
select-formation behavior that changed with r248439.
Differential Revision: http://reviews.llvm.org/D13297
llvm-svn: 250527
Summary: Make the relooper an analysis pass, to convert CFG to AST.
Reviewers: sunfish
Subscribers: jfb, dschuff
Differential Revision: http://reviews.llvm.org/D12744
llvm-svn: 250524
Summary: This patch replaces usage of deprecated SHGetFolderPathW with SHGetKnownFolderPath. The usage of SHGetKnownFolderPath is wrapped to allow queries for other "known" folders in the near future.
Reviewers: aaron.ballman, gbedwell
Subscribers: chapuni, llvm-commits
Differential Revision: http://reviews.llvm.org/D13753
llvm-svn: 250501
This patch adds the underlying infrastructure for an AVR backend to be included into LLVM. It is the first of a series of patches aimed at moving the out-of-tree AVR backend into the tree.
It consists of adding a new`Triple` target 'avr'.
llvm-svn: 250492
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created. Once the `gc.statepoint` is in place, these should be removed.
llvm-svn: 250491
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC. The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.
The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while. Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.
Reviewers: swaroop.sridhar, reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13372
llvm-svn: 250489
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).
This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.
Reviewers: atrick, hfinkel, reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D13717
llvm-svn: 250483
Summary:
This NFC splitting is intended to make a later diff easier to follow.
It just tail duplicates `cloneIVUser` into `cloneArithmeticIVUser` and
`cloneBitwiseIVUser`.
Reviewers: atrick, hfinkel, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13716
llvm-svn: 250481
Summary:
Follow the same syntax as for the spec repo. Both have evolved slightly
independently and need to converge again.
This, along with wasmate changes, allows me to do the following:
echo "int add(int a, int b) { return a + b; }" > add.c
./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack
./experimental/prototype-wasmate/wasmate.py add.wack > add.wast
./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm
./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));"
As you'd expect, the d8 shell prints out the right value.
Reviewers: sunfish
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: http://reviews.llvm.org/D13712
llvm-svn: 250480
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.
This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.
This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.
llvm-svn: 250456
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.
This patch adds the missing EFLAGS definition.
llvm-svn: 250438
Turns out this approach is buggy. In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.
Consider:
if (i u< L) leave()
if ((i + 1) u< L) leave()
print(a[i] + a[i+1])
If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow. This gives us:
if (i u< L) leave()
if ((i +nuw 1) u< L) leave()
print(a[i] + a[i+1])
If we now do the transform this patch proposed, we end up with:
if ((i +nuw 1) u< L) leave_appropriately()
print(a[i] + a[i+1])
That would be a miscompile when i==-1. The problem here is that the control dependent nuw bits got used to prove something about the first condition. That's obviously invalid.
This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...
llvm-svn: 250430
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.
Don't let it forget about the displacement.
This fixes https://llvm.org/bugs/show_bug.cgi?id=25171
A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.
This fixes https://llvm.org/bugs/show_bug.cgi?id=25144
Reviewers: pete
Subscribers: llvm-commits, majnemer, rsmith
Differential Revision: http://reviews.llvm.org/D13749
llvm-svn: 250429
This adjusts all integers in the reader/writer to reflect the types
stored on profile files. They should all be unsigned 32-bit or 64-bit
values. Changed all associated internal types to be uint32_t or
uint64_t.
The only place that needed some adjustments is in the sample profile
transformation. Altough the weight read from the profile are 64-bit
values, the internal API for branch weights only accepts 32-bit values.
The pass now saturates weights that overflow uint32_t.
llvm-svn: 250427
Summary:
This macro is needed to prevent test/CodeGen/Mips/2008-08-01-AsmInline.ll from
failing after the integrated assembler is enabled by default.
Reviewers: vkalintiris
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13654
llvm-svn: 250414
Summary:
The -mcpu=mips16 option caused the Integrated Assembler to crash because
it couldn't figure out the architecture revision number to write to the
.MIPS.abiflags section. This CPU definition has been removed because, like
microMIPS, MIPS16 is an ASE to a base architecture.
Reviewers: vkalintiris
Subscribers: rkotler, llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D13656
llvm-svn: 250407
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.
Differential Revision: http://reviews.llvm.org/D13644
llvm-svn: 250390
(e.g. bss sections).
MachO and ELF have been silently letting this pass, but COFFObjectFile contains
an assertion to catch this kind of (ab)use of the getSectionContents, and this
was causing the JIT to crash on COFF objects with BSS sections. This patch
should fix that.
llvm-svn: 250371
Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250370
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
%.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!
I will re-commit this if the bot does not recover.
llvm-svn: 250366
A PDB can be thought of as a very simple file system. It is
occasionally illuminating to see the contents of the underlying files.
Differential Revision: http://reviews.llvm.org/D13674
llvm-svn: 250356
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250349
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250345
If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.
The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.) I plan on extending this transform over the next few days to handle alternate sequences.
Differential Revision: http://reviews.llvm.org/D13070
llvm-svn: 250343
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).
The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.
This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:
TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data
If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.
Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.
rdar://problem/14265330
Differential Revision: http://reviews.llvm.org/D13188
llvm-svn: 250342
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.
The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
Differential Revision: http://reviews.llvm.org/D13253
llvm-svn: 250338
PR25157 identifies a bug where a load plus a vector shuffle is
incorrectly converted into an LXVDSX instruction. That optimization
is only valid if the load is of a doubleword, and in the noted case,
it was not. This corrects that problem.
Joint patch with Eric Schweitz, who provided the bugpoint-reduced test
case.
llvm-svn: 250324
This adds documentation for the binary profile encoding and moves the
documentation for the text encoding into the header file
SampleProfReader.h.
llvm-svn: 250309
Summary:
Caching SDLoc(N), instead of recreating it in every single
function call, keeps the code denser, and allows to unwrap long lines.
Reviewers: sunfish, atrick, sdmitrouk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13726
llvm-svn: 250305
Summary: The two implementations had more code in common than not.
Reviewers: sunfish, MatzeB, sdmitrouk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13724
llvm-svn: 250302
This patch teaches x86 fast-isel how to select nontemporal stores.
On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.
Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.
With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.
Differential Revision: http://reviews.llvm.org/D13698
llvm-svn: 250285
Binary encoded profiles used to encode all function names inline at
every reference. This is clearly suboptimal in terms of space. This
patch fixes this by adding a name table to the header of the file.
llvm-svn: 250241
We forgot to append the terminatepad's arguments which resulted in us
treating the old terminatepad as an argument to the new terminatepad
causing us to crash immediately. Instead, add the old terminatepad's
arguments to the new terminatepad.
This fixes PR25155.
llvm-svn: 250234
ArchiveMemberHeader, suggestion by Rafael Espíndola.
Also The clang-x86-win2008-selfhost bot still does not like the
malformed-machos 00000031.a test, so removing it for now. All
the other bots are fine with it however.
llvm-svn: 250222
Summary:
Emit the handler and clause locations immediately after the standard
xdata.
Clauses are emitted in the same order and format used to communiate them
to the CLR Execution Engine.
Add a lit test to verify correct table generation on a small but
interesting example function.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: pgavlin, AndyAyers, llvm-commits
Differential Revision: http://reviews.llvm.org/D13451
llvm-svn: 250219
One of the changes in lib/Target/AMDGPU/AMDGPUMCInstLower.cpp was a new
one. Previously, bundle iterators and single-instruction iterators
could be compared to each other (comparing on underlying pointers).
I changed a comparison from using `MBB->end()` to using
`MBB->instr_end()`, since both end iterators should point at the some
place anyway.
I don't think the implicit conversion between the two iterator types is
a good idea since it's fairly easy to accidentally compare to the wrong
thing (they aren't always end iterators). Otherwise I would have just
added the conversion.
Even with that, no there should be functionality change here.
llvm-svn: 250218
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.
This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770. This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`. `Function::front()` started to assert, since the function
was empty. Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before. (I added the missing check for
`Function::isDeclaration()`.)
Otherwise, no functionality change intended.
llvm-svn: 250211
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250204
On Linux, the profile runtime can use __start_SECTNAME and __stop_SECTNAME
symbols defined by the linker to locate the start and end location of
a named section (with C name). This eliminates the need for instrumented
binary to call __llvm_profile_register_function during start-up time.
llvm-svn: 250199
Summary:
Add an iterator that can walk across blocks and which visits the state
transitions rather than state ranges, with explicit transitions to -1
indicating the presence of top-level calls that may throw and cause the
current function to unwind to caller. This will simplify code that needs
to identify nested try regions.
Refactor SEH and C++EH table generation to use the new
InvokeStateChangeIterator, and remove the InvokeLabelIterator they were
using.
Reviewers: majnemer, andrew.w.kaylor, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13623
llvm-svn: 250179
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.
The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.
llvm-svn: 250160
Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again.
Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off.
llvm-svn: 250157
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.
llvm-svn: 250156
Now LLVMBitWriter compiles without implicit ilist iterator conversions.
In these cases, the cleanest thing was to switch to range-based for
loops. Since there wasn't much noise I converted sub-loops and parent
loops as a drive-by.
llvm-svn: 250144
In a later commit, `SplitBinaryAdd` will be used outside `IsConstDiff`,
so lift that out. And lift out `IsConstDiff` as
`computeConstantDifference` to keep things clean and to avoid playing
C++ access specifier games.
NFC.
llvm-svn: 250143
Continuing the work from last week to remove implicit ilist iterator
conversions. First related commit was probably r249767, with some more
motivation in r249925. This edition gets LLVMTransformUtils compiling
without the implicit conversions.
No functional change intended.
llvm-svn: 250142
The comment says this was stopped because it was unlikely to be
profitable. This is not true if you want to combine vector loads
with multiple components.
For a simple case that looks like
t0 = load t0 ...
t1 = load t0 ...
t2 = load t0 ...
t3 = load t0 ...
t4 = store t0:1, t0:1
t5 = store t4, t1:0
t6 = store t5, t2:0
t7 = store t6, t3:0
We want to get all of these stores onto a chain
that is a TokenFactor of these N loads. This mostly
solves the AMDGPU merge-stores.ll regressions
with -combiner-alias-analysis for merging vector
stores of vector loads.
llvm-svn: 250138
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.
This patch adds the missing EFLAGS definition.
Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.
Reviewers: rsmith, rtrieu
Subscribers: llvm-commits, reames, morisset
Differential Revision: http://reviews.llvm.org/D13680
llvm-svn: 250135
This basic combine was surprisingly missing.
AMDGPU legalizes many operations in terms of 32-bit vector components,
so not doing this results in many extra copies and subregister extracts
that need to be cleaned up later.
InstCombine already does this for the hasOneUse case. The target hook
is to fix a handful of tests which break (e.g. ARM/vmov.ll) which turn
from a vector materialize repeated immediate instruction to a constant
vector load with more scalar copies from it.
llvm-svn: 250129
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.
Differential revision: http://reviews.llvm.org/D13354
llvm-svn: 250119
We have a number of functions that implement constant folding of vectors (unary and binary ops) in near identical manners (and the differences don't appear to be critical).
This patch introduces a common implementation (SelectionDAG::FoldConstantVectorArithmetic) and calls this in both the unary and binary op cases.
After this initial patch I intend to begin enabling vector constant folding for a wider number of opcodes in SelectionDAG::getNode().
Differential Revision: http://reviews.llvm.org/D13665
llvm-svn: 250118
that caused aborts. This was because of the characters of the ‘Size’ field in
the archive header did not contain decimal characters.
rdar://22983603
llvm-svn: 250117
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).
Differential revision: http://reviews.llvm.org/D10979
llvm-svn: 250089
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.
llvm-svn: 250088
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.
This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.
In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.
Example (-mattr=+avx):
%0 = trunc <8 x i32> %a to <8 x i23>
%1 = icmp eq <8 x i23> %0, zeroinitializer
The initial selection dag for the code above is:
v8i1 = setcc t5, t7, seteq:ch
t5: v8i23 = truncate t2
t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
t7: v8i32 = build_vector of all zeroes.
The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
v8i16 = setcc t32, t25, seteq:ch
where t32 and t25 are now values of type v8i32.
Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25
However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
%vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3
Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.
This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.
This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.
Differential Revision: http://reviews.llvm.org/D13660
llvm-svn: 250085
This was a minor bug in r249492. Calling PrepareEHLandingPad on a
non-landingpad was a no-op, but it attempted to get the generic pointer
register class, which apparently doesn't exist for some targets.
llvm-svn: 250068
On targets where f32 is not legal, we have to look through a BITCAST SDNode to
find the register that an argument is stored in when emitting debug info, or we
will not be able to emit a DW_AT_location for it.
Differential Revision: http://reviews.llvm.org/D13005
llvm-svn: 250056
On Windows, fs::rename() could fail is another process was reading the
file at the same time using fs::openFileForRead(). In most cases the user
wouldn't notice as fs::rename() will continue to retry for 2000ms. Typically
this is enough for the read to complete and a retry to succeed, but if the
disk is being it too hard then the response time might be longer than the
retry time and the rename would fail with a permission error.
Add FILE_SHARE_DELETE to the sharing flags for CreateFileW() in
fs::openFileForRead() and try ReplaceFileW() prior to MoveFileExW()
in fs::rename().
Based on an initial patch by Edd Dawson!
Differential Revision: http://reviews.llvm.org/D13647
llvm-svn: 250046
Summary:
This removes unnecessary instructions when extracting from an undefined register
and also fixes a crash for O32 when passing undef to a double argument in
held in integer registers.
Reviewers: vkalintiris
Subscribers: llvm-commits, zoran.jovanovic, petarj
Differential Revision: http://reviews.llvm.org/D13467
llvm-svn: 250039
GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.
llvm-svn: 250035
The Swift Machine Scheduler Model is incomplete. There are instructions
missing which can trigger the "incomplete machine model" abort. This was
observed when a downstream SchedMachineModel was added to the ARM
target.
Patch by Christof Douma!
llvm-svn: 250033
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.
For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.
Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.
The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.
The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.
This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.
No non-noise compile time impact was seen on a clang bootstrap build.
llvm-svn: 250032
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.
We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.
The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.
Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.
With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.
Differential Revision: http://reviews.llvm.org/D13364
llvm-svn: 250027
This patch also allows the -delinearize pass to delinearize expressions that do
not have an outermost SCEVAddRec expression. The SCEV::delinearize
infrastructure allowed this since r240952, but the -delinearize pass was not
updated yet.
llvm-svn: 250018
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).
llvm-svn: 249976
Summary:
The change to use the VST function entries for lazy deserialization did
not handle the case of anonymous functions without aliases. In that case
we must fall back to scanning the function blocks as there is no VST
entry.
Reviewers: dexonsmith, joker.eph, davidxl
Subscribers: tstellarAMD, llvm-commits
Differential Revision: http://reviews.llvm.org/D13596
llvm-svn: 249947
expandPostRAPseudo():
STX -> 2 * STD: The first STD should not have the kill flag set for the address.
SystemZElimCompare:
BRC -> BRCT conversion: Don't forget to remove the CC<use,kill> operand.
Needed to make SystemZ/asm-17.ll pass with -verify-machineinstrs, which
now runs with this flag.
Reviewed by Ulrich Weigand.
llvm-svn: 249945
Summary:
Rather than just iterating over all sections and checking whether we have relocations for them, iterate over the relocation map instead. This showed up heavily in an artificial julia benchmark that does lots of compilation. On that particular benchmark, this patch gives
~15% performance improvements. As far as I can tell the primary reason why the original
loop was so expensive is that Relocations[i] actually constructs a relocationList (allocating memory & doing lots of other unnecessary computing) if none is found.
Reviewers: lhames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D13545
llvm-svn: 249942
Remove implicit ilist iterator conversions from LLVMAnalysis.
I came across something really scary in `llvm::isKnownNotFullPoison()`
which relied on `Instruction::getNextNode()` being completely broken
(not surprising, but scary nevertheless). This function is documented
(and coded to) return `nullptr` when it gets to the sentinel, but with
an `ilist_half_node` as a sentinel, the sentinel check looks into some
other memory and we don't recognize we've hit the end.
Rooting out these scary cases is the reason I'm removing the implicit
conversions before doing anything else with `ilist`; I'm not at all
surprised that clients rely on badness.
I found another scary case -- this time, not relying on badness, just
bad (but I guess getting lucky so far) -- in
`ObjectSizeOffsetEvaluator::compute_()`. Here, we save out the
insertion point, do some things, and then restore it. Previously, we
let the iterator auto-convert to `Instruction*`, and then set it back
using the `Instruction*` version:
Instruction *PrevInsertPoint = Builder.GetInsertPoint();
/* Logic that may change insert point */
if (PrevInsertPoint)
Builder.SetInsertPoint(PrevInsertPoint);
The check for `PrevInsertPoint` doesn't protect correctly against bad
accesses. If the insertion point has been set to the end of a basic
block (i.e., `SetInsertPoint(SomeBB)`), then `GetInsertPoint()` returns
an iterator pointing at the list sentinel. The version of
`SetInsertPoint()` that's getting called will then call
`PrevInsertPoint->getParent()`, which explodes horribly. The only
reason this hasn't blown up is that it's fairly unlikely the builder is
adding to the end of the block; usually, we're adding instructions
somewhere before the terminator.
llvm-svn: 249925
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
llvm-svn: 249918
Also Fix a buglet where SEH tables had ranges that spanned funclets.
The remaining tests using the old landingpad IR are preparation tests,
and will be deleted along with the old preparation.
llvm-svn: 249917