This patch stops the implicit creation of comdats during codegen.
Clang now sets the comdat explicitly when it is required. With this patch clang and gcc
now produce the same result in pr19848.
llvm-svn: 226038
Some benchmarks have shown that this could lead to a potential
performance benefit, and so adding some flags to try to help measure the
difference.
A possible explanation. In diamond-shaped CFGs (A followed by either
B or C both followed by D), putting B and C both in between A and
D leads to the code being less dense than it could be. Always either
B or C have to be skipped increasing the chance of cache misses etc.
Moving either B or C to after D might be beneficial on average.
In the long run, but we should probably do a better job of analyzing the
basic block and branch probabilities to move the correct one of B or
C to after D. But even if we don't use this in the long run, it is
a good baseline for benchmarking.
Original patch authored by Daniel Jasper with test tweaks and a second
flag added by me.
Differential Revision: http://reviews.llvm.org/D6969
llvm-svn: 226034
A pass that adds random noops to X86 binaries to introduce diversity with the goal of increasing security against most return-oriented programming attacks.
Command line options:
-noop-insertion // Enable noop insertion.
-noop-insertion-percentage=X // X% of assembly instructions will have a noop prepended (default: 50%, requires -noop-insertion)
-max-noops-per-instruction=X // Randomly generate X noops per instruction. ie. roll the dice X times with probability set above (default: 1). This doesn't guarantee X noop instructions.
In addition, the following 'quick switch' in clang enables basic diversity using default settings (currently: noop insertion and schedule randomization; it is intended to be extended in the future).
-fdiversify
This is the llvm part of the patch.
clang part: D3393
http://reviews.llvm.org/D3392
Patch by Stephen Crane (@rinon)
llvm-svn: 225908
This adds handling for ExceptionHandling::MSVC, used by the
x86_64-pc-windows-msvc triple. It assumes that filter functions have
already been outlined in either the frontend or the backend. Filter
functions are used in place of the landingpad catch clause type info
operands. In catch clause order, the first filter to return true will
catch the exception.
The C specific handler table expects the landing pad to be split into
one block per handler, but LLVM IR uses a single landing pad for all
possible unwind actions. This patch papers over the mismatch by
synthesizing single instruction BBs for every catch clause to fill in
the EH selector that the landing pad block expects.
Missing functionality:
- Accessing data in the parent frame from outlined filters
- Cleanups (from __finally) are unsupported, as they will require
outlining and parent frame access
- Filter clauses are unsupported, as there's no clear analogue in SEH
In other words, this is the minimal set of changes needed to write IR to
catch arbitrary exceptions and resume normal execution.
Reviewers: majnemer
Differential Revision: http://reviews.llvm.org/D6300
llvm-svn: 225904
This now handles both 32 and 64-bit element sizes.
In this version, the test are in vector-shuffle-512-v8.ll, canonicalized by
Chandler's update_llc_test_checks.py.
Part of <rdar://problem/17688758>
llvm-svn: 225838
This name is less descriptive, but it sort of puts things in the
'llvm.frame...' namespace, relating it to frameallocate and
frameaddress. It also avoids using "allocate" and "allocation" together.
llvm-svn: 225752
These intrinsics allow multiple functions to share a single stack
allocation from one function's call frame. The function with the
allocation may only perform one allocation, and it must be in the entry
block.
Functions accessing the allocation call llvm.recoverframeallocation with
the function whose frame they are accessing and a frame pointer from an
active call frame of that function.
These intrinsics are very difficult to inline correctly, so the
intention is that they be introduced rarely, or at least very late
during EH preparation.
Reviewers: echristo, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D6493
llvm-svn: 225746
Otherwise we'll attempt to forward ECX, EDX, and EAX for cdecl and
stdcall thunks, leaving us with no scratch registers for indirect call
targets.
Fixes PR22052.
llvm-svn: 225729
This happens in the HINT benchmark, where the SLP-vectorizer created
v2f32 fcmp/select code. The "correct" solution would have been to
teach the vectorizer cost model that v2f32 isn't legal (because really,
it isn't), but if we can vectorize we might as well do so.
We legalize these v2f32 FMIN/FMAX nodes by widening to v4f32 later on.
v3f32 were already widened to v4f32 by the generic unroll-and-build-vector
legalization.
rdar://15763436
Differential Revision: http://reviews.llvm.org/D6557
llvm-svn: 225691
It's possible for the constant pool entry for the shuffle mask to come
from a completely different operation. This occurs when Constants have
the same bit pattern but have different types.
Make DecodePSHUFBMask tolerant of types which, after a bitcast, are
appropriately sized vector types.
This fixes PR22188.
llvm-svn: 225597
Teach the ISelLowering for X86 about the L,M,O target specific constraints.
Although, for the moment, clang performs constraint validation and prevents
passing along inline asm which may have immediate constant constraints violated,
the backend should be able to cope with the invalid inline asm a bit better.
llvm-svn: 225596
In the current code we only attempt to match against insertps if we have exactly one element from the second input vector, irrespective of how much of the shuffle result is zeroable.
This patch checks to see if there is a single non-zeroable element from either input that requires insertion. It also supports matching of cases where only one of the inputs need to be referenced.
We also split insertps shuffle matching off into a new lowerVectorShuffleAsInsertPS function.
Differential Revision: http://reviews.llvm.org/D6879
llvm-svn: 225589
pshufb can shuffle in zero bytes as well as bytes from a source vector - we can use this to avoid having to shuffle 2 vectors and ORing the result when the used inputs from a vector are all zeroable.
Differential Revision: http://reviews.llvm.org/D6878
llvm-svn: 225551
The code that eliminated additional coalescable copies in
removeCopyByCommutingDef() used MergeValueNumberInto() which internally
may merge A into B or B into A. In this case A and B had different Def
points, so we have to reset ValNo.Def to the intended one after merging.
llvm-svn: 225503
The call lowering assumes that if the callee is a global, we want to emit a direct call.
This is correct for regular globals, but not for TLS ones.
Differential Revision: http://reviews.llvm.org/D6862
llvm-svn: 225438
A broken hint is a copy where both ends are assigned different colors. When a
variable gets evicted in the neighborhood of such copies, it is likely we can
reconcile some of them.
** Context **
Copies are inserted during the register allocation via splitting. These split
points are required to relax the constraints on the allocation problem. When
such a point is inserted, both ends of the copy would not share the same color
with respect to the current allocation problem. When variables get evicted,
the allocation problem becomes different and some split point may not be
required anymore. However, the related variables may already have been colored.
This usually shows up in the assembly with pattern like this:
def A
...
save A to B
def A
use A
restore A from B
...
use B
Whereas we could simply have done:
def B
...
def A
use A
...
use B
** Proposed Solution **
A variable having a broken hint is marked for late recoloring if and only if
selecting a register for it evict another variable. Indeed, if no eviction
happens this is pointless to look for recoloring opportunities as it means the
situation was the same as the initial allocation problem where we had to break
the hint.
Finally, when everything has been allocated, we look for recoloring
opportunities for all the identified candidates.
The recoloring is performed very late to rely on accurate copy cost (all
involved variables are allocated).
The recoloring is simple unlike the last change recoloring. It propagates the
color of the broken hint to all its copy-related variables. If the color is
available for them, the recoloring uses it, otherwise it gives up on that hint
even if a more complex coloring would have worked.
The recoloring happens only if it is profitable. The profitability is evaluated
using the expected frequency of the copies of the currently recolored variable
with a) its current color and b) with the target color. If a) is greater or
equal than b), then it is profitable and the recoloring happen.
** Example **
Consider the following example:
BB1:
a =
b =
BB2:
...
= b
= a
Let us assume b gets split:
BB1:
a =
b =
BB2:
c = b
...
d = c
= d
= a
Because of how the allocation work, b, c, and d may be assigned different
colors. Now, if a gets evicted to make room for c, assuming b and d were
assigned to something different than a.
We end up with:
BB1:
a =
st a, SpillSlot
b =
BB2:
c = b
...
d = c
= d
e = ld SpillSlot
= e
This is likely that we can assign the same register for b, c, and d,
getting rid of 2 copies.
** Performances **
Both ARM64 and x86_64 show performance improvements of up to 3% for the
llvm-testsuite + externals with Os and O3. There are a few regressions too that
comes from the (in)accuracy of the block frequency estimate.
<rdar://problem/18312047>
llvm-svn: 225422
Patch by: Ramkumar Ramachandra <artagnon@gmail.com>
"This patch started out as an exploration of gc.relocate, and an attempt
to write a simple test in call-lowering. I then noticed that the
arguments of gc.relocate were not checked fully, so I went in and fixed
a few things. Finally, the most important outcome of this patch is that
my new error handling code caught a bug in a callsite in
stackmap-format."
Differential Revision: http://reviews.llvm.org/D6824
llvm-svn: 225412
This change includes the most basic possible GCStrategy for a GC which is using the statepoint lowering code. At the moment, this GCStrategy doesn't really do much - aside from actually generate correct stackmaps that is - but I went ahead and added a few extra correctness checks as proof of concept. It's mostly here to provide documentation on how to do one, and to provide a point for various optimization legality hooks I'd like to add going forward. (For context, see the TODOs in InstCombine around gc.relocate.)
Most of the validation logic added here as proof of concept will soon move in to the Verifier. That move is dependent on http://reviews.llvm.org/D6811
There was discussion in the review thread about addrspace(1) being reserved for something. I'm going to follow up on a seperate llvmdev thread. If needed, I'll update all the code at once.
Note that I am deliberately not making a GCStrategy required to use gc.statepoints with this change. I want to give folks out of tree - including myself - a chance to migrate. In a week or two, I'll make having a GCStrategy be required for gc.statepoints. To this end, I added the gc tag to one of the test cases but not others.
Differential Revision: http://reviews.llvm.org/D6808
llvm-svn: 225365
LLVM emits stack probes on Windows targets to ensure that the stack is
correctly accessed. However, the amount of stack allocated before
emitting such a probe is hardcoded to 4096.
It is desirable to have this be configurable so that a function might
opt-out of stack probes. Our level of granularity is at the function
level instead of, say, the module level to permit proper generation of
code after LTO.
Patch by Andrew H!
N.B. The inliner needs to be updated to properly consider what happens
after inlining a function with a specific stack-probe-size into another
function with a different stack-probe-size.
llvm-svn: 225360
In order to make comdats always explicit in the IR, we decided to make
the syntax a bit more compact for the case of a GlobalObject in a
comdat with the same name.
Just dropping the $name causes problems for
@foo = globabl i32 0, comdat
$bar = comdat ...
and
declare void @foo() comdat
$bar = comdat ...
So the syntax is changed to
@g1 = globabl i32 0, comdat($c1)
@g2 = globabl i32 0, comdat
and
declare void @foo() comdat($c1)
declare void @foo() comdat
llvm-svn: 225302
This patch improves the logic added at revision 224899 (see review D6728) that
teaches the backend when it is profitable to speculate calls to cttz/ctlz.
The original algorithm conservatively avoided speculating more than one
instruction from a basic block in a control flow grap modelling an if-statement.
In particular, the only allowed instruction (excluding the terminator) was a
call to cttz/ctlz. However, there are cases where we could be less conservative
and still be able to speculate a call to cttz/ctlz.
With this patch, CodeGenPrepare now tries to speculate a cttz/ctlz if the
result is zero extended/truncated in the same basic block, and the zext/trunc
instruction is "free" for the target.
Added new test cases to CodeGen/X86/cttz-ctlz.ll
Differential Revision: http://reviews.llvm.org/D6853
llvm-svn: 225274
"ELF Handling for Thread-Local Storage" specifies that R_X86_64_GOTTPOFF
relocation target a movq or addq instruction.
Prohibit the truncation of such loads to movl or addl.
This fixes PR22083.
Differential Revision: http://reviews.llvm.org/D6839
llvm-svn: 225250
Under the large code model, we cannot assume that __morestack lives within
2^31 bytes of the call site, so we cannot use pc-relative addressing. We
cannot perform the call via a temporary register, as the rax register may
be used to store the static chain, and all other suitable registers may be
either callee-save or used for parameter passing. We cannot use the stack
at this point either because __morestack manipulates the stack directly.
To avoid these issues, perform an indirect call via a read-only memory
location containing the address.
This solution is not perfect, as it assumes that the .rodata section
is laid out within 2^31 bytes of each function body, but this seems to
be sufficient for JIT.
Differential Revision: http://reviews.llvm.org/D6787
llvm-svn: 225003
These are simply a collection of tests intended to show that information about the contents of gc references in the heap is lost at a statepoint. I've tried to write them so that they don't disallow correct transformations, while still being fairly easy to understand.
p.s. Ideas for additional tests are welcome.
Differential Revision: http://reviews.llvm.org/D6491
llvm-svn: 224971
The else case ResultReg was not checked for validity.
To my surprise, this case was not hit in any of the
existing test cases. This includes a new test cases
that tests this path.
Also drop the `target triple` declaration from the
original test as suggested by H.J. Lu, because
apparently with it the test won't be run on Linux
llvm-svn: 224901
If the control flow is modelling an if-statement where the only instruction in
the 'then' basic block (excluding the terminator) is a call to cttz/ctlz,
CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control
flow graph.
Example:
\code
entry:
%cmp = icmp eq i64 %val, 0
br i1 %cmp, label %end.bb, label %then.bb
then.bb:
%c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
br label %end.bb
end.bb:
%cond = phi i64 [ %c, %then.bb ], [ 64, %entry]
\code
In this example, basic block %then.bb is taken if value %val is not zero.
Also, the phi node in %end.bb would propagate the size-of in bits of %val
only if %val is equal to zero.
With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb
into basic block %entry only if cttz is cheap to speculate for the target.
Added two new hooks in TargetLowering.h to let targets customize the behavior
(i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The
two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'.
By default, both methods return 'false'.
On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has
LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.
Differential Revision: http://reviews.llvm.org/D6728
llvm-svn: 224899
Masked vector intrinsics are a part of common LLVM IR, but they are really supported on AVX2 and AVX-512 targets. I added a code that translates masked intrinsic for all other targets. The masked vector intrinsic is converted to a chain of scalar operations inside conditional basic blocks.
http://reviews.llvm.org/D6436
llvm-svn: 224897
Summary:
Consider the following IR:
%3 = load i8* undef
%4 = trunc i8 %3 to i1
%5 = call %jl_value_t.0* @foo(..., i1 %4, ...)
ret %jl_value_t.0* %5
Bools (that are the result of direct truncs) are lowered as whatever
the argument to the trunc was and a "and 1", causing the part of the
MBB responsible for this argument to look something like this:
%vreg8<def,tied1> = AND8ri %vreg7<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg8,%vreg7
Later, when the load is lowered, it will insert
%vreg15<def> = MOV8rm %vreg14, 1, %noreg, 0, %noreg; mem:LD1[undef] GR8:%vreg15 GR64:%vreg14
but remember to (at the end of isel) replace vreg7 by vreg15. Now for
the bug. In fast isel lowering, we mistakenly mark vreg8 as the result
of the load instead of the trunc. This adds a fixup to have
vreg8 replaced by whatever the result of the load is as well, so
we end up with
%vreg15<def,tied1> = AND8ri %vreg15<kill,tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15
which is an SSA violation and causes problems later down the road.
This fixes PR21557.
Test Plan: Test test case from PR21557 is added to the test suite.
Reviewers: ributzka
Reviewed By: ributzka
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D6245
llvm-svn: 224884