test/CodeGen/X86/shadow-stack.ll has the following machine verifier
errors:
```
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```
The fix here is to only copy the machine operand's register without the
kill flags for all the instructions except the very last one of the
sequence.
I had to insert dummy PHIs in the test case to force the NoPHI function
property to be set to false. More on this here: https://llvm.org/PR38439
Differential Revision: https://reviews.llvm.org/D50260
llvm-svn: 340033
NewGVN uses InstructionSimplify for simplifications of leaders of
congruence classes. It is not guaranteed that the metadata or other
flags/keywords (like nsw or exact) of the leader is available for all members
in a congruence class, so we cannot use it for simplification.
This patch adds a InstrInfoQuery struct with a boolean field
UseInstrInfo (which defaults to true to keep the current behavior as
default) and a set of helper methods to get metadata/keywords for a
given instruction, if UseInstrInfo is true. The whole thing might need a
better name, to avoid confusion with TargetInstrInfo but I am not sure
what a better name would be.
The current patch threads through InstrInfoQuery to the required
places, which is messier then it would need to be, if
InstructionSimplify and ValueTracking would share the same Query struct.
The reason I added it as a separate struct is that it can be shared
between InstructionSimplify and ValueTracking's query objects. Also,
some places do not need a full query object, just the InstrInfoQuery.
It also updates some interfaces that do not take a Query object, but a
set of optional parameters to take an additional boolean UseInstrInfo.
See https://bugs.llvm.org/show_bug.cgi?id=37540.
Reviewers: dberlin, davide, efriedma, sebpop, hiraditya
Reviewed By: hiraditya
Differential Revision: https://reviews.llvm.org/D47143
llvm-svn: 340031
This patch performs a widening transformation of bitwise atomicrmw
{or,xor,and} and applies it prior to tryExpandAtomicRMW. This operates
similarly to convertCmpXchgToIntegerType. For these operations, the i8/i16
atomicrmw can be implemented in terms of the 32-bit atomicrmw by appropriately
manipulating the operands. There is no functional change for the handling of
partword or/xor, but the transformation for partword 'and' is new.
The advantage of performing this transformation early is that the same
code-path can be used regardless of the approach used to expand the atomicrmw
(AtomicExpansionKind). i.e. the same logic is used for
AtomicExpansionKind::CmpXchg and can also be used by the intrinsic-based
expansion in D47882.
Differential Revision: https://reviews.llvm.org/D48129
llvm-svn: 340027
Summary:
Currently, in LICM, we use the alias set tracker to identify if the
instruction (we're interested in hoisting) aliases with instruction that
modifies that memory location.
This patch adds an LICM alias analysis diagnostic tool that checks the
mod ref info of the instruction we are interested in hoisting/sinking,
with every instruction in the loop. Because of O(N^2) complexity this
is now only a diagnostic tool to show the limitation we have with the
alias set tracker and is OFF by default.
Test cases show the difference with the diagnostic analysis tool, where
we're able to hoist out loads and readonly + argmemonly calls from the
loop, where the alias set tracker analysis is not able to hoist these
instructions out.
Reviewers: reames, mkazantsev, fedor.sergeev, hfinkel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50854
llvm-svn: 340026
This function is not virtual, it is private and it is not called anywhere. No
regression is introduced by removing it.
I think we can safely remove it.
Differential Revision: https://reviews.llvm.org/D50836
llvm-svn: 340024
This is another step towards being able to canonicalize to the funnel shift
intrinsics in IR (see D49242 for the initial patch).
We should not have any loss of simplification power in IR between these and
the equivalent IR constructs.
Differential Revision: https://reviews.llvm.org/D50848
llvm-svn: 340022
- Generate pointer authentication instructions
- The functions instrumented depend on function attribtues:
all (all functions instrumentent)
non-leaf (only those that spill LR)
none
- Function epilogues sign the LR before spilling to the stack and authenticate
the LR once restored
- If the target is v8.3a or greater than can use the combined authenticate and
return instruction
Differential revision: https://reviews.llvm.org/D49793
llvm-svn: 340018
Add a DAG combine for the PowerPC code generator to generate the Power9 extswsli
extend sign and shift immediate instruction.
Patch by RolandF.
Differential revision: https://reviews.llvm.org/D49879
llvm-svn: 340016
This commit fixes a (gcc 7.3.0) [-Wunused-function] warning caused by the
presence of unused method FaddCombine::createFDiv().
The last use of that method was removed at r339519.
llvm-svn: 340014
Add +fp16fml feature for new FP16 instructions, which are a
mandatory part of FP16 from v8.4-A and an optional part of FP16
from v8.2-A. It doesn't seem to be possible to model this in
LLVM, but the relationship between the options is handled by
the related clang patch.
In keeping with what I think is the usual practice, the fp16fml
extension is accepted regardless of base architecture version.
Builds on/replaces Sjoerd Meijer's patch to add these instructions at
https://reviews.llvm.org/D49839.
Differential Revision: https://reviews.llvm.org/D50228
llvm-svn: 340013
Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly.
Differential Revision: https://reviews.llvm.org/D35722
llvm-svn: 340010
Summary:
Looking at the callee argument list, as is done now, might not work if
the function has been typecasted into one that is expected to return
a struct. This change also simplifies the code.
The isFP128ABICall() function can be removed as it is no longer needed.
The test in fp128.ll has been updated to verify this.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48117
llvm-svn: 340008
Summary: When @llvm.returnaddress is called with a value higher than 0
it needs to read from the call stack to get the return address. This
means that the register windows needs to be flushed to the stack to
guarantee that the data read is valid. For values higher than 1 this
is done indirectly by the call to getFRAMEADDR(), but not for the value 1.
Reviewers: jyknight, venkatra
Reviewed By: jyknight
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D48636
llvm-svn: 340003
The description of `isGuaranteedToExecute` does not correspond to its implementation.
According to description, it should return `true` if an instruction is executed under the
assumption that its loop is *entered*. However there is a sophisticated alrogithm inside
that tries to prove that the instruction is executed if the loop is *exited*, which is not the
same thing for infinite loops. There is an attempt to protect from dealing with infinite loops
by prohibiting loops without exit blocks, however an infinite loop can have exit blocks.
As result of that, MustExecute can falsely consider some blocks that are never entered as
mustexec, and LICM can hoist dangerous instructions out of them basing on this fact.
This may introduce UB to programs which did not contain it initially.
This patch removes the problematic algorithm and replaced it with a one which tries to
prove what is required in description.
Differential Revision: https://reviews.llvm.org/D50558
Reviewed By: reames
llvm-svn: 339984
Summary:
Formerly, all timer groups were automatically cleared when printed out. In
https://reviews.llvm.org/rL324788 this behaviour was changed to not-clearing
timers on printout, to allow printing timers more than once, but as a result
clients (specifically Swift) that relied on the clear-on-print behaviour to
inhibit duplicate timer printing on shutdown were broken.
Rather than revert that change, this change adds a new API that enables
clients that _want_ to clear all timers to do so explicitly.
Reviewers: george.karpenkov, thegameg
Reviewed By: george.karpenkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D50874
llvm-svn: 339980
https://reviews.llvm.org/D50401
Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator
for the same.
Reviewed by: dsanders.
llvm-svn: 339977
Summary:
This adds support for exception handling to CFGStackify pass. This only
adds TRY / END_TRY markers and DOES NOT yet fix unwind mismatches that
can be created by the linearization of the CFG into the structural wasm
format. The mismatch fix will be added by following patches.
In detail, this patch
- Added support for TRY / END_TRY markers to support EH
- Changed many static functions into class member functions as they take
too many arguments now
- Added several more bookeeping data structures
- Refactored routines that decide where to insert markers, because
without refactoring this got too complicated as we added support for new
kinds of markers (TRY/END_TRY).
- Rewrote rethrow instructions' BB arguments to relative depths in EH
pad stack.
Reviewers: dschuff, sunfish
Subscribers: sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D48273
llvm-svn: 339967
well as MIR parsing support for `MCSymbol` `MachineOperand`s.
The only real way to test pre- and post-instruction symbol support is to
use them in operands, so I ended up implementing that within the patch
as well. I can split out the operand support if folks really want but it
doesn't really seem worth it.
The functional implementation of pre- and post-instruction symbols is
now *completely trivial*. Two tiny bits of code in the (misnamed)
AsmPrinter. It should be completely target independent as well. We emit
these exactly the same way as we emit basic block labels. Most of the
code here is to give full dumping, MIR printing, and MIR parsing support
so that we can write useful tests.
The MIR parsing of MC symbol operands still isn't 100%, as it forces the
symbols to be non-temporary and non-local symbols with names. However,
those names often can encode most (if not all) of the special semantics
desired, and unnamed symbols seem especially annoying to serialize and
de-serialize. While this isn't perfect or full support, it seems plenty
to write tests that exercise usage of these kinds of operands.
The MIR support for pre-and post-instruction symbols was quite
straightforward. I chose to print them out in an as-if-operand syntax
similar to debug locations as this seemed the cleanest way and let me
use nice introducer tokens rather than inventing more magic punctuation
like we use for memoperands.
However, supporting MIR-based parsing of these symbols caused me to
change the design of the symbol support to allow setting arbitrary
symbols. Without this, I don't see any reasonable way to test things
with MIR.
Differential Revision: https://reviews.llvm.org/D50833
llvm-svn: 339962
This is a follow-up suggested with rL339604.
For tan(), we don't have a corresponding LLVM
intrinsic -- unlike sin/cos -- so this is the
only way/place that we can do this fold currently.
llvm-svn: 339958
Thread sanitizer instrumentation fails to skip all loads and stores to
profile counters. This can happen if profile counter updates are merged:
%.sink = phi i64* ...
%pgocount5 = load i64, i64* %.sink
%27 = add i64 %pgocount5, 1
%28 = bitcast i64* %.sink to i8*
call void @__tsan_write8(i8* %28)
store i64 %27, i64* %.sink
To suppress TSan diagnostics about racy counter updates, make the
counter updates atomic when TSan is enabled. If there's general interest
in this mode it can be surfaced as a clang/swift driver option.
Testing: check-{llvm,clang,profile}
rdar://40477803
Differential Revision: https://reviews.llvm.org/D50867
llvm-svn: 339955
Summary:
Add the posibility of creating a new DT using a set of Updates.
This will essentially create a DT based on a CFG snapshot/view.
Additional refactoring for either this patch or follow-ups:
- create an utility for building BUI.
- replace BUI with a GraphDiff.
Reviewers: kuhar
Subscribers: sanjoy, jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D50671
llvm-svn: 339947
When nodes are reassociated the vector-reduction flag gets lost.
The test case is here is what would happen if you had a sum of absolute differences loop that started with a non-zero but contant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree erasing the vector-reduction flag. Interestingly this moves constants the opposite direction of the reassociate IR pass.
I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set.
Differential Revision: https://reviews.llvm.org/D50827
llvm-svn: 339946
Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up.
To fix this, the eflags pass now emits a COPY with a subreg input.
I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs
Differential Revision: https://reviews.llvm.org/D50656
llvm-svn: 339945
Handle the case when the symbol is private. Private symbols are not in
the COFF object file symbol table, so they aren't inserted into
SymbolMap. We can't look up the section of the symbol that way. Instead,
get the MCSection from the MCSymbol and map that to the object file
section.
Print a better error message when the symbol has no section, like when
the symbol is undefined.
Fixes PR38607
llvm-svn: 339942
a generically extensible collection of extra info attached to
a `MachineInstr`.
The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.
Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.
I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).
Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.
This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.
The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.
Differential Revision: https://reviews.llvm.org/D50701
llvm-svn: 339940
In cases where the debugger load time is a worthwhile tradeoff (or less
costly - such as loading from a DWP instead of a variety of DWOs
(possibly over a high-latency/distributed filesystem)) against object
file size, it can be reasonable to disable pubnames and corresponding
gdb-index creation in the linker.
A backend-flag version of this was implemented for NVPTX in
D44385/r327994 - which was fine for NVPTX which wouldn't mix-and-match
CUs. Now that it's going to be a user-facing option (likely powered by
"-gno-pubnames", the same as GCC) it should be encoded in the
DICompileUnit so it can vary per-CU.
After this, likely the NVPTX support should be migrated to the metadata
& the previous flag implementation should be removed.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D50213
llvm-svn: 339939
The fix is fairly simple, but is says something unpleasant about the usage and testing of invariant.start/end scopes that this went undetected. To put this in perspective, *any* invariant.end in a loop flowing through LICM crashed. I haven't bothered to figure out just how far back this goes, but it's not caused by any of the recent changes. We're probably talking months if not years.
llvm-svn: 339936
Main value is just simplifying code. I'll further simply the argument handling case in a bit, but that involved a slightly orthogonal change so I went with the mildy ugly intermediate for this patch.
Note that the isSized check in the old LICM code was not carried across. It turns out that check was dead. a) no test exercised it, and b) langref and verifier had been updated to disallow unsized types used in loads.
llvm-svn: 339930
Summary:
EM_ASM no longer is lowered as varargs in C, so this workaround is
obsolete.
Reviewers: dschuff, sunfish
Subscribers: sbc100, aheejin, llvm-commits
Differential Revision: https://reviews.llvm.org/D50859
llvm-svn: 339925
Each use of a value should be jointly dominated by the union of defs and
undefs. It can happen that it will only be jointly dominated by undefs,
and that is still legal. Make sure that the verifier is aware of that.
llvm-svn: 339924
There is no way in the universe, that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.
This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.
Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Baring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.
Following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of original, so anything under 1 is
“better” and anything above 1 is worse.
+-------+-----------+-----------+-------------+-------------+
| Arch | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
| X64 | - | - | ~0.5 | ~0.64 |
| i686 | ~0.5 | ~0.6666 | ~0.05 | ~0.9 |
| armv7 | - | ~0.75 | - | ~1.4 |
+-------+-----------+-----------+-------------+-------------+
Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64 (one Intel Haswell,
other AMD Ryzen) based machines. Size numbers have been collected by
looking at the size of function containing an overflowing multiply in
a loop.
All in all, it can be seen that both performance and size has improved
except in the case of armv7 where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seem to
benefit from this change a lot, taking only 5% of the time compared to
original algorithm to calculate the same thing.
The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.
Patch by Simonas Kazlauskas!
Differential Revision: https://reviews.llvm.org/D50310
llvm-svn: 339922
This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators.
This is the last patch necessary to close PR36545
Differential Revision: https://reviews.llvm.org/D50765
llvm-svn: 339908
Summary:
This prefix was added in r333421, and it changed our dumper output to
say things like "CVRegEAX" instead of just "EAX". That's a functional
change that I'd rather avoid.
I tested GCC, Clang, and MSVC, and all of them support #pragma
push_macro. They don't issue warnings whem the macro is not defined
either.
I don't have a Mac so I can't test the real termios.h header, but I
looked at the termios.h sources online and looked for other conflicts.
I saw only the CR* macros, so those are the ones we work around.
Reviewers: zturner, JDevlieghere
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D50851
llvm-svn: 339907
This will allow the library to just use __builtin_expf directly
without expanding this itself. Note f64 still won't work because
there is no exp instruction for it.
llvm-svn: 339902