llvm-project

Commit Graph

Author	SHA1	Message	Date
Igor Kudrin	00ce54689d	[DebugInfo] Fix emitting DWARF64 .debug_addr sections (15/19). The patch fixes emitting the header of the table. The content is independent of the DWARF format. Differential Revision: https://reviews.llvm.org/D87022	2020-09-15 12:23:31 +07:00
Igor Kudrin	3158d3dd4b	[DebugInfo] Fix emitting DWARF64 .debug_loclists sections (14/19). The size of the offsets in the table depends on the DWARF format. Differential Revision: https://reviews.llvm.org/D87020	2020-09-15 12:23:31 +07:00
Igor Kudrin	f9b242fe24	[DebugInfo] Fix emitting DWARF64 .debug_rnglists sections (13/19). The size of the offsets in the table depends on the DWARF format. Differential Revision: https://reviews.llvm.org/D87019	2020-09-15 12:23:31 +07:00
Igor Kudrin	03b09c6b68	[DebugInfo] Fix emitting pre-v5 name lookup tables in the DWARF64 format (12/19). The transition is done by using methods of AsmPrinter which automatically emit values in compliance with the selected DWARF format. Differential Revision: https://reviews.llvm.org/D87013	2020-09-15 12:23:30 +07:00
Igor Kudrin	b118030f3f	[DebugInfo] Fix emitting DWARF64 .debug_aranges sections (11/19). The patch fixes calculating the size of the table and emitting the fields which depend on the DWARF format by using methods that choose appropriate sizes automatically. Differential Revision: https://reviews.llvm.org/D87012	2020-09-15 12:23:30 +07:00
Igor Kudrin	18f23b3ecc	[DebugInfo] Fix emitting DWARF64 type units (10/19). The patch fixes emitting the offset to the type DIE. All other fields are already fixed in previous patches. Differential Revision: https://reviews.llvm.org/D87021	2020-09-15 11:31:07 +07:00
Igor Kudrin	924dc58076	[DebugInfo] Fix emitting DWARF64 DWO compilation units and string offset tables (9/19). These two fixes are better to go together because llvm-dwarfdump is unable to dump a table when another one is malformed. Differential Revision: https://reviews.llvm.org/D87018	2020-09-15 11:31:00 +07:00
Igor Kudrin	383d34c077	[DebugInfo] Fix emitting DWARF64 .debug_str_offsets sections (8/19). The patch fixes calculating the size of the table and emitting the unit length field. Differential Revision: https://reviews.llvm.org/D87017	2020-09-15 11:30:53 +07:00
Igor Kudrin	26f1f18831	[DebugInfo] Fix emitting the DW_AT_location attribute for 64-bit DWARFv3 (7/19). The patch uses a common method to determine the appropriate form for the value of the attribute. Differential Revision: https://reviews.llvm.org/D87016	2020-09-15 11:30:46 +07:00
Igor Kudrin	cae7c1eb78	[DebugInfo] Use a common method to determine a suitable form for section offsts (6/19). This is mostly an NFC patch because the involved methods are used when emitting DWO files, which is incompatible with DWARFv3, or for platforms where DWARF64 is not supported yet. Differential Revision: https://reviews.llvm.org/D87015	2020-09-15 11:30:38 +07:00
Igor Kudrin	5dd1c59188	[DebugInfo] Fix emitting DWARF64 compilation units (5/19). The patch also adds a method to choose an appropriate DWARF form to represent section offsets according to the version and the format of producing debug info. Differential Revision: https://reviews.llvm.org/D87014	2020-09-15 11:30:30 +07:00
Igor Kudrin	982b31fad2	[DebugInfo] Add the -dwarf64 switch to llc and other internal tools (4/19). The patch adds a switch to enable emitting debug info in the 64-bit DWARF format. Most emitter for sections will be updated in the subsequent patches, whereas for .debug_line and .debug_frame the emitters are in the MC library, which is already updated. For now, the switch is enabled only for 64-bit ELF targets. Differential Revision: https://reviews.llvm.org/D87011	2020-09-15 11:30:18 +07:00
Igor Kudrin	c3c501f5d7	[DebugInfo] Add new emitting methods for values which depend on the DWARF format (3/19). These methods are going to be used in subsequent patches. Differential Revision: https://reviews.llvm.org/D87010	2020-09-15 11:30:10 +07:00
Igor Kudrin	a8058c6f8d	[DebugInfo] Fix DIE value emitters to be compatible with DWARF64 (2/19). DW_FORM_sec_offset and DW_FORM_strp imply values of different sizes with DWARF32 and DWARF64. The patch fixes DIE value classes to use correct sizes when emitting their values. For DIELocList it ensures that the requested DWARF form matches the current DWARF format because that class uses a method that selects the size automatically. Differential Revision: https://reviews.llvm.org/D87009	2020-09-15 11:30:02 +07:00
Igor Kudrin	380e746bcc	[DebugInfo] Fix methods of AsmPrinter to emit values corresponding to the DWARF format (1/19). These methods are used to emit values which are 32-bit in DWARF32 and 64-bit in DWARF64. The patch fixes them so that they choose the length automatically, depending on the DWARF format set in the Context. Differential Revision: https://reviews.llvm.org/D87008	2020-09-15 11:29:48 +07:00
Petr Hosek	042c235068	[DebugInfo] Remove dots from getFilenameByIndex return value When concatenating directory with filename in getFilenameByIndex, we might end up with a path that contains extra dots. For example, if the input is /path and ./example, we would return /path/./example. Run sys::path::remove_dots on the output to eliminate unnecessary dots. Differential Revision: https://reviews.llvm.org/D87657	2020-09-14 20:19:06 -07:00
Quentin Colombet	b3afad0463	[GlobalISel] Add a `X, Y = G_UNMERGE(G_ZEXT Z)` -> X = G_ZEXT Z; Y = 0 combine Add a combiner helper to transform unmerge of zext into one zext and a constant 0 Differential Revision: https://reviews.llvm.org/D87427	2020-09-14 17:27:23 -07:00
Quentin Colombet	d2321129bd	[GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z Add a combiner helper that replaces G_UNMERGE where all the destination lanes are dead except the first one with a G_TRUNC. Differential Revision: https://reviews.llvm.org/D87174	2020-09-14 17:27:23 -07:00
Craig Topper	46673763fe	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract Fixes PR47482	2020-09-14 16:59:04 -07:00
Craig Topper	da1aaa0b70	Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract." I got the bug number wrong. This reverts commit `3251593890`.	2020-09-14 16:58:57 -07:00
Philip Reames	e6bc7037d3	[AArch64] Statepoint support for AArch64. Differential Revision: https://reviews.llvm.org/D66012 Patch By: loicottet (with major rebase by me)	2020-09-14 16:43:08 -07:00
Quentin Colombet	a36278c2f8	[GlobalISel] Add G_UNMERGE(Cst) -> Cst1, Cst2, ... combine Add a combiner helper that replaces G_UNMERGE of big constants into direct use of smaller constants. Differential Revision: https://reviews.llvm.org/D87166	2020-09-14 16:30:18 -07:00
Craig Topper	3251593890	[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract. Fixes PR47525	2020-09-14 16:28:37 -07:00
Krzysztof Parzyszek	bb877d1af2	[Hexagon] Widen loads and handle any-/sign-/zero-extensions	2020-09-14 18:10:23 -05:00
Krzysztof Parzyszek	6352381039	[Hexagon] Some HVX DAG combines 1. VINSERTW0 x, undef -> x 2. VROR (VROR x, a), b) -> VROR x, a+b	2020-09-14 18:10:23 -05:00
Arthur Eubanks	10b12d4035	Reland [docs][NewPM] Add docs for writing NPM passes As to not conflict with the legacy PM example passes under llvm/lib/Transforms/Hello, this is under HelloNew. This makes the CMakeLists.txt and general directory structure less confusing for people following the example. Much of the doc structure was taken from WritinAnLLVMPass.rst. This adds a HelloWorld pass which simply prints out each function name. More will follow after this, e.g. passes over different units of IR, analyses. https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more. Relanded with missing "Support" dependency in LLVMBuild.txt. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86979	2020-09-14 16:06:19 -07:00
Aditya Nandakumar	46f9137e43	[GISel]: Add combine for G_FABS to G_FABS https://reviews.llvm.org/D87554 Patch adds one new GICombinerRule for G_FABS. The combine rule folds G_FABS(G_FABS(X)) to G_FABS(X). Patch additionally adds new combiner tests for the AArch64 target to test this new combiner rule. Patch by mkitzan.	2020-09-14 15:56:24 -07:00
Arthur Eubanks	39ec36415d	Revert "[docs][NewPM] Add docs for writing NPM passes" This reverts commit `c2590de30d`. Breaks shared libs build	2020-09-14 15:55:17 -07:00
Quentin Colombet	670c276232	[GlobalISel] Add G_UNMERGE_VALUES(G_MERGE_VALUES) combine Add the matching and applying function to the combiner helper for G_UNMERGE_VALUES(G_MERGE_VALUES). This combine also supports any merge-like input nodes, like G_BUILD_VECTORS and is robust against bitcasts in between int unmerge and merge nodes. When the input type of the merge node and the output type of the unmerge node are not the same, but the sizes are, the combine still applies but creates bitcasts between the sources and the destinations instead of reusing the destinations directly. Long term, the artifact combiner should probably reuse that helper, but as of today, it doesn't use any outside helper, so I kept it this way. Differential Revision: https://reviews.llvm.org/D87117	2020-09-14 15:45:06 -07:00
Arthur Eubanks	f3d8344854	[PruneEH][NFC] Use CallGraphUpdater in PruneEH In preparation for porting the pass to NPM. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87632	2020-09-14 14:43:19 -07:00
Craig Topper	c193a689b4	[SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore. The versions that take 'unsigned' will be removed in the future. I tried to use getOriginalAlign instead of getAlign in some places. getAlign factors in the minimum alignment implied by the offset in the pointer info. Since we're also passing the pointer info we can use the original alignment. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D87592	2020-09-14 13:54:50 -07:00
Austin Kerbow	f859c30ecb	[AMDGPU] Add XDL resource to scheduling model Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87621	2020-09-14 13:48:54 -07:00
Arthur Eubanks	c2590de30d	[docs][NewPM] Add docs for writing NPM passes As to not conflict with the legacy PM example passes under llvm/lib/Transforms/Hello, this is under HelloNew. This makes the CMakeLists.txt and general directory structure less confusing for people following the example. Much of the doc structure was taken from WritinAnLLVMPass.rst. This adds a HelloWorld pass which simply prints out each function name. More will follow after this, e.g. passes over different units of IR, analyses. https://llvm.org/docs/WritingAnLLVMPass.html contains a lot more. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86979	2020-09-14 13:26:03 -07:00
Teresa Johnson	226d80ebe2	[MemProf] Rename HeapProfiler to MemProfiler for consistency This is consistent with the clang option added in `7ed8124d46`, and the comments on the runtime patch in D87120. Differential Revision: https://reviews.llvm.org/D87622	2020-09-14 13:14:57 -07:00
Craig Topper	4208ea3e19	[FastISel] Bail out of selectGetElementPtr for vector GEPs. The code that decomposes the GEP into ADD/MUL doesn't work properly for vector GEPs. It can create bad COPY instructions or possibly assert. For now just bail out to SelectionDAG. Fixes PR45906	2020-09-14 12:53:06 -07:00
Kamau Bridgeman	c0f199e566	[PowerPC] Implement Thread Local Storage Support for Local Exec This patch is the initial support for the Local Exec Thread Local Storage model to produce code sequence and relocations correct to the ABI for the model when using PC relative memory operations. Patch by: Kamau Bridgeman Differential Revision: https://reviews.llvm.org/D83404	2020-09-14 14:16:28 -05:00
Nikita Popov	53f36f06af	[Legalize][ARM][X86] Add float legalization for VECREDUCE This adds SoftenFloatRes, PromoteFloatRes and SoftPromoteHalfRes legalizations for VECREDUCE, to fill the remaining hole in the SDAG legalization. These legalizations simply expand the reduction and let it be recursively legalized. For the PromoteFloatRes case at least it is possible to do better than that, but it's pretty tricky (because we need to consider the interaction of three different vector legalizations and the type promotion) and probably not really worthwhile. I haven't added ExpandFloatRes support, as I am not familiar with ppc_fp128. Differential Revision: https://reviews.llvm.org/D87569	2020-09-14 20:42:09 +02:00
Eric Astor	23a2b03221	[ms] [llvm-ml] Add basic support for SEH, including PROC FRAME Add basic support for SEH, including PROC FRAME Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86948	2020-09-14 14:32:55 -04:00
Eric Astor	20201dc76a	[ms] [llvm-ml] Add support for size queries in MASM Add support for size inference, sizeof, typeof, and lengthof. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86947	2020-09-14 14:27:06 -04:00
Eric Astor	7c44ee8e19	[ms] [llvm-ml] Fix struct padding logic MASM structs are end-padded to have size a multiple of the smaller of the requested alignment and the size of their largest field (taken recursively, if they have a field of STRUCT type). This matches the behavior of ml.exe and ml64.exe. Our original implementation followed the MASM 6.0 documentation, which instead specified that MASM structs were padded to a multiple of their requested alignment. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D87248	2020-09-14 14:12:20 -04:00
Eric Astor	da17e0d5c1	[ms] [llvm-ml] Add missing built-in type aliases Add signed aliases for integral types, as well as the "DF" abbreviation for the FWORD type. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D87246	2020-09-14 14:09:24 -04:00
Nikita Popov	cfff88c03c	[InstCombine] Simplify select operand based on equality condition For selects of the type X == Y ? A : B, check if we can simplify A by using the X == Y equality and replace the operand if that's possible. We already try to do this in InstSimplify, but will only fold if the result of the simplification is the same as B, in which case the select can be dropped entirely. Here the select will be retained, just one operand simplified. As we are performing an actual replacement here, we don't have problems with refinement / poison values. Differential Revision: https://reviews.llvm.org/D87480	2020-09-14 20:07:06 +02:00
Nikita Popov	8e69c3cde8	[DAGCombiner] Fold fmin/fmax with INF / FLT_MAX Similar to D87415, this folds the various float min/max opcodes with a constant INF or -INF operand, or FLT_MAX / -FLT_MAX operand if the ninf flag is set. Some of the folds are only possible under nnan. The fminnum(X, INF) with nnan and fmaxnum(X, -INF) with nnan cases are needed to improve the VECREDUCE_FMIN/FMAX lowerings on X86, the rest is here for the sake of completeness. Differential Revision: https://reviews.llvm.org/D87571	2020-09-14 19:59:33 +02:00
Simon Pilgrim	4ff4708d39	collectBitParts - use const references. NFCI. Fixes clang-tidy warnings first noticed on D87452.	2020-09-14 18:23:00 +01:00
Rahman Lavaee	7841e21c98	Let -basic-block-sections=labels emit basicblock metadata in a new .bb_addr_map section, instead of emitting special unary-encoded symbols. This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section. The format of the emitted data is represented as follows. It includes a header for every function: \| Address of the function \| -> 8 bytes (pointer size) \| Number of basic blocks in this function (>0) \| -> ULEB128 The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows: \| Offset of the basic block relative to function begin \| -> ULEB128 \| Binary size of the basic block \| -> ULEB128 \| BB metadata \| -> ULEB128 [ MBB.isReturn() OR MBB.hasTailCall() << 1 OR MBB.isEHPad() << 2 ] The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels. The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers. Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%. For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html Reviewed By: MaskRay, snehasish Differential Revision: https://reviews.llvm.org/D85408	2020-09-14 10:16:44 -07:00
Sanjay Patel	55d371abd7	[InstSimplify] add folds for fmin/fmax with 'nnan' maximum(nnan X, +INF) --> +INF minimum(nnan X, -INF) --> -INF This is based on the similar codegen transform proposed in: D87571	2020-09-14 11:46:11 -04:00
Sanjay Patel	7526376164	[InstSimplify] allow folds for fmin/fmax with 'ninf' maxnum(ninf X, +FLT_MAX) --> +FLT_MAX minnum(ninf X, -FLT_MAX) --> -FLT_MAX This is based on the similar codegen transform proposed in: D87571	2020-09-14 11:18:08 -04:00
Florian Hahn	c4f1b31441	[MemorySSA] Make sure PerformedPhiTrans is updated for each visited def. `1ce82015f6` added a fix to restrict phi optimizations after phi translations. But the current use of performedPhiTranslation only checked whether phi translation happened for the first iterator and missed cases where phi translations happens at subsequent iterators/upwards defs. This patch changes upward_defs_iteartor to take a pointer to a bool, so we can easily ensure the final value includes all visited defs, while still being able to conveniently use it with make_range & co.	2020-09-14 16:11:56 +01:00
Sanjay Patel	22c583c3d0	[InstSimplify] reduce code duplication for fmin/fmax folds; NFC We use the same code structure for folding integer min/max.	2020-09-14 10:32:11 -04:00
jasonliu	9868ea764f	[XCOFF][AIX] Handle TOC entries that could not be reached by positive range in small code model Summary: In small code model, AIX assembler could not deal with labels that could not be reached within the [-0x8000, 0x8000) range from TOC base. So when generating the assembly, we would need to help the assembler by subtracting an offset from the label to keep the actual value within [-0x8000, 0x8000). Reviewed By: hubert.reinterpretcast, Xiangling_L Differential Revision: https://reviews.llvm.org/D86879	2020-09-14 13:41:34 +00:00
Sanjay Patel	7bb9a2f996	[InstSimplify] fix miscompiles with maximum/minimum intrinsics As discussed in the sibling codegen functionality patch D87571, this transform was created with D52766, but it is not correct. The incorrect test diffs were missed during review, but the 'TODO' comment about this functionality was still in the code - we need 'nnan' to enable this fold.	2020-09-14 09:06:41 -04:00
Jay Foad	c799f873cb	[AMDGPU] Don't cluster stores Clustering loads has caching benefits, but as far as I know there is no advantage to clustering stores on any AMDGPU subtargets. The disadvantage is that it tends to increase register pressure and restricts scheduling freedom. Differential Revision: https://reviews.llvm.org/D85530	2020-09-14 13:40:17 +01:00
Simon Pilgrim	98eaacd73d	Assert we've found both vector types. NFCI. Fixes clang static analyzer warning about potential null dereferences.	2020-09-14 13:24:17 +01:00
Simon Pilgrim	7109fc9e42	Don't dereference from a dyn_cast<>. NFCI. Use cast<> instead which will assert if it fails and not just return null. Fixes clang static analyzer warning.	2020-09-14 13:05:17 +01:00
Max Kazantsev	412b417bfa	[NFC] Add missing `const` statements in SCEV	2020-09-14 18:43:24 +07:00
David Green	06fb4e9064	[CGP] Limit converting phi types to simple loads and stores Instcombine limits converting phi types to simple loads and stores. This does the same in codegenprepare, not processing phis that are not simple. Note that volatile loads/store ISel will happily convert between float and int. Atomics are more likely to always be integer. This just keeps things simple and doesn't process either. Differential Revision: https://reviews.llvm.org/D83770	2020-09-14 12:08:34 +01:00
Florian Hahn	f715d81c9d	[DSE] Only eliminate candidates that always store the same loc. AliasAnalysis/MemoryLocation does not account for loops. Two MemoryLocation can be must-overwrite, even if the first one writes multiple locations in a loop. This patch prevents removing such stores, by only considering candidates that are known to be loop invariant, or executed in the same BB. Currently the invariant check is quite conservative and only considers Alloca and Alloca-like instructions and arguments as invariant base pointers. It also considers GEPs with all constant indices and invariant bases as invariant. This can be improved in the future, but the current implementation has only minor impact on the total number of stores eliminated (25903 vs 26047 for the baseline). There are some 2-10% swings for some individual benchmarks. In roughly half of the cases, the number of stores removed increases actually, because we skip candidates that are unlikely to be valid candidates early.	2020-09-14 12:06:58 +01:00
Meera Nakrani	dd519bf0b0	[ARM] Selects SSAT/USAT from correct LLVM IR LLVM will canonicalize conditional selectors to a different pattern than the old code that was used. This is updating the function to match the new expected patterns and select SSAT or USAT when successful. Tests have also been updated to use the new patterns. Differential Review: https://reviews.llvm.org/D87379	2020-09-14 10:58:21 +00:00
Sjoerd Meijer	676febc044	[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect similarly like we already do for the IV which is the first argument of get.active.lane. Differential Revision: https://reviews.llvm.org/D86074	2020-09-14 11:32:15 +01:00
David Sherwood	816663adb5	[SVE] In LoopIdiomRecognize::isLegalStore bail out for scalable vectors The function LoopIdiomRecognize::isLegalStore looks for stores in loops that could be transformed into memset or memcpy. However, the algorithm currently requires that we know how big the store is at runtime, i.e. that the store size will not overflow an unsigned integer. For scalable vectors we cannot guarantee this so I have changed the code to bail out for now. In addition, even if we add a way to query the maximum value of vscale in future we will still need to update the algorithm to cope with non-constant strides. The additional cost associated with calculating the memset and memcpy arguments will need to be taken into account as well. This patch also fixes up an implicit TypeSize -> uint64_t cast, thereby removing a warning. I've added tests here showing a fixed width vector loop being transformed into memcpy, and a scalable vector loop remaining unchanged: Transforms/LoopIdiom/memcpy-vectors.ll Differential Revision: https://reviews.llvm.org/D87439	2020-09-14 11:28:31 +01:00
Petar Avramovic	6e2a86ed5a	AMDGPU/GlobalISel Check for NoNaNsFPMath in isKnownNeverSNaN Check for NoNaNsFPMath function attribute in isKnownNeverSNaN. Function attributes are in held in 'TargetMachine.Options'. Among other things, this allows selection of some patterns imported in D87351 since G_FCANONICALIZE is not generated when isKnownNeverSNaN returns true in lowerFMinNumMaxNum. However we notice some incorrect results since function attributes are not correctly written in TargetMachine.Options when next function is processed. Take a look at @v_test_no_global_nnans_med3_f32_pat0_srcmod0, it has "no-nans-fp-math"="false" but TargetMachine.Options still has it set to true since first function in test file had this attribute set to true. This will be fixed in D87511. Differential Revision: https://reviews.llvm.org/D87456	2020-09-14 12:11:00 +02:00
Simon Pilgrim	00e5676cf6	[LegalizeDAG] Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI.	2020-09-14 11:09:43 +01:00
Jeremy Morse	d3af441dfe	[DebugInstrRef][1/9] Add fields for instr-ref variable locations Add a DBG_INSTR_REF instruction and a "debug instruction number" field to MachineInstr. The two allow variable values to be specified by identifying where the value is computed, rather than the register it lies in, like so: %0 = fooinst, debug-instr-number 1 [...] DBG_INSTR_REF 1, 0 See the original RFC for motivation: http://lists.llvm.org/pipermail/llvm-dev/2020-February/139440.html This patch is NFCI; it only adds fields and other boiler plate. Differential Revision: https://reviews.llvm.org/D85741	2020-09-14 10:06:52 +01:00
Petar Avramovic	09b8871f8d	AMDGPU/GlobalISel/Emitter Support for predicate code that uses operands Predicates with 'let PredicateCodeUsesOperands = 1' want to examine matched operands. When we encounter predicate code that uses operands, analyze its named operand arguments and create a map between argument index and name. Later, when leaf node with name is encountered, emit GIM_RecordNamedOperand that will store that operand at its argument index in operand list. This operand list will be an argument to c++ code of the predicate. Differential Revision: https://reviews.llvm.org/D87285	2020-09-14 10:39:56 +02:00
David Stenberg	bfcb824ba5	[JumpThreading] Fix an incorrect Modified status This fixes PR47297. When ProcessBlock() was able to constant fold the terminator's condition, but not do any more transformations, the function would return false, which would lead to the JumpThreading pass returning an incorrect modified status. This patch makes so that ProcessBlock() returns true in such cases. This will trigger an unnecessary invocation of ProcessBlock() in such cases, but this should be rare to occur. This was caught using the check introduced by D80916. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87392	2020-09-14 10:36:13 +02:00
Jay Foad	9a4476072e	[UnifyLoopExits] Fix non-deterministic iteration order This was causing random minor codegen differences in shaders compiled with the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D87548	2020-09-14 09:09:58 +01:00
Simon Wallis	4946802c5f	[ARM] Fix so immediates and pc relative checks Treating an SoImm offset as a multiple of 4 between -1020 and 1020 mis-handles the second of a pair of 16-bit constants where the offset is a multiple of 2 but not a multiple of 4, leading to an LLVM ERROR: out of range pc-relative fixup value For 32-bit and larger (64-bit) constants, continue to treat an SoImm offset as a multiple of 4 between -1020 and 1020. For smaller (16-bit) constants, treat an SoImm offset as a multiple of 1 between -255 and 255. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D86949	2020-09-14 08:52:59 +01:00
David Sherwood	15bff4dec4	[CodeGen] Fix bug in IncrementPointer In an earlier patch I meant to add the correct flags to the ADD node when incrementing the pointer, but forgot to pass them to SelectionDAG::getNode. Differential Revision: https://reviews.llvm.org/D87496	2020-09-14 08:03:55 +01:00
Fangrui Song	4d7b194543	[llvm-cov gcov] Refactor counting and reporting The current organization of FileInfo and its referenced utility functions of (GCOVFile, GCOVFunction, GCOVBlock) is messy. Some members of FileInfo are just copied from GCOVFile. FileInfo::print (.gcov output and --intermediate output) is interleaved with branch statistics and computation of line execution counts. --intermediate has to do redundant .gcov output to gather branch statistics. This patch deletes lots of code and introduces a clearer work flow: ``` fn collectFunction for each block b for each line lineNum let line be LineInfo of the file on lineNum line.exists = 1 increment function's lines & linesExec if necessary increment line.count line.blocks.push_back(&b) fn collectSourceLine compute cycle counts count = incoming_counts + cycle_counts if line.exists ++summary->lines if line.count ++summary->linesExec fn collectSource for each line call collectSourceLine fn main for each function call collectFunction print function summary for each source file call collectSource print file summary annotate the source file with line execution counts if -i print intermediate file ``` The output order of functions and files now follows the original order in .gcno files.	2020-09-13 23:00:59 -07:00
Yevgeny Rouban	88690a9658	[CodeGenPrepare] Fix zapping dead operands of assume This patch fixes a problem of the commit `52cc97a0`. A test case is created to demonstrate the crash caused by the instruction iterator invalidated by the recursive removal of dead operands of assume. The solution restarts from the blocks's first instruction in case CurInstIterator is invalidated by RecursivelyDeleteTriviallyDeadInstructions(). Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D87434	2020-09-14 11:46:34 +07:00
Craig Topper	56b33391d3	[SelectionDAG] Move ISD:PARITY formation from DAGCombine to SimplifyDemandedBits. Previously, we formed ISD::PARITY by looking for (and (ctpop X), 1) but the AND might be separated from the ctpop. For example if the parity result is multiplied by 2, we'll pull the AND through the shift. So to handle more cases, move to SimplifyDemandedBits where we can handle more cases that result in only the LSB of the CTPOP being used.	2020-09-13 21:04:13 -07:00
Lang Hames	783ba64a89	[JITLink] Improve formatting for Edge, Block and Symbol debugging output.	2020-09-13 15:44:07 -07:00
Fangrui Song	b2c32c90ba	[llvm-cov gcov] Add -r (--relative-only) && -s (--source-prefix) gcov 4.7 introduced the two options. https://sourceware.org/pipermail/gcc-patches/2011-November/328782.html -r only dumps files with relative paths or absolute paths with the prefix specified by -s. The two options are useful filtering out system header files.	2020-09-13 14:54:20 -07:00
David Blaikie	ce89eeee16	PPCInstrInfo: Fix readability-inconsistent-declaration-parameter-name clang-tidy warning Reduces the chance of confusion when calling the function with autocomplete (will show the more accurate/informative variable name), etc.	2020-09-13 13:08:17 -07:00
David Blaikie	6e06f1cd08	GCOVProfiling: Avoid use-after-move Turns out this was use-after-move of function_ref, which is trivially copyable and movable, so the move did nothing and use after move was safe. But since this function_ref is being copied into a std::function, change the function_ref to be std::function to avoid extra layers of type erasure indirection - and then it's a real use after move, and fix that by referring to the moved-to member variable rather than the moved-from parameter.	2020-09-13 12:54:36 -07:00
Qiu Chaofan	a4c5351986	[DAGCombiner] Propagate FMF flags in FMA folding DAG combiner folds (fma a 1.0 b) into (fadd a b) but the flag isn't propagated into new fadd. This patch fixes that. Some code in visitFMA is redundant and such support for vector constants is missing. Need follow-up patch to clean. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D87037	2020-09-14 00:19:06 +08:00
David Green	9237fde481	[CGP] Prevent optimizePhiType from iterating forever The recently added optimizePhiType algorithm had no checks to make sure it didn't continually iterate backward and forth between float and int types. This means that given an input like store(phi(bitcast(load))), we could convert that back and forth to store(bitcast(phi(load))). This particular case would usually have been simplified to a different load type (folding the bitcast into the load) before CGP, but other cases can occur. The one that came up was phi(bitcast(phi)), where the two phi's of different types were bitcast between. That was not helped by a dead bitcast being kept around which could make conversion look profitable. This adds an extra check of the bitcast Uses or Defs, to make sure that at least one is grounded and will not end up being converted back. It also makes sure that dead bitcasts are removed, and there is a minor change to include newly created Phi nodes in the Visited set so that they do not need to be revisited. Differential Revision: https://reviews.llvm.org/D82676	2020-09-13 16:11:01 +01:00
Qiu Chaofan	bec81dc67d	Reland "[PowerPC] Implement instruction clustering for stores" Commit `3c0b3250` introduced store fusion for PowerPC target, but it brought failure under UB sanitizer and was reverted. This patch fixes them.	2020-09-13 19:51:01 +08:00
Fangrui Song	5f4e9bf641	[gcov] Fix memory leak due to BranchProbabilityInfoWrapperPass This is weird.	2020-09-13 00:44:32 -07:00
Fangrui Song	63182c2ac0	[gcov] Add spanning tree optimization gcov is an "Edge Profiling with Edge Counters" application according to Optimally Profiling and Tracing Programs (1994). The minimum number of counters necessary is \|E\|-(\|V\|-1). The unmeasured edges form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage this optimization. This patch implements the optimization for clang --coverage. The produced .gcda files are much smaller now.	2020-09-13 00:07:31 -07:00
Fangrui Song	f086e85eea	[gcov] Assign names to some types and loaded values used in @__llvm_internal* This makes the generated IR much more readable.	2020-09-12 22:42:37 -07:00
Fangrui Song	8cf1ac97ce	[llvm-cov gcov] Improve accuracy when some edges are not measured Also guard against infinite recursion if GCOV_ARC_ON_TREE edges contain a cycle.	2020-09-12 22:33:41 -07:00
Craig Topper	61d29e0dff	[LegalizeTypes] Remove a few cases from SplitVectorOperand that should never happen. NFC CTTZ, CTLZ, CTPOP, and FCANONICALIZE all have the same input and output types so the operand should have already been legalized when the result type was legalized.	2020-09-12 20:59:14 -07:00
Craig Topper	758732a34e	[X86] Use ISD::PARITY directly instead of emitting CTPOP and AND from combineHorizontalPredicateResult. We have a PARITY ISD node now so might as well use it. It will get re-expanded later.	2020-09-12 20:01:17 -07:00
Krzysztof Parzyszek	9d300bc8d2	[Hexagon] Avoid widening vectors with non-HVX element types	2020-09-12 20:26:54 -05:00
Fangrui Song	d6fadc49e3	[gcov] Process .gcda immediately after the accompanying .gcno instead of doing all .gcda after all .gcno i.e. change the work flow from * .gcno for function A * .gcno for function B * .gcno for function C * .gcda for function A * .gcda for function B * .gcda for function C to * .gcno for function A * .gcda for function A * .gcno for function B * .gcda for function B * .gcno for function C * .gcda for function C Currently there is duplicate logic in .gcno & .gcda processing: how functions are filtered, which edges are instrumented, etc. This refactor enables simplification. Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode, __llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.	2020-09-12 13:53:03 -07:00
Fangrui Song	7d3825ed95	Revert "[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering" This reverts commit `412c9c0bf2`.	2020-09-12 12:34:43 -07:00
Fangrui Song	412c9c0bf2	[gcov] emitProfileArcs: iterate over GCOVFunction's instead of Function's to avoid duplicated filtering	2020-09-12 12:21:32 -07:00
Fangrui Song	c55c14837e	[gcov] Clean up by getting llvm.dbg.cu earlier	2020-09-12 12:21:32 -07:00
Craig Topper	ad3d6f993d	[SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors. Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop isn't natively supported by the target, this leads to poor codegen due to the expansion of ctpop being more complex than what is needed for parity. This adds a DAG combine to convert the pattern to ISD::PARITY before operation legalization. Type legalization is updated to handled Expanding and Promoting this operation. If after type legalization, CTPOP is supported for this type, LegalizeDAG will turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a series of shifts and xors followed by an AND with 1. I've avoided vectors in this patch to avoid more legalization complexity for this patch. X86 previously had a custom DAG combiner for this. This is now moved to Custom lowering for the new opcode. There is a minor regression in vector-reduce-xor-bool.ll, but a follow up patch can easily fix that. Fixes PR47433 Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87209	2020-09-12 11:42:18 -07:00
Florian Hahn	e082dee2b5	[DSE] Bail out on MemoryPhis when deleting stores at end of function. When deleting stores at the end of a function, we have to do PHI translation, otherwise we might miss reads in different iterations of a loop. See multiblock-loop-carried-dependence.ll for details. This fixes a mis-compile and surprisingly also increases the number of eliminated stores from 26047 to 26572 for MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto. This is most likely because we save budget by not exploring through MemoryPhis, which are less likely to result in valid candidates for elimination. The issue was reported post-commit for `fb109c42d9`.	2020-09-12 19:05:59 +01:00
David Green	74760bb00f	[LV][ARM] Add preferInloopReduction target hook. This allows the backend to tell the vectorizer to produce inloop reductions through a TTI hook. For the moment on ARM under MVE this means allowing integer add reductions of the correct size. In the future this can include integer min/max too, under -Os. Differential Revision: https://reviews.llvm.org/D75512	2020-09-12 17:47:04 +01:00
Paul C. Anagnostopoulos	8ce75e2778	TableGen: change a couple of member names to clarify their use.	2020-09-12 12:21:36 -04:00
Simon Pilgrim	3170d54842	[InstCombine][X86] Covert masked load/stores with (sign extended) bool vector masks to generic intrinsics. As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything. We already do something similar for BLENDV -> SELECT conversion.	2020-09-12 15:09:28 +01:00
Evgeny Leviant	2e61cd1295	[MachineScheduler] Fix operand scheduling for pre/post-increment loads Differential revision: https://reviews.llvm.org/D87557	2020-09-12 16:53:12 +03:00
Tyker	78de7297ab	Reland [AssumeBundles] Use operand bundles to encode alignment assumptions NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Complemantary to the assumption outliner prototype in D71692, this patch shows how we could simplify the code emitted for an alignemnt assumption. The generated code is smaller, less fragile, and it makes it easier to recognize the additional use as a "assumption use". As mentioned in D71692 and on the mailing list, we could adopt this scheme, and similar schemes for other patterns, without adopting the assumption outlining.	2020-09-12 15:36:06 +02:00
David Green	6cfd38d03d	[ARM] Fixup single source mla reductions. This fixes a complication on top of D87276. If we are sign extending around a mul with the two operands that are the same, instcombine will helpfully convert one of the sext to a zext. Reverse that so that we again generate a reduction. Differnetial Revision: https://reviews.llvm.org/D87287	2020-09-12 14:31:26 +01:00
Sanjay Patel	3a8ea8609b	[Intrinsics] define semantics for experimental fmax/fmin vector reductions As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html This is hopefully the final remaining showstopper before we can remove the 'experimental' from the reduction intrinsics. No behavior was specified for the FP min/max reductions, so we have a mess of different interpretations. There are a few potential options for the semantics of these max/min ops. I think this is the simplest based on current behavior/implementation: make the reductions inherit from the existing llvm.maxnum/minnum intrinsics. These correspond to libm fmax/fmin, and those are similar to the (now deprecated?) IEEE-754 maxNum/minNum functions (NaNs are treated as missing data). So the default expansion creates calls to libm functions. Another option would be to inherit from llvm.maximum/minimum (NaNs propagate), but most targets just crash in codegen when given those nodes because no default expansion was ever implemented AFAICT. We could also just assume 'nnan' semantics by default (we are already assuming 'nsz' semantics in the maxnum/minnum intrinsics), but some targets (AArch64, PowerPC) support the more defined behavior, so it doesn't make much sense to not allow a tighter spec. Fast-math-flags (nnan) can be used to loosen the semantics. (Note that D67507 was proposed to update the LangRef to acknowledge the more recent IEEE-754 2019 standard, but that patch seems to have stalled. If we do update based on the new standard, the reduction instructions can seamlessly inherit from whatever updates are made to the max/min intrinsics.) x86 sees a regression here on 'nnan' tests because we have underlying, longstanding bugs in FMF creation/propagation. Those need to be fixed apart from this change (for example: https://llvm.org/PR35538). The expansion sequence before this patch may not have been correct. Differential Revision: https://reviews.llvm.org/D87391	2020-09-12 09:10:28 -04:00
Simon Pilgrim	50ee0b99ec	[InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI. Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.	2020-09-12 13:58:58 +01:00
David Green	c437446d90	[ARM] Recognize "double extend" reduction patterns We can sometimes get code that does: xe = zext i16 x to i32 ye = zext i16 y to i32 m = mul i32 xe, ye me = zext i32 m to i64 r = vecreduce.add(me) This "double extend" can trip up the reduction identification, but should give identical results. This extends the pattern matching to handle them. Differential Revision: https://reviews.llvm.org/D87276	2020-09-12 13:51:42 +01:00
Nikita Popov	36e2e2e12e	[InstCombine] Fix incorrect SimplifyWithOpReplaced transform (PR47322) This is a followup to D86834, which partially fixed this issue in InstSimplify. However, InstCombine repeats the same transform while dropping poison flags -- which does not cover cases where poison is introduced in some other way. The fix here is a bit more comprehensive, because things are quite entangled, and it's hard to only partially address it without regressing optimization. There are really two changes here: * Export the SimplifyWithOpReplaced API from InstSimplify, with an added AllowRefinement flag. For replacements inside the TrueVal we don't actually care whether refinement occurs or not, the replacement is always legal. This part of the transform is now done in InstSimplify only. (It should be noted that the current AllowRefinement check is not sufficient -- that's an issue we need to address separately.) * Change the InstCombine fold to work by temporarily dropping poison generating flags, running the fold and then restoring the flags if it didn't work out. This will ensure that the InstCombine fold is correct as long as the InstSimplify fold is correct. Differential Revision: https://reviews.llvm.org/D87445	2020-09-12 14:45:06 +02:00
Simon Pilgrim	35dc91aee2	[X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases Follow up to D86429 to handle the remaining regressions. This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered. For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better. Differential Revision: https://reviews.llvm.org/D87405	2020-09-12 13:39:33 +01:00
Jianzhou Zhao	0ece51c60c	Add raw_fd_stream that supports reading/seeking/writing This is used by https://reviews.llvm.org/D86905 to support bitcode writer's incremental flush.	2020-09-12 07:34:19 +00:00
QingShan Zhang	0680a3d56d	[Power10] Enable the heuristic for Power10 and switch the sched model with P9 Model Enable the pre-ra and post-ra scheduler strategy for Power10 as we want to customize the heuristic later. And switch the scheduler model with P9 model before P10 Model is available. The NoSchedModel is modelled as in-order cpu and the pre-ra scheduler is not bi-directional which will have big impact on the scheduler. Reviewed By: jji Differential Revision: https://reviews.llvm.org/D86865	2020-09-12 02:49:47 +00:00
QingShan Zhang	528554c39b	[PowerPC] Set the mayRaiseFPException for FCMPUS/FCMPUD From ISA, fcmpu will raise the Floating-Point Invalid Operation Exception (SNaN) if either of the operands is a Signaling NaN by setting the bit VXSNAN. But the instruction description didn't set the mayRaiseFPException which might have impact on the scheduling or some backend optimization. Reviewed By: qiucf Differential Revision: https://reviews.llvm.org/D83937	2020-09-12 02:42:22 +00:00
Yuanfang Chen	ad99e34c59	Revert "[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline" This reverts commit `31ecf8d29d`. This reverts commit `3fdaa8602a`. There is laying violation for Target->CodeGen.	2020-09-11 18:52:32 -07:00
Eli Friedman	d751f86189	[ConstantFold] Make areGlobalsPotentiallyEqual less aggressive. In particular, we shouldn't make assumptions about globals which are unnamed_addr: we can fold them together with other globals. Also while I'm here, use isInterposable() instead of trying to explicitly name all the different kinds of weak linkage. Fixes https://bugs.llvm.org/show_bug.cgi?id=47090 Differential Revision: https://reviews.llvm.org/D87123	2020-09-11 17:23:08 -07:00
Eli Friedman	37f2776d1a	[ConstantFold] Fold binary arithmetic on scalable vector splats. It's a nice simplification, and it confuses instcombine if we don't do it. Differential Revision: https://reviews.llvm.org/D87422	2020-09-11 16:41:58 -07:00
Yuanfang Chen	31ecf8d29d	[NewPM][CodeGen] Introduce CodeGenPassBuilder to help build codegen pipeline Following up on D67687. Please refer to the RFC here http://lists.llvm.org/pipermail/llvm-dev/2020-July/143309.html `CodeGenPassBuilder` is the NPM counterpart of `TargetPassConfig` with below differences. - Debugging features (MIR print/verify, disable pass, start/stop-before/after, etc.) living in `TargetPassConfig` are moved to use PassInstrument as much as possible. (Implementation also lives in `TargetPassConfig.cpp`) - `TargetPassConfig` is a polymorphic base (virtual inheritance) to build the target-dependent pipeline whereas `CodeGenPassBuilder` is the CRTP base/helper to implement the target-dependent pipeline. The motivation is flexibility for targets to customize the pipeline, inlining opportunity, and fits the overall NPM value semantics design. - `TargetPassConfig` is a legacy immutable pass to declare hooks for targets to customize some target-independent codegen layer behavior. This is partially ported to TargetMachine::options. The rest, such as `createMachineScheduler/createPostMachineScheduler`, are left out for now. They should be implemented in LLVMTargetMachine in the future. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83608	2020-09-11 16:41:17 -07:00
Fangrui Song	45d0343900	[MC] Allow .org directives in SHT_NOBITS sections This is used by kvm-unit-tests and can be trivially supported.	2020-09-11 15:12:42 -07:00
Matt Arsenault	382b2b1b51	RegAllocFast: Fix typo in comment	2020-09-11 18:06:14 -04:00
Matt Arsenault	e21bb31eb6	CodeGen: Require SSA to run PeepholeOptimizer	2020-09-11 18:03:04 -04:00
Lang Hames	7dcd0042e8	Re-apply "[ORC] Make MaterializationResponsibility immovable..." with fixes. Re-applies `c74900ca67` with fixes for the ThinLtoJIT example.	2020-09-11 14:09:05 -07:00
Mircea Trofin	9a2bab5ea2	[ThinLTO] Make -lto-embed-bitcode an enum The current behavior of -lto-embed-bitcode is not quite the same as that of -fembed-bitcode. While both populate .llvmbc with bitcode, the latter populates it with pre-optimized bitcode(), while the former with post-optimized. The scenarios driving them are different - the latter's goal is to allow re-compilation, while the former, IIUC, is execution. I plan to add a third mode for thinlto cases, closely-related to -fembed-bitcode's scenario: adding the bitcode pre-optimization, but post-merging. This would allow re-compilation without requiring the other .bc files that were merged (akin to how -fembed-bitcode allows recompilation without all the .h files) The third mode can't co-exist with the current -lto-embed-bitcode mode, because the latter would overwrite it. For clarity, we change -lto-embed-bitcode to be an enum. () That's the compiler semantics. The driver splits compilation in 2 phases, so if -fembed-bitcode is given to the driver, the .llvmbc is optimized bitcode; if the option is passed to the compiler (after -cc1), the section is pre-optimized. Differential Revision: https://reviews.llvm.org/D87477	2020-09-11 13:24:54 -07:00
Sam Clegg	fa2a8acc71	[WebAssembly] Add assembly syntax for mutable globals This adds and optional ", immutable" to the end of a `.globaltype` declaration. I would have prefered to match the `.wat` syntax where immutable is the default and `mut` is the signifier for mutable globals. Sadly changing the default would break backwards compat with existing assembly in the wild so I think its best to stick with this approach. Differential Revision: https://reviews.llvm.org/D87515	2020-09-11 11:11:02 -07:00
Sanjay Patel	40f12ef621	[SLP] further limit bailout for load combine candidate (PR47450) The test example based on PR47450 shows that we can match non-byte-sized shifts, but those won't ever be bswap opportunities. This isn't a full fix (we'd still match if the shifts were by 8-bits for example), but this should be enough until there's evidence that we need to do more (this is a borderline case for vectorization in the first place).	2020-09-11 11:56:11 -04:00
Krzysztof Parzyszek	f92908cc74	[DSE] Make sure that DSE+MSSA can handle masked stores Differential Revision: https://reviews.llvm.org/D87414	2020-09-11 10:00:21 -05:00
Sanjay Patel	6aa3fc4a5b	Revert "[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430)" This reverts commit `324a53205a`. On closer examination of at least one of the test diffs, this does not appear to be correct in all cases. Even the existing 'nsw' creation may be wrong based on this example: https://alive2.llvm.org/ce/z/uL4Hw9 https://alive2.llvm.org/ce/z/fJMKQS	2020-09-11 10:54:48 -04:00
Sanjay Patel	324a53205a	[InstCombine] propagate 'nsw' on pointer difference of 'inbounds' geps (PR47430) There's no signed wrap if both geps have 'inbounds': https://alive2.llvm.org/ce/z/nZkQTg https://alive2.llvm.org/ce/z/7qFauh	2020-09-11 10:39:09 -04:00
Simon Pilgrim	48b510c4bc	[NFC] Fix compiler warnings due to integer comparison of different signedness Fix by directly using INT_MAX and INT32_MAX. Patch by: @nullptr.cpp (Yang Fan) Differential Revision: https://reviews.llvm.org/D87347	2020-09-11 15:32:03 +01:00
Florian Hahn	8da6ae4ce1	Revert "[ConstraintSystem] Add helpers to deal with linear constraints." This reverts commit `3eb141e507`. This uses __builtin_mul_overflow which is not available everywhere.	2020-09-11 14:49:04 +01:00
Florian Hahn	3eb141e507	[ConstraintSystem] Add helpers to deal with linear constraints. This patch introduces a new ConstraintSystem class, that maintains a set of linear constraints and uses Fourier–Motzkin elimination to eliminate constraints to check if there are solutions for the system. It also adds a convert-constraint-log-to-z3.py script, which can parse the debug output of the constraint system and convert it to a python script that feeds the constraints into Z3 and checks if it produces the same result as the LLVM implementation. This is for verification purposes. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D84544	2020-09-11 14:43:22 +01:00
Andrzej Warzynski	4eed800b18	[NFC] Fix the signature and definition of findByPrefix In https://reviews.llvm.org/rG257b29715bb27b7d9f6c3c40c481b6a4af0b37e5, the definition of OptTable::Info::Flags was changed from `unsigned short` to `unsigned int`, but the definition/declaration of OptTable::findByPrefix wasn't updated to reflect that. This patch updates findByPrefix accordingly.	2020-09-11 12:38:28 +01:00
Jeremy Morse	0caeaff123	[LiveDebugValues][NFC] Re-land `60db26a66d`, add instr-ref tests This was landed but reverted in `5b9c2b1bea` due to asan picking up a memory leak. This is fixed in the change to InstrRefBasedImpl.cpp. Original commit message follows: [LiveDebugValues][NFC] Add instr-ref tests, adapt old tests This patch adds a few tests in DebugInfo/MIR/InstrRef/ of interesting behaviour that the instruction referencing implementation of LiveDebugValues has. Mostly, these tests exist to ensure that if you give the "-experimental-debug-variable-locations" command line switch, the right implementation runs; and to ensure it behaves the same way as the VarLoc LiveDebugValues implementation. I've also touched roughly 30 other tests, purely to make the tests less rigid about what output to accept. DBG_VALUE instructions are usually printed with a trailing !debug-location indicating its scope: !debug-location !1234 However InstrRefBasedLDV produces new DebugLoc instances on the fly, meaning there sometimes isn't a numbered node when they're printed, making the output: !debug-location !DILocation(line: 0, blah blah) Which causes a ton of these tests to fail. This patch removes checks for that final part of each DBG_VALUE instruction. None of them appear to be actually checking the scope is correct, just that it's present, so I don't believe there's any loss in coverage here. Differential Revision: https://reviews.llvm.org/D83054	2020-09-11 12:14:44 +01:00
Simon Pilgrim	70a05ee288	[X86] Keep variables from getDataLayout/getDebugLoc calls as const reference. NFCI. These are only ever used as references in the called functions, so just pass the original reference instead of copying it.	2020-09-11 10:44:42 +01:00
Benjamin Kramer	5405ee553a	[CodeGenPrepare] Simplify code. NFCI.	2020-09-11 11:24:08 +02:00
Florian Hahn	c0825fa5fc	Revert "[ORC] Make MaterializationResponsibility immovable, pass by unique_ptr." This reverts commit `c74900ca67`. This appears to be breaking some builds on macOS and has been causing build failures on Green Dragon (see below). I am reverting this for now, to unblock testing on Green Dragon. http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/18144/console [65/187] /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iexamples/ThinLtoJIT -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT -Iinclude -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.9 -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o -MF examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o.d -o examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o -c /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoDiscoveryThread.cpp FAILED: examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iexamples/ThinLtoJIT -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT -Iinclude -I/Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -O3 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk -mmacosx-version-min=10.9 -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o -MF examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o.d -o examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoDiscoveryThread.cpp.o -c /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoDiscoveryThread.cpp In file included from /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoDiscoveryThread.cpp:7: /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoInstrumentationLayer.h:37:68: error: non-virtual member function marked 'override' hides virtual member function void emit(MaterializationResponsibility R, ThreadSafeModule TSM) override; ^ /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm-project/llvm/include/llvm/ExecutionEngine/Orc/Layer.h:103:16: note: hidden overloaded virtual function 'llvm::orc::IRLayer::emit' declared here: type mismatch at 1st parameter ('std::unique_ptr<MaterializationResponsibility>' vs 'llvm::orc::MaterializationResponsibility') virtual void emit(std::unique_ptr<MaterializationResponsibility> R, ^ 1 error generated.	2020-09-11 09:35:20 +01:00
Martin Storsjö	e6419d320d	[MC] [Win64EH] Fix builds with expensive checks enabled This fixes a failed assert if expensive checks are enabled, since `1308bb99e0`.	2020-09-11 11:17:01 +03:00
David Sherwood	1e1770a07e	[SVE][CodeGen] Fix InlineFunction for scalable vectors When inlining functions containing allocas of scalable vectors we cannot specify the size in the lifetime markers, since we don't know this at compile time. Added new test here: test/Transforms/Inline/AArch64/sve-alloca-merge.ll Differential Revision: https://reviews.llvm.org/D87139	2020-09-11 08:34:51 +01:00
Yevgeny Rouban	28012e00d8	[NewPM] Introduce PreserveCFG check Check that all passes, which report they preserve CFG, are really preserving CFG. A new standard instrumentation is introduced. It can be switched on/off by the flag verify-cfg-preserved, which is on by default for debug builds. Reviewers: kuhar, fedor.sergeev Differential Revision: https://reviews.llvm.org/D81558	2020-09-11 14:32:21 +07:00
Martin Storsjö	1308bb99e0	[MC] [Win64EH] Write packed ARM64 epilogues if possible This gives a pretty substantial size reduction; for a 6.5 MB DLL with 300 KB .xdata, the .xdata shrinks by 66 KB. Differential Revision: https://reviews.llvm.org/D87369	2020-09-11 10:31:04 +03:00
Martin Storsjö	700fbe591a	[MC] [Win64EH] Canonicalize ARM64 unwind opcodes Convert 2-byte opcodes to equivalent 1-byte ones. Adjust the existing exhaustive testcase to avoid being altered by the simplification rules (to keep that test exercising all individual opcodes). Fix the assembler parser limits for register pairs; for .seh_save_regp and .seh_save_regp_x, we can allow up to x29, for a x29+x30 pair (which gets remapped to the UOP_SaveFPLR(X) opcodes), for .seh_save_fregp and .seh_save_fregpx, allow up to d14+d15. Not creating .seh_save_next for float register pairs, as the actual unwinder implementation in current versions of Windows is buggy for that case. This gives a minimal but measurable size reduction. (For a 6.5 MB DLL with 300 KB .xdata, the .xdata shrinks by 48 bytes. The opcode sequences are padded to a 4 byte boundary, so very small improvements might not end up mattering directly.) Differential Revision: https://reviews.llvm.org/D87367	2020-09-11 10:31:04 +03:00
Martin Storsjö	46416f0803	[CodeGen] [WinException] Remove a redundant explicit section switch for aarch64 The following EmitWinEHHandlerData() implicitly switches to .xdata, just like on x86_64. This became orphaned from the original code requiring it in `0b61d220c9` / https://reviews.llvm.org/D61095. Differential Revision: https://reviews.llvm.org/D87447	2020-09-11 10:31:04 +03:00
Michael Liao	f787fe15d8	[EarlyCSE] Remove unnecessary operand swap. - As min/max are commutative operators, there is no need to swap operands. That breaks the convention calculating the hash value.	2020-09-11 02:14:04 -04:00
Alok Kumar Sharma	e45b0708ae	[DebugInfo] Fixing CodeView assert related to lowerBound field of DISubrange. This is to fix CodeView build failure https://bugs.llvm.org/show_bug.cgi?id=47287 after DIsSubrange upgrade D80197 Assert condition is now removed and Count is calculated in case LowerBound is absent or zero and Count or UpperBound is constant. If Count is unknown it is later handled as VLA (currently Count is set to zero). Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D87406	2020-09-11 11:12:49 +05:30
Mircea Trofin	da92448828	[NFC][MLInliner] Presort instruction successions. Differential Revision: https://reviews.llvm.org/D87489	2020-09-10 21:40:49 -07:00
Michael Liao	41e68f7ee7	[EarlyCSE] Fix and recommit the revised `c9826829d7` In addition to calculate hash consistently by swapping SELECT's operands, we also need to inverse the select pattern favor to match the original logic. [EarlyCSE] Equivalent SELECTs should hash equally DenseMap<SimpleValue> assumes that, if its isEqual method returns true for two elements, then its getHashValue method must return the same value for them. This invariant is broken when one SELECT node is a min/max operation, and the other can be transformed into an equivalent min/max by inverting its predicate and swapping its operands. This patch fixes an assertion failure that would occur intermittently while compiling the following IR: define i32 @t(i32 %i) { %cmp = icmp sle i32 0, %i %twin1 = select i1 %cmp, i32 %i, i32 0 %cmpinv = icmp sgt i32 0, %i %twin2 = select i1 %cmpinv, i32 0, i32 %i %sink = add i32 %twin1, %twin2 ret i32 %sink } Differential Revision: https://reviews.llvm.org/D86843	2020-09-10 23:30:56 -04:00
Michael Liao	39dc75f66c	Revert "[EarlyCSE] Equivalent SELECTs should hash equally" This reverts commit `c9826829d7` as it breaks regression tests.	2020-09-10 22:37:35 -04:00
Zarko Todorovski	035396197a	Remove unused variable introduce in `0448d11a06` causing build failures with -Werror on.	2020-09-10 20:07:11 -04:00
Reid Kleckner	2c73bef7fa	Fix wrong comment about enabling optimizations to work around a bug	2020-09-10 16:45:20 -07:00
Amara Emerson	0448d11a06	[AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0. With optimizations we leave the decision to eliminate fallthrough branches to bock placement, but at -O0 we should do it in the selector to save code size. This regressed -O0 with a recent change to a combiner.	2020-09-10 15:01:26 -07:00
Reid Kleckner	4e3edef4b8	Use pragmas to work around MSVC x86_32 debug miscompile bug Halide users reported this here: https://llvm.org/pr46176 I reported the issue to MSVC here: https://developercommunity.visualstudio.com/content/problem/1179643/msvc-copies-overaligned-non-trivially-copyable-par.html This codepath is apparently not covered by LLVM's unit tests, so I added coverage in a unit test. If we want to support this configuration going forward, it means that is in general not safe to pass a SmallVector<T, N> by value if alignof(T) is greater than 4. This doesn't appear to come up often because passing a SmallVector by value is inefficient and not idiomatic: it copies the inline storage. In this case, the SmallVector<LLT,4> is captured by value by a lambda, and the lambda is passed by value into std::function, and that's how we hit the bug. Differential Revision: https://reviews.llvm.org/D87475	2020-09-10 14:50:01 -07:00
Florian Hahn	fb109c42d9	[DSE] Switch to MemorySSA-backed DSE by default. The tests have been updated and I plan to move them from the MSSA directory up. Some end-to-end tests needed small adjustments. One difference to the legacy DSE is that legacy DSE also deletes trivially dead instructions that are unrelated to memory operations. Because MemorySSA-backed DSE just walks the MemorySSA, we only visit/check memory instructions. But removing unrelated dead instructions is not really DSE's job and other passes will clean up. One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll, but I think this comes down to legacy DSE not handling instructions that may throw correctly in that case. To cover this with MemorySSA-backed DSE, we need an update to llvm.coro.begin to treat it's return value to belong to the same underlying object as the passed pointer. There are some minor cases MemorySSA-backed DSE currently misses, e.g. related to atomic operations, but I think those can be implemented after the switch. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers and details in the thread on llvm-dev. Impact on CTMark: ``` Legacy Pass Manager exec instrs size-text O3 + 0.60% - 0.27% ReleaseThinLTO + 1.00% - 0.42% ReleaseLTO-g. + 0.77% - 0.33% RelThinLTO (link only) + 0.87% - 0.42% RelLO-g (link only) + 0.78% - 0.33% ``` http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions ``` New Pass Manager exec instrs. size-text O3 + 0.95% - 0.25% ReleaseThinLTO + 1.34% - 0.41% ReleaseLTO-g. + 1.71% - 0.35% RelThinLTO (link only) + 0.96% - 0.41% RelLO-g (link only) + 2.21% - 0.35% ``` http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions Reviewed By: asbirlea, xbolva00, nikic Differential Revision: https://reviews.llvm.org/D87163	2020-09-10 22:24:32 +01:00
Bryan Chan	c9826829d7	[EarlyCSE] Equivalent SELECTs should hash equally DenseMap<SimpleValue> assumes that, if its isEqual method returns true for two elements, then its getHashValue method must return the same value for them. This invariant is broken when one SELECT node is a min/max operation, and the other can be transformed into an equivalent min/max by inverting its predicate and swapping its operands. This patch fixes an assertion failure that would occur intermittently while compiling the following IR: define i32 @t(i32 %i) { %cmp = icmp sle i32 0, %i %twin1 = select i1 %cmp, i32 %i, i32 0 %cmpinv = icmp sgt i32 0, %i %twin2 = select i1 %cmpinv, i32 0, i32 %i %sink = add i32 %twin1, %twin2 ret i32 %sink } Differential Revision: https://reviews.llvm.org/D86843	2020-09-10 16:59:24 -04:00
Lang Hames	c74900ca67	[ORC] Make MaterializationResponsibility immovable, pass by unique_ptr. Making MaterializationResponsibility instances immovable allows their associated VModuleKeys to be updated by the ExecutionSession while the responsibility is still in-flight. This will be used in the upcoming removable code feature to enable safe merging of resource keys even if there are active compiles using the keys being merged.	2020-09-10 13:21:46 -07:00
Nikita Popov	a5168bdb4a	[DemandedBits][BDCE] Add support for min/max intrinsics Add DemandedBits / BDCE support for min/max intrinsics: If the low bits are not demanded in the result, they also aren't demanded in the operands. Differential Revision: https://reviews.llvm.org/D87161	2020-09-10 22:13:31 +02:00
Nikita Popov	99e78cb718	[DemandedBits] Add braces to large if (NFC) While the if only contains a single statement, it happens to be a huge switch. Add braces to make this code easier to read.	2020-09-10 22:13:27 +02:00
Volkan Keles	d4bf90271f	GlobalISel: Combine fneg(fneg x) to x https://reviews.llvm.org/D87473	2020-09-10 12:57:38 -07:00
Anna Thomas	b1b9806370	[ImplicitNullChecks] NFC: Remove unused PointerReg arg in dep analysis The PointerReg arg was passed into the dependence function for an assertion which no longer exists. So, this patch updates the dependence functions to avoid the PointerReg in the signature. Tests-Run: make check	2020-09-10 15:31:57 -04:00
Christopher Tetreault	7ddfd9b3eb	[SVE] Bail from VectorUtils heuristics for scalable vectors Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of elements for scalable vectors. Assert that the mask type is not scalable in possiblyDemandedEltsInMask . Assert that the types are correct in all three functions. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D87424	2020-09-10 12:29:37 -07:00
Krzysztof Parzyszek	783e28a508	[Hexagon] Split pair-based masked memops	2020-09-10 14:24:42 -05:00
Dominic Chen	4252f3009b	[WebAssembly] Set unreachable as canonical to permit disassembly Currently, using llvm-objdump to disassemble a function containing unreachable will trigger an assertion while decoding the opcode, since both unreachable and debug_unreachable have the same encoding. To avoid this, set unreachable as the canonical decoding. Differential Revision: https://reviews.llvm.org/D87431	2020-09-10 15:04:16 -04:00
Craig Topper	c195ae2f00	[SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support. Previously we could match fcmp+select to a reduction if the fcmp had the nonans fast math flag. But if the select had the nonans fast math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic before SLP gets to it. Seems fairly likely that if one of the fcmp+select pair have the fast math flag, they both would. My plan is to start vectorizing the fmaxnum/fminnum version soon, but I wanted to get this code out as it had some of the strangest fast math flag behaviors.	2020-09-10 11:49:19 -07:00
Fangrui Song	a0ffe2b21a	[PGO] Skip if an IndirectBrInst critical edge cannot be split PGOInstrumentation runs `SplitIndirectBrCriticalEdges` but some IndirectBrInst critical edge cannot be split. `getInstrBB` will crash when calling `SplitCriticalEdge`, e.g. int foo(char p) { void targets[2]; targets[0] = &&indirect; targets[1] = &&end; for (;; p++) if (p == 7) { indirect: goto targets[p[1]]; // the self loop is critical in -O } end: return 0; } Skip such critical edges to prevent a crash. Reviewed By: davidxl, lebedev.ri Differential Revision: https://reviews.llvm.org/D87435	2020-09-10 11:04:14 -07:00
Anna Thomas	46329f6079	[ImplicitNullCheck] Handle instructions that preserve zero value This is the first in a series of patches to make implicit null checks more general. This patch identifies instructions that preserves zero value of a register and considers that as a valid instruction to hoist along with the faulting load. See added testcases. Reviewed-By: reames, dantrushin Differential Revision: https://reviews.llvm.org/D87108	2020-09-10 13:39:50 -04:00
Mircea Trofin	e543708e5e	[NFC][ThinLTO] Let llvm::EmbedBitcodeInModule handle serialization. llvm::EmbedBitcodeInModule handles serializing the passed-in module, if the provided MemoryBufferRef is invalid. This is already the path taken in one of the uses of the API - clang::EmbedBitcode, when called from BackendConsumer::HandleTranslationUnit - so might as well do the same here and reduce (by very little) code duplication. The only difference this patch introduces is that the serialization happens with ShouldPreserveUseListOrder set to true. Differential Revision: https://reviews.llvm.org/D87339	2020-09-10 10:25:00 -07:00
Ettore Tiotto	6b13cfe739	[ArgumentPromotion]: Copy function metadata after promoting arguments The argument promotion pass currently fails to copy function annotations over to the modified function after promoting arguments. This patch copies the original function annotation to the new function. Reviewed By: fhann Differential Revision: https://reviews.llvm.org/D86630	2020-09-10 13:08:57 -04:00
Kit Barton	009cd4e491	[PPC][GlobalISel] Add initial GlobalIsel infrastructure This adds the initial GlobalISel skeleton for PowerPC. It can only run ir-translator and legalizer for `ret void`. This is largely based on the initial GlobalISel patch for RISCV (https://reviews.llvm.org/D65219). Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D83100	2020-09-10 11:58:01 -05:00
Simon Pilgrim	f42f733af9	SwitchLoweringUtils.h - reduce TargetLowering.h include. NFCI. Only include the headers we actually need, and move the remaining includes down to implicit dependent files.	2020-09-10 17:42:18 +01:00
Owen Anderson	3d9c85e4d8	Mark FMOV constant materialization as being as cheap as a move. This prevents us from doing things like LICM'ing it out of a loop, which is usually a net loss because we end up having to spill a callee-saved FPR to accomodate it. This does perturb instruction scheduling around this instruction, so a number of tests had to be updated to account for it. Reviewed By: t.p.northover Differential Revision: https://reviews.llvm.org/D87316	2020-09-10 16:38:59 +00:00
Simon Pilgrim	601557e9f9	Hexagon.h - remove unnecessary includes. NFCI. Replace with forward declarations and move includes to implicit dependent files.	2020-09-10 16:59:43 +01:00
Krzysztof Parzyszek	8a08740db6	[GVN] Account for masked loads/stores depending on load/store instructions This is a case where an intrinsic depends on a non-call instruction. Differential Revision: https://reviews.llvm.org/D87423	2020-09-10 10:57:33 -05:00
Simon Pilgrim	b585fdae24	[X86] Use Register instead of unsigned. NFCI. Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.	2020-09-10 16:05:33 +01:00
Simon Pilgrim	9f830e0af7	AArch64MachineFunctionInfo.h - remove unnecessary TargetFrameLowering.h include. NFCI.	2020-09-10 16:05:33 +01:00
Nikita Popov	4e413e1621	[InstCombine] Temporarily do not drop volatile stores before unreachable See discussion in D87149. Dropping volatile stores here is legal per LLVM semantics, but causes issues for real code and may result in a change to LLVM volatile semantics. Temporarily treat volatile stores as "not guaranteed to transfer execution" in just this place, until this issue has been resolved.	2020-09-10 16:16:44 +02:00
Jay Foad	517202c720	[TargetLowering] Fix comments describing XOR -> OR/AND transformations	2020-09-10 13:56:34 +01:00
Florian Hahn	a5ec99da6e	[DSE] Support eliminating memcpy.inline. MemoryLocation has been taught about memcpy.inline, which means we can get the memory locations read and written by it. This means DSE can handle memcpy.inline	2020-09-10 13:19:25 +01:00
Max Kazantsev	8c0bbbade1	[NFC] Refactoring in SCEV: add missing `const` qualifiers	2020-09-10 19:06:37 +07:00
Simon Pilgrim	de25ebaac6	[CostModel][X86] Add vXi32 division by uniform constant costs (PR47476) Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.	2020-09-10 12:17:54 +01:00
Simon Pilgrim	fc49abee56	[X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle. lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.	2020-09-10 11:45:08 +01:00
Simon Pilgrim	e80605e242	[X86] Remove WaitInsert::TTI member. NFCI. This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.	2020-09-10 11:45:08 +01:00
Kerry McLaughlin	cd89f5c91b	[SVE][CodeGen] Legalisation of truncate for scalable vectors Truncating from an illegal SVE type to a legal type, e.g. `trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>` fails after PromoteIntOp_CONCAT_VECTORS attempts to create a BUILD_VECTOR. This patch changes the promote function to create a sequence of INSERT_SUBVECTORs if the return type is scalable, and replaces these with UNPK+UZP1 for AArch64. Reviewed By: paulwalker-arm Differential Revision: https://reviews.llvm.org/D86548	2020-09-10 11:35:33 +01:00
Juneyoung Lee	1b9884df8d	Enable InsertFreeze flag of JumpThreading when used in LTO This patch enables inserting freeze when JumpThreading converts a select to a conditional branch when it is run in LTO. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85534	2020-09-10 19:05:49 +09:00
Sam Tebbs	b81c57d646	[ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane values The effects of unpredicated vector instruction with unknown lanes cannot be predicted and therefore cannot be tail predicated. This does not apply to predicated vector instructions and so this patch allows tail predication on them. Differential Revision: https://reviews.llvm.org/D87376	2020-09-10 10:34:32 +01:00
Sam Parker	0bdf8c9127	[SCEV] Constant expansion cost at minsize As code size is the only thing we care about at minsize, query the cost of materialising immediates when calculating the cost of a SCEV expansion. We also modify the CostKind to TCK_CodeSize for minsize, instead of RecipThroughput. Differential Revision: https://reviews.llvm.org/D76434	2020-09-10 08:21:11 +01:00
Sam Parker	1919b65052	[ARM] Tail predicate VQDMULH and VQRDMULH Mark the family of instructions as valid for tail predication. Differential Revision: https://reviews.llvm.org/D87348	2020-09-10 08:20:07 +01:00
Juneyoung Lee	39c1653b3d	[JumpThreading] Conditionally freeze its condition when unfolding select This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956 ). To minimize its impact to the quality of generated code, I suggest enabling this only for LTO as a start (it has two JumpThreading passes registered). This patch contains a flag that makes JumpThreading enable it. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84940	2020-09-10 15:49:40 +09:00
Max Kazantsev	cde8fc65ae	[NFC] Rename variables to avoid name confusion Name `LI` is used for loop info, loop and load inst at the same function, which causes a lot of confusion.	2020-09-10 13:41:10 +07:00
Max Kazantsev	c413a8a8ec	[LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457 The test in PR47457 demonstrates a situation when candidate load's pointer's SCEV is no loger a SCEVAddRec after loop versioning. The code there assumes that it is always a SCEVAddRec and crashes otherwise. This patch makes sure that we do not consider candidates for which this requirement is broken after the versioning. Differential Revision: https://reviews.llvm.org/D87355 Reviewed By: asbirlea	2020-09-10 13:30:31 +07:00
Qiu Chaofan	6afb279100	[PowerPC] [FPEnv] Disable strict FP mutation by default `22a0edd0` introduced a config IsStrictFPEnabled, which controls the strict floating point mutation (transforming some strict-fp operations into non-strict in ISel). This patch disables the mutation by default since we've finished PowerPC strict-fp enablement in backend. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87222	2020-09-10 13:28:09 +08:00
Petr Hosek	c4d7536136	[CMake] Simplify CMake handling for libxml2 This matches the changes made to handling of zlib done in `10b1b4a` where we rely on find_package and the imported target rather than manually appending the library and include paths. The use of LLVM_LIBXML2_ENABLED has been replaced by LLVM_ENABLE_LIBXML2 thus reducing the number of variables. Differential Revision: https://reviews.llvm.org/D84563	2020-09-09 21:44:44 -07:00
Jordan Rupprecht	52f0837778	[NFC] Move definition of variable now only used in debug builds	2020-09-09 20:23:59 -07:00
Matt Arsenault	e15215e041	AMDGPU: Hoist check for VGPRs	2020-09-09 19:45:40 -04:00
Matt Arsenault	85490874b2	AMDGPU: Skip all meta instructions in hazard recognizer This was not adding a necessary nop due to thinking the kill counted.	2020-09-09 19:45:40 -04:00
Matt Arsenault	82cbc9330a	AMDGPU: Fix inserting waitcnts before kill uses	2020-09-09 19:45:40 -04:00
Juneyoung Lee	a6183d0f02	[ValueTracking] isKnownNonZero, computeKnownBits for freeze This implements support for isKnownNonZero, computeKnownBits when freeze is involved. ``` br (x != 0), BB1, BB2 BB1: y = freeze x ``` In the above program, we can say that y is non-zero. The reason is as follows: (1) If x was poison, `br (x != 0)` raised UB (2) If x was fully undef, the branch again raised UB (3) If x was non-zero partially undef, say `undef \| 1`, `freeze x` will return a nondeterministic value which is also non-zero. (4) If x was just a concrete value, it is trivial Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D75808	2020-09-10 08:07:38 +09:00
dfukalov	c259d3a061	[AMDGPU] Fix for folding v2.16 literals. It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are incorrectly processed so one of two packed values were lost. Introduced new function to check immediate 32-bit operand can be folded. Converted condition about current op_sel flags value to fall-through. Fixes: SWDEV-247595 Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D87158	2020-09-10 01:39:25 +03:00
Jessica Paquette	480e7f43a2	[AArch64][GlobalISel] Share address mode selection code for memops We were missing support for the G_ADD_LOW + ADRP folding optimization in the manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD. As a result, we were missing cases like this: ``` @foo = external hidden global i32* define void @baz(i32* %0) { store i32* %0, i32** @foo ret void } ``` https://godbolt.org/z/16r7ad This functionality already existed in the addressing mode functions for the importer. So, this patch makes the manual selection code use `selectAddrModeIndexed` rather than duplicating work. This is a 0.2% geomean code size improvement for CTMark at -O3. There is one code size increase (0.1% on lencod) which is likely because `selectAddrModeIndexed` doesn't look through constants. Differential Revision: https://reviews.llvm.org/D87397	2020-09-09 15:14:46 -07:00
Florian Hahn	9969c317ff	[DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber. Atomic stores are modeled as MemoryDef to model the fact that they may not be reordered, depending on the ordering constraints. Atomic stores that are monotonic or weaker do not limit re-ordering, so we do not have to treat them as potential read clobbers. Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll already contains a set of negative test cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87386	2020-09-09 23:01:58 +01:00
Nikita Popov	0a5dc7effb	[DAGCombiner] Fold fmin/fmax of NaN fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the behavior of existing InstSimplify folds. This is expected to improve the reduction lowerings in D87391, which use NaN as a neutral element. Differential Revision: https://reviews.llvm.org/D87415	2020-09-09 23:53:32 +02:00
Amara Emerson	e5784ef8f6	[GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator. We weren't using this before, so none of the MachineFunction CFG edges had the branch probability information added. As a result, block placement later in the pipeline was flying blind. This is enabled only with optimizations enabled like SelectionDAG. Differential Revision: https://reviews.llvm.org/D86824	2020-09-09 14:31:12 -07:00
Amara Emerson	467a071285	[GlobalISel][IRTranslator] Generate better conditional branch lowering. This is a port of the functionality from SelectionDAG, which tries to find a tree of conditions from compares that are then combined using OR or AND, before using that result as the input to a branch. Instead of naively lowering the code as is, this change converts that into a sequence of conditional branches on the sub-expressions of the tree. Like SelectionDAG, we re-use the case block codegen functionality from the switch lowering utils, which causes us to generate some different code. The result of which I've tried to mitigate in earlier combine patches. Differential Revision: https://reviews.llvm.org/D86665	2020-09-09 13:16:11 -07:00
Amara Emerson	cc76da7ada	[GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less. This combine previously tried to take sequences like: %cond = G_ICMP pred, a, b G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and by inverting the compare predicate and swapping branch targets, delete the G_BR and instead have a single conditional branch to the falsebb. Since in an earlier patch we have a combine to fold not(icmp) into just an inverted icmp, we don't need this combine to do as much. This patch instead generalizes the combine by just looking for: G_BRCOND %cond, %truebb G_BR %falsebb %truebb: ... %falsebb: ... and then inverting the condition using a not (xor). The xor can be folded away in a separate combine. This change also lets us avoid some optimization code in the IRTranslator. I also think that deleting G_BRs in the combiner is unnecessary. That's something that targets can decide to do at selection time and could simplify generic code in future. Differential Revision: https://reviews.llvm.org/D86664	2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi	0ab6a15698	[X86] Add support for using fast short rep mov for memcpy lowering. Disabled by default behind an option. Differential Revision: https://reviews.llvm.org/D86883	2020-09-09 12:46:40 -07:00
Jian Cai	415a4fbea7	[MC] Resolve the difference of symbols in consecutive MCDataFragements Try to resolve the difference of two symbols in consecutive MCDataFragments. This is important for an idiom like "foo:instr; .if . - foo; instr; .endif" (https://bugs.llvm.org/show_bug.cgi?id=43795). Reviewed By: nickdesaulniers Differential Revision: https://reviews.llvm.org/D69411	2020-09-09 12:35:43 -07:00
Fangrui Song	ad61e346d3	[gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output	2020-09-09 12:34:43 -07:00
Krzysztof Parzyszek	0ee54cf883	[Hexagon] Account for truncating pairs to non-pairs when widening truncates Added missing selection patterns for vpackl.	2020-09-09 14:31:52 -05:00
Fangrui Song	dbac20bb6b	[gcov] Don't split entry block; add a synthetic entry block instead The entry block is split at the first instruction where `shouldKeepInEntry` returns false. The created basic block has a br jumping to the original entry block. The new basic block causes the function label line and the other entry block lines to be covered by different basic blocks, which can affect line counts with special control flows (fork/exec in the entry block requires heuristics in llvm-cov gcov to get consistent line counts). int main() { // BB0 return 0; // BB2 (due to entry block splitting) } // BB1 is the exit block (since gcov 4.8) This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and inserts an edge from the synthetic entry block to the original entry block. We can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The number of basic blocks does not change, but the emitted .gcno files will be smaller because we can save one GCOV_TAG_LINES tag. // BB0 is the synthetic entry block with a single edge to BB2 int main() { // BB2 return 0; // BB2 } // BB1 is the exit block (since gcov 4.8)	2020-09-09 12:25:24 -07:00
Guillaume Chatelet	5a4a0cfcfb	[NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD) This is preparatory work to unable storing alignment for AtomicCmpXchgInst. See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168 This is the fixed version of D83375, which was submitted and reverted. Differential Revision: https://reviews.llvm.org/D87373	2020-09-09 19:10:30 +00:00
Mark de Wever	08196e0b2e	Implements [[likely]] and [[unlikely]] in IfStmt. This is the initial part of the implementation of the C++20 likelihood attributes. It handles the attributes in an if statement. Differential Revision: https://reviews.llvm.org/D85091	2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek	81ff2d30a9	[DSE] Handle masked stores	2020-09-09 13:31:31 -05:00
Ulrich Weigand	1a25133bcd	[DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion During the main DAGCombine loop, whenever a node gets replaced, the new node and all its users are pushed onto the worklist. Omit this if the new node is the EntryToken (e.g. if a store managed to get optimized out), because re-visiting the EntryToken and its users will not uncover any additional opportunities, but there may be a large number of such users, potentially causing compile time explosion. This compile time explosion showed up in particular when building the SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any platform without SIMD vector support. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D86963	2020-09-09 19:13:46 +02:00
Mircea Trofin	4b15fc9ddb	[NFC][MLInliner] Don't initialize in an assert. Since the build bots have assertions enabled, this flew under the radar.	2020-09-09 09:56:07 -07:00
Simon Pilgrim	6e45b98934	X86CallFrameOptimization.cpp - use const references where possible. NFCI.	2020-09-09 16:35:08 +01:00
Simon Pilgrim	e706116e11	X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI. Don't use auto for non-obvious types, and use const references.	2020-09-09 16:15:02 +01:00
Qiu Chaofan	88ff4d2ca1	[PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering In standard C library, both rint and nearbyint returns rounding result in current rounding mode. But nearbyint never raises inexact exception. On PowerPC, x(v\|s)r(d\|s)pic may modify FPSCR XX, raising inexact exception. So we can't select constrained fnearbyint into xvrdpic. One exception here is xsrqpi, which will not raise inexact exception, so fnearbyint f128 is okay here. Reviewed By: uweigand Differential Revision: https://reviews.llvm.org/D87220	2020-09-09 22:40:58 +08:00
Jay Foad	649bde488c	[AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter NFC.	2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky	95b7040e43	[AMDGPU][MC] Improved diagnostic messages for invalid registers Corrected parser to issue meaningful error messages for invalid and malformed registers. See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303 Reviewers: arsenm, rampitec Differential Revision: https://reviews.llvm.org/D87234	2020-09-09 16:44:03 +03:00
Alon Kom	818cf30b83	[MachinePipeliner] Fix II_setByPragma initialization II_setByPragma was not reset between 2 calls of the MachinePipleiner pass Reviewed By: bcahoon Differential Revision: https://reviews.llvm.org/D87088	2020-09-09 13:38:35 +00:00
Denis Antrushin	4358fa782e	[Statepoints] Update DAG root after emitting statepoint. Since we always generate CopyToRegs for statepoint results, we must update DAG root after emitting statepoint, so that these copies are scheduled before any possible local uses. Note: getControlRoot() flushes all PendingExports, not only those we generates for relocates. If that'll become a problem, we can change it to flushing relocate exports only. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D87251	2020-09-09 20:22:10 +07:00
Ronak Chauhan	f078577f31	Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors" This reverts commit `487a805310`. Tests fail on big endian machines.	2020-09-09 18:01:28 +05:30
Simon Pilgrim	d816499f95	[KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.	2020-09-09 13:22:58 +01:00
David Stenberg	48fc781438	[UnifyFunctionExitNodes] Fix Modified status for unreachable blocks If a function had at most one return block, the pass would return false regardless if an unified unreachable block was created. This patch fixes that by refactoring runOnFunction into two separate helper functions for handling the unreachable blocks respectively the return blocks, as suggested by @bjope in a review comment. This was caught using the check introduced by D80916. Reviewed By: serge-sans-paille Differential Revision: https://reviews.llvm.org/D85818	2020-09-09 13:36:03 +02:00
Juneyoung Lee	36c8621638	[BuildLibCalls] Add more noundef to library functions This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions that are mostly about accessing the file system or processes. A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`. They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them. - clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union. - For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice. After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or introduced by a transformation from primitive IR operations. For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85894	2020-09-09 20:33:35 +09:00
Juneyoung Lee	25ce1e0497	[ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison. isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84242	2020-09-09 20:00:26 +09:00
Simon Pilgrim	455cce3e21	TrigramIndex.cpp - remove unnecessary includes. NFCI. TrigramIndex.h already includes most of these.	2020-09-09 11:38:31 +01:00
Simon Pilgrim	f16b2d8315	ARMTargetParser.cpp - use auto const references in for range loops. NFCI. Fix static analysis warnings about unnecessary copies.	2020-09-09 11:38:31 +01:00
Simon Pilgrim	24ecfdac7b	[APFloat] Fix uninitialized variable in IEEEFloat constructors Some constructors of IEEEFloat do not initialize member variable exponent. Fix it by initializing exponent with the following values: For NaNs, the `exponent` is `maxExponent+1`. For Infinities, the `exponent` is `maxExponent+1`. For Zeroes, the `exponent` is `maxExponent-1`. Patch by: @nullptr.cpp (Yang Fan) Differential Revision: https://reviews.llvm.org/D86997	2020-09-09 11:38:30 +01:00
Mirko Brkusanin	43af2a6faa	[AMDGPU] Workaround for LDS Misalignment bug on GFX10 Add subtarget feature check to avoid using ds_read/write_b96/128 with too low alignment if a bug is present on that specific hardware. Add this "feature" to GFX 10.1.1 as it is also affected. Add global-isel test.	2020-09-09 11:46:09 +02:00
Florian Hahn	2bcc4db761	[EarlyCSE] Explicitly require AAResultsWrapperPass. The MemorySSAWrapperPass depends on AAResultsWrapperPass and if MemorySSA is preserved but AAResultsWrapperPass is not, this could lead to a crash when updating the last user of the MemorySSAWrapperPass. Alternatively AAResultsWrapperPass could be marked preserved by GVN, but I am not sure if that would be safe. I am not sure what is required in order to preserve AAResultsWrapperPass. At the moment, it seems like a couple of passes that do similar transforms to GVN are preserving it. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87137	2020-09-09 09:14:50 +01:00
Denis Antrushin	2a52c3301a	[Statepoints] Properly handle const base pointer. Current code in InstEmitter assumes all GC pointers are either VRegs or stack slots - hence, taking only one operand. But it is possible to have constant base, in which case it occupies two machine operands. Add a convinience function to StackMaps to get index of next meta argument and use it in InsrEmitter to properly advance to the next statepoint meta operand. Reviewed By: reames Differential Revision: https://reviews.llvm.org/D87252	2020-09-09 14:07:00 +07:00
Sam Parker	3ebc755227	[ARM] Try to rematerialize VCTP instructions We really want to try and avoid spilling P0, which can be difficult since there's only one register, so try to rematerialize any VCTP instructions. Differential Revision: https://reviews.llvm.org/D87280	2020-09-09 07:41:22 +01:00
Johannes Doerfert	d445b6dfec	[Attributor] Cleanup `::initialize` of various AAs This commit cleans up the ::initialize method of various AAs in the following ways: - If an associated function is required, give up on declarations. This was discovered as a real problem when lots of llvm.dbg.XXX call sites were assumed `noreturn` until proven otherwise. That does not make any sense and caused huge regressions and missed deductions. - Require more associated declarations for function interface AAs. - Use the IRAttribute::initialize to determine if function interface AAs can be used in IPO, don't replicate the checks (especially isFunctionIPOAmendable) all over the place. Arguably the function declaration check should be moved to some central place to.	2020-09-09 01:38:25 -05:00
Fangrui Song	6a9a0bfc33	[llvm-cov gcov] Simply computation of line counts and exit block counter	2020-09-08 23:15:37 -07:00
Johannes Doerfert	849146ba93	[Attributor] Associate the callback callee with a call site argument (if any) If we have a callback, call site arguments were already associated with the callback callee. Now we also associate the function with the callback callee, thus we know ensure that the following holds true (if all return nonnull): `getAssociatedArgument()->getParent() == getAssociatedFunction()` To test this an early exit from `AAMemoryBehaviorCallSiteArgument::initialize`` is included as well. Without the change to getAssociatedFunction() this kind of early exit for declarations would cause callback call site arguments to miss out.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	cefd2a2c70	[Attributor] Cleanup `IRPosition::getArgNo` usages As we handle callback calls we need to disambiguate the call site argument number from the callee argument number. While always equal in non-callback calls, a callback comes with a partial parameter-argument mapping so there is no implicit correspondence. Here we split `IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()` and `getCalleeArgNo()`. Usages are adjusted to pick the right one for their purpose. This fixed some problems that would have been exposed as we more aggressively optimize callbacks.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	c0ab901bdd	[Attributor] Selectively look at the callee even when there are operand bundles While operand bundles carry unpredictable semantics, we know some of them and can therefore "ignore" them. In this case we allow to look at the declaration of `llvm.assume` when asked for the attributes at a call site. The assume operand bundles we have do not invalidate the declaration attributes. We cannot test this in isolation because the llvm.assume attributes are determined by the parser. However, a follow up patch will provide test coverage.	2020-09-09 00:52:17 -05:00
Johannes Doerfert	d5d75f61e5	[Attributor] Provide a command line option that limits recursion depth In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized attributes until stack frame ~35k caused space to run out. The initial size 1024 is pretty much random.	2020-09-09 00:47:02 -05:00
Max Kazantsev	795e4ee9d2	[NFC] Move functon from IndVarSimplify to SCEV This function can be reused in other places. Differential Revision: https://reviews.llvm.org/D87274 Reviewed By: fhahn, lebedev.ri	2020-09-09 11:20:59 +07:00
Krzysztof Parzyszek	c2b7b9b642	[Hexagon] Fix order of operands in V6_vdealb4w	2020-09-08 22:09:28 -05:00
Fangrui Song	b9d086693b	[llvm-cov gcov] Compute unmeasured arc counts by Kirchhoff's circuit law For a CFG G=(V,E), Knuth describes that by Kirchoff's circuit law, the minimum number of counters necessary is \|E\|-(\|V\|-1). The emitted edges form a spanning tree. libgcov emitted .gcda files leverages this optimization while clang --coverage's doesn't. Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can correctly print line counts of gcc --coverage emitted files and enable the future improvement of clang --coverage.	2020-09-08 18:45:11 -07:00
Brad Smith	88b368a1c4	[PowerPC] Set setMaxAtomicSizeInBitsSupported appropriately for 32-bit PowerPC in PPCTargetLowering Reviewed By: nemanjai Differential Revision: https://reviews.llvm.org/D86165	2020-09-08 21:21:14 -04:00
Mircea Trofin	4013bab9c4	[NFC][ThinLTO] EmbedBitcodeSection doesn't need the Config Instead, passing in the command line options, initialized to nullptr. In an upcoming patch, we can then use the parameter to pass actual command line options. Differential Revision: https://reviews.llvm.org/D87336	2020-09-08 17:14:44 -07:00
Krzysztof Parzyszek	055d209589	Handle masked loads and stores in MemoryLocation/Dependence Differential Revision: https://reviews.llvm.org/D87061	2020-09-08 19:08:44 -05:00
David Blaikie	be561fad1e	Remove unused variable(s)	2020-09-08 16:58:01 -07:00
Craig Topper	844e94a502	[SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI This was a missed simplication in D87200	2020-09-08 15:50:12 -07:00
David Blaikie	69da27c749	llvm-symbolizer: Add optional "start file" to match "start line" Since a function might have portions of its code coming from multiple different files, "start line" is ambiguous (it can't just be resolved relative to the file/line specified). Add start file to disambiguate it.	2020-09-08 15:40:58 -07:00
Craig Topper	b1e68f885b	[SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.: This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130. This required adding a SDNodeFlags to SelectionDAG::getSetCC. Now we manage to contant fold some stuff undefs during the initial getNode that we don't do in later DAG combines. Differential Revision: https://reviews.llvm.org/D87200	2020-09-08 15:27:21 -07:00
Krzysztof Parzyszek	d183f47261	[Hexagon] Handle widening of truncation's operand with legal result Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was widened to HVX. Make sure that v8i8 does not get altered (even if it's changed to another legal type).	2020-09-08 16:07:39 -05:00
Nikita Popov	8453fbf088	[ValueTracking] Compute known bits of min/max intrinsics Implement known bits for the min/max intrinsics based on the recently added KnownBits primitives.	2020-09-08 21:08:17 +02:00
David Stenberg	17dce2fe43	[UnifyFunctionExitNodes] Remove unused getters, NFC The get{Return,Unwind,Unreachable}Block functions in UnifyFunctionExitNodes have not been used for many years, so just remove them. Reviewed By: bjope Differential Revision: https://reviews.llvm.org/D87078	2020-09-08 20:42:28 +02:00
Nikita Popov	f6b87da0c7	[InstCombine] Fold comparison of abs with int min If the abs is poisoning, this is already folded to true/false. For non-poisoning abs, we can convert this to a comparison with the operand.	2020-09-08 20:23:03 +02:00
Nikita Popov	e97f3b1b43	[InstCombine] Fold abs of known negative operand If we know that the abs operand is known negative, we can replace it with a neg. To avoid computing known bits twice, I've removed the fold for the non-negative case from InstSimplify. Both the non-negative and the negative case are handled by InstCombine now, with one known bits call. Differential Revision: https://reviews.llvm.org/D87196	2020-09-08 20:14:35 +02:00
Xun Li	59a467ee4f	[Coroutine] Make dealing with alloca spills more robust D66230 attempted to fix a problem where when there are allocas used before CoroBegin. It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin. Unfortunately that's incorrect. Consider this code: %var = alloca i32 %1 = getelementptr .. %var; stays put %f = call i8* @llvm.coro.begin store ... %1 After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed). To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior. There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc. Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases. While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset. It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design. This patch makes it more robust and should be a pure win. In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230. Differential Revision: https://reviews.llvm.org/D86859	2020-09-08 10:59:13 -07:00
Craig Topper	e6bb4c8e7b	[X86] SSE4_A should only imply SSE3 not SSSE3 in the frontend. SSE4_1 and SSE4_2 due imply SSSE3. So I guess I got confused when switching the code to being table based in D83273. Fixes PR47464	2020-09-08 10:50:59 -07:00
Simon Pilgrim	0dacf3b5ac	RISCVMatInt.h - remove unnecessary includes. NFCI. Add APInt forward declaration and move include to RISCVMatInt.cpp	2020-09-08 18:25:24 +01:00
Volkan Keles	1242dd330d	GlobalISel: Combine `op undef, x` to 0 https://reviews.llvm.org/D86611	2020-09-08 09:46:38 -07:00
Heejin Ahn	d25c17f317	[WebAssembly] Fix fixEndsAtEndOfFunction for try-catch When the function return type is non-void and `end` instructions are at the very end of a function, CFGStackify's `fixEndsAtEndOfFunction` function fixes the corresponding block/loop/try's type to match the function's return type. This is applied to consecutive `end` markers at the end of a function. For example, when the function return type is `i32`, ``` block i32 ;; return type is fixed to i32 ... loop i32 ;; return type is fixed to i32 ... end_loop end_block end_function ``` But try-catch is a little different, because it consists of two parts: a try part and a catch part, and both parts' return type should satisfy the function's return type. Which means, ``` try i32 ;; return type is fixed to i32 ... block i32 ;; this should be changed i32 too! ... end_block catch ... end_try end_function ``` As you can see in this example, it is not sufficient to only `end` instructions at the end of a function; in case of `try`, we should check instructions before `catch`es, in case their corresponding `try`'s type has been fixed. This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist that contains a reverse iterator, each of which is a starting point for a new backward `end` instruction search. Fixes https://bugs.llvm.org/show_bug.cgi?id=47413. Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D87207	2020-09-08 09:27:40 -07:00
Simon Pilgrim	3c83b967cf	LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC. We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h. Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.	2020-09-08 17:27:00 +01:00
Ronak Chauhan	487a805310	[AMDGPU] Support disassembly for AMDGPU kernel descriptors Decode AMDGPU Kernel descriptors as assembler directives. Reviewed By: scott.linder, jhenderson, kzhuravl Differential Revision: https://reviews.llvm.org/D80713	2020-09-08 21:26:11 +05:30
Jonas Paulsson	6dc3e22b57	[DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert. On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert() always ended up being scalarized, because the type action of the input is promotion which was previously an unhandled case in this method. This fixes https://bugs.llvm.org/show_bug.cgi?id=47132. Differential Revision: https://reviews.llvm.org/D86268 Patch by Eli Friedman. Review: Ulrich Weigand	2020-09-08 16:49:51 +02:00
Florian Hahn	c7b7c32f4a	[DSE,MemorySSA] Increase walker limit a bit. This slightly bumps the walker limit so that it covers more cases while not increasing compile-time too much: http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions	2020-09-08 14:55:46 +01:00
Simon Pilgrim	fcff2c32c0	X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Simon Pilgrim	0729ae367a	X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI. Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.	2020-09-08 13:01:23 +01:00
Xing GUO	25c3fa3f13	[DWARFYAML] Make the debug_ranges section optional. This patch makes the debug_ranges section optional. When we specify an empty debug_ranges section, yaml2obj only emits the section header. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D87263	2020-09-08 19:55:47 +08:00
Sam Tebbs	7aabb6ad77	[ARM][LowOverheadLoops] Remove modifications to the correct element count register After my patch at D86087, code that now uses the mov operand rather than the vctp operand will no longer remove modifications to the vctp operand as they should. This patch fixes that by explicitly removing modifications to the vctp operand rather than the register used as the element count.	2020-09-08 10:30:05 +01:00
Qiu Chaofan	8d9c13f37d	Revert "[PowerPC] Implement instruction clustering for stores" This reverts commit `3c0b325023`, (along with `ea795304` and `bb39eb9e`) since it breaks test with UB sanitizer.	2020-09-08 17:24:08 +08:00
Serge Guelton	38778e1087	Provide anchor for compiler extensions This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'	2020-09-08 10:33:38 +02:00
Qiu Chaofan	bb39eb9e7f	[PowerPC] Fix getMemOperandWithOffsetWidth Commit `3c0b3250` introduced memory cluster under pwr10 target, but a check for operands was unexpectedly removed. This adds it back to avoid regression.	2020-09-08 15:35:25 +08:00
Simon Wallis	8ee1419ab6	[AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long Implement AArch64 variant of shouldCoalesce() to detect a known failing case and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load. Do not coalesce in the following case: COPY where source is bottom 32 bits of a 64-register, and destination is a 32-bit subregister of a 64-bit register, ie it causes the rest of the register to be implicitly set to zero. A mir test has been added. In the test case, the 32-bit copy implements a 32 to 64 bit zero extension and relies on the upper 32 bits being zeroed. Coalescing to the result of the 64-bit load meant overwriting the upper 32 bits incorrectly when the loaded byte was negative. Reviewed By: john.brawn Differential Revision: https://reviews.llvm.org/D85956	2020-09-08 08:04:52 +01:00
Mikael Holmen	ea795304ec	[PowerPC] Add parentheses to silence gcc warning Without gcc 7.4 warns with ../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '\|\|' [-Wparentheses] BaseOp1.isFI() && ~~~~~~~~~~~~~~~^~ "Only base registers and frame indices are supported."); ~	2020-09-08 08:39:57 +02:00
Andrew Wei	78071fb524	[LSR] Canonicalize a formula before insert it into the list In GenerateConstantOffsetsImpl, we may generate non canonical Formula if BaseRegs of that Formula is updated and includes a recurrent expr reg related with current loop while its ScaledReg is not. Patched by: mdchen Reviewed By: qcolombet Differential Revision: https://reviews.llvm.org/D86939	2020-09-08 13:14:53 +08:00
Johannes Doerfert	711bf7dcf9	[Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions The CloneFunctionInto has implicit requirements with regards to the linkage and visibility of the function. We now update these after we did the CloneFunctionInto on the copy with the same linkage and visibility as the original.	2020-09-07 23:38:09 -05:00
Johannes Doerfert	e6208849c8	[Attributor][NFC] Change variable spelling	2020-09-07 23:38:09 -05:00
Johannes Doerfert	8637acac5a	[Attributor][NFC] Clang tidy: no else after continue	2020-09-07 23:38:08 -05:00
Johannes Doerfert	ff70c25d76	[Attributor][NFC] Expand `auto` types (clang-fix-it)	2020-09-07 23:38:08 -05:00
Johannes Doerfert	79651265b2	[Attributor][FIX] Properly return changed if the IR was modified Deleting or replacing anything is certainly a modification. This caused a later assertion in IPSCCP when compiling 400.perlbench with the new PM. I'm not sure how to test this.	2020-09-07 23:38:08 -05:00
Qiu Chaofan	3c0b325023	[PowerPC] Implement instruction clustering for stores On Power10, it's profitable to schedule some stores with adjacent target address together. This patch implements this feature. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D86754	2020-09-08 11:03:09 +08:00
Florian Hahn	efb8e156da	[DSE,MemorySSA] Add an early check for read clobbers to traversal. Depending on the benchmark, this early exit can save a substantial amount of compile-time: http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions	2020-09-07 23:22:10 +01:00
Roman Lebedev	bb7d3af113	Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline This was reverted in `503deec218` because it caused gigantic increase (3x) in branch mispredictions in certain benchmarks on certain CPU's, see https://reviews.llvm.org/D84108#2227365. It has since been investigated and here are the results: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html > It's an amazingly severe regression, but it's also all due to branch > mispredicts (about 3x without this). The code layout looks ok so there's > probably something else to deal with. I'm not sure there's anything we can > reasonably do so we'll just have to take the hit for now and wait for > another code reorganization to make the branch predictor a bit more happy :) > > Thanks for giving us some time to investigate and feel free to recommit > whenever you'd like. > > -eric So let's just reland this. Original commit message: I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108 This reverts commit `503deec218`.	2020-09-08 00:24:03 +03:00
Nikita Popov	ddab4cd83e	[KnownBits] Avoid some copies (NFC) These lambdas don't need copies, use const reference.	2020-09-07 22:19:29 +02:00
Nikita Popov	9fb46a452d	[SCCP] Compute ranges for supported intrinsics For intrinsics supported by ConstantRange, compute the result range based on the argument ranges. We do this independently of whether some or all of the input ranges are full, as we can often still constrain the result in some way. Differential Revision: https://reviews.llvm.org/D87183	2020-09-07 22:16:06 +02:00
Craig Topper	da79b1eecc	[SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported. Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and XORs to expand ABS. This is the multi-part version of the sequence we use in LegalizeDAG. It's also the same as the Custom sequence uses for i64 on 32-bit and i128 on 64-bit. So we can remove the X86 customization. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D87215	2020-09-07 13:15:26 -07:00
Sanjay Patel	8b30067919	[InstCombine] improve fold of pointer differences This was supposed to be an NFC cleanup, but there's a real logic difference (did not drop 'nsw') visible in some tests in addition to an efficiency improvement. This is because in the case where we have 2 GEPs, the code was always swapping the operands and negating the result. But if we have 2 GEPs, we should never need swapping/negation AFAICT. This is part of improving flags propagation noticed with PR47430.	2020-09-07 15:54:32 -04:00
Craig Topper	01b3e16757	[X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets. Differential Revision: https://reviews.llvm.org/D87214	2020-09-07 11:14:05 -07:00
Sanjay Patel	7a06b166b1	[DAGCombiner] allow more store merging for non-i8 truncated ops This is a follow-up suggested in D86420 - if we have a pair of stores in inverted order for the target endian, we can rotate the source bits into place. The "be_i64_to_i16_order" test shows a limitation of the current function (which might be avoided if we integrate this function with the other cases in mergeConsecutiveStores). In the earlier "be_i64_to_i16" test, we skip the first 2 stores because we do not match the full set as consecutive or rotate-able, but then we reach the last 2 stores and see that they are an inverted pair of 16-bit stores. The "be_i64_to_i16_order" test alters the program order of the stores, so we miss matching the sub-pattern. Differential Revision: https://reviews.llvm.org/D87112	2020-09-07 14:12:36 -04:00
Eric Astor	a3ec4a3158	[ms] [llvm-ml] Allow use of locally-defined variables in expressions MASM allows variables defined by equate statements to be used in expressions. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86946	2020-09-07 14:00:14 -04:00
Eric Astor	2feb6e9b84	[ms] [llvm-ml] Fix STRUCT field alignment MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86945	2020-09-07 13:58:59 -04:00
Eric Astor	e52e7ad54d	[ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM Add support for expressions of the form '1 or 2', etc. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D86944	2020-09-07 13:57:54 -04:00
Simon Pilgrim	5ea9e655ef	VPlan.h - remove unnecessary forward declarations. NFCI. Already defined in includes.	2020-09-07 18:35:06 +01:00
Simon Pilgrim	4e89a0ab02	MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI. These are already defined in the CallingConvLower.h include.	2020-09-07 18:15:26 +01:00
Simon Pilgrim	95ca3aacf0	BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI.	2020-09-07 17:51:13 +01:00
Simon Pilgrim	dfc333050b	LeonPasses.h - remove unnecessary includes. NFCI. Reduce to forward declarations and move includes to LeonPasses.cpp where necessary.	2020-09-07 17:51:12 +01:00
Simon Pilgrim	1c34ac03a2	LeonPasses.h - remove orphan function declarations. NFCI. The implementations no longer exist.	2020-09-07 17:51:12 +01:00
Sanjay Patel	7a6d6f0f70	[InstCombine] improve folds for icmp with multiply operands (PR47432) Check for no overflow along with an odd constant before we lose information by converting to bitwise logic. https://rise4fun.com/Alive/2Xl Pre: C1 != 0 %mx = mul nsw i8 %x, C1 %my = mul nsw i8 %y, C1 %r = icmp eq i8 %mx, %my => %r = icmp eq i8 %x, %y Name: nuw ne Pre: C1 != 0 %mx = mul nuw i8 %x, C1 %my = mul nuw i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y Name: odd ne Pre: C1 % 2 != 0 %mx = mul i8 %x, C1 %my = mul i8 %y, C1 %r = icmp ne i8 %mx, %my => %r = icmp ne i8 %x, %y	2020-09-07 12:40:37 -04:00
alex-t	2480a31e5d	[AMDGPU] SILowerControlFlow::optimizeEndCF should remove empty basic block optimizeEndCF removes EXEC restoring instruction case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE. As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor. In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute. Removing empty block solves the problem. This change require further work to re-implement LIS updates. Recently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D86634	2020-09-07 19:37:27 +03:00
Momchil Velikov	eb482afaf5	Reduce the number of memory allocations when displaying a warning about clobbering reserved registers (NFC). Also address some minor inefficiencies and style issues. Differential Revision: https://reviews.llvm.org/D86088	2020-09-07 17:04:00 +01:00
Simon Pilgrim	9de0a3da6a	[X86][SSE] Don't use LowerVSETCCWithSUBUS for unsigned compare with +ve operands (PR47448) We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.	2020-09-07 16:11:40 +01:00
Simon Pilgrim	60162626a5	[X86] Replace UpgradeX86AddSubSatIntrinsics with UpgradeX86BinaryIntrinsics generic helper. NFCI. Feed the Intrinsic::ID value directly instead of via the IsSigned/IsAddition bool flags.	2020-09-07 15:57:18 +01:00
Sanjay Patel	b22910daab	[InstCombine] erase instructions leading up to unreachable Normal dead code elimination ignores assume intrinsics, so we fail to delete assumes that are not meaningful (and potentially worse if they cause conflicts with other assumptions). The motivating example in https://llvm.org/PR47416 suggests that we might have problems upstream from here (difference between C and C++), but this should be a cheap way to make sure we remove more dead code. Differential Revision: https://reviews.llvm.org/D87149	2020-09-07 10:44:08 -04:00
Simon Pilgrim	96e0f34be7	[X86] Auto upgrade SSE/AVX PABS intrinsics to generic Intrinsic::abs Minor followup to D87101, we were expanding this to a neg+icmp+select pattern like we were in CGBuiltin	2020-09-07 15:07:26 +01:00
Simon Pilgrim	6670f5d1e6	MachineStableHash.h - remove MachineInstr.h include. NFC. Use forward declarations and move the include to MachineStableHash.cpp	2020-09-07 13:33:48 +01:00
Simon Wallis	79ea83e104	[SelectionDAG] memcpy expansion of const volatile struct ignores const zero In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile. This is incorrect. The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case. Reviewed By: chill Differential Revision: https://reviews.llvm.org/D87134	2020-09-07 13:22:09 +01:00
Sanjay Patel	3ca8b9a560	[InstCombine] give a name to an intermediate value for easier tracking; NFC As noted in PR47430, we probably want to conditionally include 'nsw' here anyway, so we are going to need to fill out the optional args.	2020-09-07 08:19:42 -04:00
Simon Pilgrim	e57cbcbdc1	LegalizeTypes.h - remove orphan SplitVSETCC declaration. NFCI. The implementation no longer exists	2020-09-07 13:11:49 +01:00
Simon Pilgrim	5bb27e735d	X86AvoidStoreForwardingBlocks.cpp - use unsigned for Opcode values. NFCI. Fixes clang-tidy cppcoreguidelines-narrowing-conversions warnings.	2020-09-07 12:56:27 +01:00
Simon Pilgrim	9b645ebfff	[X86][AVX] Use lowerShuffleWithPERMV in shuffle combining to support non-VLX targets lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets. This is another step towards shuffle combining through between vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32) but we're going in the right direction....	2020-09-07 12:50:50 +01:00
Sam Parker	928c4b4b49	[SCEV] Refactor isHighCostExpansionHelper To enable the cost of constants, the helper function has been reorganised: - A struct has been introduced to hold SCEV operand information so that we know the user of the operand, as well as the operand index. The Worklist now uses instead instead of a bare SCEV. - The costing of each SCEV, and collection of its operands, is now performed in a helper function. Differential Revision: https://reviews.llvm.org/D86050	2020-09-07 11:57:46 +01:00
Benjamin Kramer	7ba0f81934	[X86] Unbreak the build after `22fa6b20d9`	2020-09-07 12:24:30 +02:00
Simon Pilgrim	71dfdbe2c7	[X86] getFauxShuffleMask - handle insert_subvector(zero, sub, C) Directly use SM_SentinelZero elements if we're (widening)inserting into a zero vector.	2020-09-07 11:10:40 +01:00

... 4 5 6 7 8 ...

139218 Commits