Add static pass registration and change mlir-opt to use it. Future work is needed to refactor the registration for PassManager usage.
Change build targets to alwayslink to enforce registration.
PiperOrigin-RevId: 220390178
- simple perfectly nested band tiling with fixed tile sizes.
- only the hyper-rectangular case is handled, with other limitations of
getIndexSet applying (constant loop bounds, etc.); once
the latter utility is extended, tiled code generation should become more
general.
- Add FlatAffineConstraints::isHyperRectangular()
PiperOrigin-RevId: 220324933
simple utility methods.
- clean up some of the analysis utilities used by memref dep checking
- add additional asserts / comments at places in analysis utilities
- add additional simple methods to the FlatAffineConstraints API.
PiperOrigin-RevId: 220124523
Adds equality constraints to dependence constraint system for accesses using dims/symbols where the defining operation of the dim/symbol is a constant.
PiperOrigin-RevId: 219814740
variables from mod's and div's when converting to flat form.
- propagate mod, floordiv, ceildiv / local variables constraint information
when flattening affine expressions and converting them into flat affine
constraints; resolve multiple TODOs.
- enables memref bound checker to work with arbitrary affine expressions
- update FlatAffineConstraints API with several new methods
- test/exercise functionality mostly through -memref-bound-check
- other analyses such as dependence tests, etc. should now be able to work in the
presence of any affine composition of add, mul, floor, ceil, mod.
PiperOrigin-RevId: 219711806
- Builds access functions and iterations domains for each access.
- Builds dependence polyhedron constraint system which has equality constraints for equated access functions and inequality constraints for iteration domain loop bounds.
- Runs elimination on the dependence polyhedron to test if no dependence exists between the accesses.
- Adds a trivial LoopFusion transformation pass with a simple test policy to test dependence between accesses to the same memref in adjacent loops.
- The LoopFusion pass will be extended in subsequent CLs.
PiperOrigin-RevId: 219630898
This CL adds support for vectorization using more interesting 2-D and 3-D
patterns. Note in particular the fact that we match some pretty complex
imperfectly nested 2-D patterns with a quite minimal change to the
implementation: we just add a bit of recursion to traverse the matched
patterns and actually vectorize the loops.
For instance, vectorizing the following loop by 128:
```
for %i3 = 0 to %0 {
%7 = affine_apply (d0) -> (d0)(%i3)
%8 = load %arg0[%c0_0, %7] : memref<?x?xf32>
}
```
Currently generates:
```
#map0 = ()[s0] -> (s0 + 127)
#map1 = (d0) -> (d0)
for %i3 = 0 to #map0()[%0] step 128 {
%9 = affine_apply #map1(%i3)
%10 = alloc() : memref<1xvector<128xf32>>
%11 = "n_d_unaligned_load"(%arg0, %c0_0, %9, %10, %c0) :
(memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index) ->
(memref<?x?xf32>, index, index, memref<1xvector<128xf32>>, index)
%12 = load %10[%c0] : memref<1xvector<128xf32>>
}
```
The above is subject to evolution.
PiperOrigin-RevId: 219629745
Introduce analysis to check memref accesses (in MLFunctions) for out of bound
ones. It works as follows:
$ mlir-opt -memref-bound-check test/Transforms/memref-bound-check.mlir
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:10:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#2
%x = load %A[%idxtensorflow/mlir#0, %idxtensorflow/mlir#1] : memref<9 x 9 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of upper bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
/tmp/single.mlir:12:12: error: 'load' op memref out of lower bound access along dimension tensorflow/mlir#1
%y = load %B[%idy] : memref<128 x i32>
^
#map0 = (d0, d1) -> (d0, d1)
#map1 = (d0, d1) -> (d0 * 128 - d1)
mlfunc @test() {
%0 = alloc() : memref<9x9xi32>
%1 = alloc() : memref<128xi32>
for %i0 = -1 to 9 {
for %i1 = -1 to 9 {
%2 = affine_apply #map0(%i0, %i1)
%3 = load %0[%2tensorflow/mlir#0, %2tensorflow/mlir#1] : memref<9x9xi32>
%4 = affine_apply #map1(%i0, %i1)
%5 = load %1[%4] : memref<128xi32>
}
}
return
}
- Improves productivity while manually / semi-automatically developing MLIR for
testing / prototyping; also provides an indirect way to catch errors in
transformations.
- This pass is an easy way to test the underlying affine analysis
machinery including low level routines.
Some code (in getMemoryRegion()) borrowed from @andydavis cl/218263256.
While on this:
- create mlir/Analysis/Passes.h; move Pass.h up from mlir/Transforms/ to mlir/
- fix a bug in AffineAnalysis.cpp::toAffineExpr
TODO: extend to non-constant loop bounds (straightforward). Will transparently
work for all accesses once floordiv, mod, ceildiv are supported in the
AffineMap -> FlatAffineConstraints conversion.
PiperOrigin-RevId: 219397961
This is done by changing Type to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
PiperOrigin-RevId: 219372163
- add methods addConstantLowerBound, addConstantUpperBound, setIdToConstant,
addDimsForMap
- update coefficient storage to use numReservedCols * rows instead of numCols *
rows (makes the code simpler/natural; reduces movement of data when new
columns are added, eliminates movement of data when columns are added to the
end).
(addDimsForMap is tested in the child CL on memref bound checking: cl/219000460)
PiperOrigin-RevId: 219358376
This CL is a first in a series that implements early vectorization of
increasingly complex patterns. In particular, early vectorization will support
arbitrary loop nesting patterns (both perfectly and imperfectly nested), at
arbitrary depths in the loop tree.
This first CL builds the minimal support for applying 1-D patterns.
It relies on an unaligned load/store op abstraction that can be inplemented
differently on different HW.
Future CLs will support higher dimensional patterns, but 1-D patterns already
exhibit interesting properties.
In particular, we want to separate pattern matching (i.e. legality both
structural and dependency analysis based), from profitability analysis, from
application of the transformation.
As a consequence patterns may intersect and we need to verify that a pattern
can still apply by the time we get to applying it.
A non-greedy analysis on profitability that takes into account pattern
intersection is left for future work.
Additionally the CL makes the following cleanups:
1. the matches method now returns a value, not a reference;
2. added comments about the MLFunctionMatcher and MLFunctionMatches usage by
value;
3. added size and empty methods to matches;
4. added a negative vectorization test with a conditional, this exhibited a
but in the iterators. Iterators now return nullptr if the underlying storage
is nullpt.
PiperOrigin-RevId: 219299489
- There are several places where we are casting the type of the memref obtained
from the load/store op to a memref type, and this will become even more
common (some upcoming CLs this week). Add a getMemRefType and use it at
several places where the cast was being used.
PiperOrigin-RevId: 219164326
This is done by changing Attribute to be a POD interface around an underlying pointer storage and adding in-class support for isa/dyn_cast/cast.
PiperOrigin-RevId: 218764173
- Introduce Fourier-Motzkin variable elimination to eliminate a dimension from
a system of linear equalities/inequalities. Update isEmpty to use this.
Since FM is only exact on rational/real spaces, an emptiness check based on
this is guaranteed to be exact whenever it says the underlying set is empty;
if it says, it's not empty, there may still be no integer points in it.
Also, supports a version that computes "dark shadows".
- Test this by checking for "always false" conditionals in if statements.
- Unique IntegerSet's that are small (few constraints, few variables). This
basically means the canonical empty set and other small sets that are
likely commonly used get uniqued; allows checking for the canonical empty set
by pointer. IntegerSet::kUniquingThreshold gives the threshold constraint size
for uniqui'ing.
- rename simplify-affine-expr -> simplify-affine-structures
Other cleanup
- IntegerSet::numConstraints, AffineMap::numResults are no longer needed;
remove them.
- add copy assignment operators for AffineMap, IntegerSet.
- rename Invalid() -> Null() on AffineExpr, AffineMap, IntegerSet
- Misc cleanup for FlatAffineConstraints API
PiperOrigin-RevId: 218690456
- Adds FlatAffineConstraints::isEmpty method to test if there are no solutions to the system.
- Adds GCD test check if equality constraints have no solution.
- Adds unit test cases.
PiperOrigin-RevId: 218546319
This was left as a TODO in the code. Move the type verification from
MLFuncVerifier::verifyReturn to ReturnOp::verify. Since the return operation
can only appear as the last statement of an MLFunction, i.e. where the
surrounding block is the function itself, it is easy to access the function
descriptor (ReturnOp::verify already relies on this). From the function
descriptor, one can easily access the type information. Note that this
slightly modifies the error message due to the use of emitOpError instead of a
plain emitError.
Drop the obsolete TODO comment in MLFunction::verify about checking that
"return" only appears as the last operation of an MLFunction since
ReturnOp::verify explicitly checks for that.
PiperOrigin-RevId: 218347843
This was left as a TODO in the code. Note that the spec does not explicitly
prohibit the first basic block from having a predecessor, and may be worth
updating.
The error is reported at the location of the cfgfunc to which the basic block
belongs since the location information of the block label is not propagated
beyond the IR parser. Arguably, pointing to a function that starts with an
ill-formed block is better than pointing to the first operation in that block
as it makes easier to follow the code down until the first block label.
PiperOrigin-RevId: 218343654
a step forward because now every AbstractOperation knows which Dialect it is
associated with, enabling things in the future like "constant folding
hooks" which will be important for layering. This is also a bit nicer on
the registration side of things.
PiperOrigin-RevId: 218104230
Also rename Operation::is to Operation::isa
Introduce Operation::cast
All of these are for consistency with global dyn_cast/cast/isa operators.
PiperOrigin-RevId: 217878786
multiple TODOs.
- replace the fake test pass (that worked on just the first loop in the
MLFunction) to perform DMA pipelining on all suitable loops.
- nested DMAs work now (DMAs in an outer loop, more DMAs in nested inner loops)
- fix bugs / assumptions: correctly copy memory space and elemental type of source
memref for double buffering.
- correctly identify matching start/finish statements, handle multiple DMAs per
loop.
- introduce dominates/properlyDominates utitilies for MLFunction statements.
- move checkDominancePreservationOnShifts to LoopAnalysis.h; rename it
getShiftValidity
- refactor getContainingStmtPos -> findAncestorStmtInBlock - move into
Analysis/Utils.h; has two users.
- other improvements / cleanup for related API/utilities
- add size argument to dma_wait - for nested DMAs or in general, it makes it
easy to obtain the size to use when lowering the dma_wait since we wouldn't
want to identify the matching dma_start, and more importantly, in general/in the
future, there may not always be a dma_start dominating the dma_wait.
- add debug information in the pass
PiperOrigin-RevId: 217734892
This CL implements a very simple loop vectorization **test** and the basic
infrastructure to support it.
The test simply consists in:
1. matching the loops in the MLFunction and all the Load/Store operations
nested under the loop;
2. testing whether all the Load/Store are contiguous along the innermost
memory dimension along that particular loop. If any reference is
non-contiguous (i.e. the ForStmt SSAValue appears in the expression), then
the loop is not-vectorizable.
The simple test above can gradually be extended with more interesting
behaviors to account for the fact that a layout permutation may exist that
enables contiguity etc. All these will come in due time but it is worthwhile
noting that the test already supports detection of outer-vetorizable loops.
In implementing this test, I also added a recursive MLFunctionMatcher and some
sugar that can capture patterns
such as `auto gemmLike = Doall(Doall(Red(LoadStore())))` and allows iterating
on the matched IR structures. For now it just uses in order traversal but
post-order DFS will be useful in the future once IR rewrites start occuring.
One may note that the memory management design decision follows a different
pattern from MLIR. After evaluating different designs and how they quickly
increase cognitive overhead, I decided to opt for the simplest solution in my
view: a class-wide (threadsafe) RAII context.
This way, a pass that needs MLFunctionMatcher can just have its own locally
scoped BumpPtrAllocator and everything is cleaned up when the pass is destroyed.
If passes are expected to have a longer lifetime, then the contexts can easily
be scoped inside the runOnMLFunction call and storage lifetime reduced.
Lastly, whatever the scope of threading (module, function, pass), this is
expected to also be future-proof wrt concurrency (but this is a detail atm).
PiperOrigin-RevId: 217622889
Updates ComposeAffineMaps test pass to use this method.
Updates affine map composition test cases to handle the new pass, which can be reused when this method is used in a future instruction combine pass.
PiperOrigin-RevId: 217163351
- add util to create a private / exclusive / single use affine
computation slice for an op stmt (see method doc comment); a single
multi-result affine_apply op is prepended to the op stmt to provide all
results needed for its operands as a function of loop iterators and symbols.
- use it for DMA pipelining (to create private slices for DMA start stmt's);
resolve TODOs/feature request (b/117159533)
- move createComposedAffineApplyOp to Transforms/Utils; free it from taking a
memref as input / generalize it.
PiperOrigin-RevId: 216926818
* Move Return, Constant and AffineApply out into BuiltinOps;
* BuiltinOps are always registered, while StandardOps follow the same dynamic registration;
* Kept isValidX in MLValue as we don't have a verify on AffineMap so need to keep it callable from Parser (I wanted to move it to be called in verify instead);
PiperOrigin-RevId: 216592527
This CL applies the same pattern as AffineMap to IntegerSet: a simple struct
that acts as the storage is allocated in the bump pointer. The IntegerSet is
immutable and accessed everywhere by value.
Note that unlike AffineMap, it is not possible to remove the MLIRContext
parameter when constructing an IntegerSet for now. One possible way to achieve
this would be to add an enum to distinguish between the mathematically empty
set, the universe set and other sets.
This is left for future discussion.
PiperOrigin-RevId: 216545361
This CL applies the same pattern as AffineExpr to AffineMap: a simple struct
that acts as the storage is allocated in the bump pointer. The AffineMap is
immutable and accessed everywhere by value.
PiperOrigin-RevId: 216445930
This CL sketches what it takes for AffineExpr to fully have by-value semantics
and not be a not-so-smart pointer anymore.
This essentially makes the underyling class a simple storage struct and
implements the operations on the value type directly. Since there is no
forwarding of operations anymore, we can full isolate the storage class and
make a hard visibility barrier by moving detail::AffineExpr into
AffineExprDetail.h.
AffineExprDetail.h is only included where storage-related information is
needed.
PiperOrigin-RevId: 216385459
This CL:
1. performs the global codemod AffineXExpr->AffineXExprClass and
AffineXExprRef -> AffineXExpr;
2. simplifies function calls by removing the redundant MLIRContext parameter;
3. adds missing binary operator versions of scalar op AffineExpr where it
makes sense.
PiperOrigin-RevId: 216242674
This CL introduces a series of cleanups for AffineExpr value types:
1. to make it clear that the value types should be used, the pointer
AffineExpr types are put in the detail namespace. Unfortunately, since the
value type operator-> only forwards to the underlying pointer type, we
still
need to expose this in the include file for now;
2. AffineExprKind is ok to use, it thus comes out of detail and thus of
AffineExpr
3. getAffineDimExpr, getAffineSymbolExpr, getAffineConstantExpr are
similarly
extracted as free functions and their naming is mande consistent across
Builder, MLContext and AffineExpr
4. AffineBinaryOpEx::simplify functions are made into static free
functions.
In particular it is moved away from AffineMap.cpp where it does not belong
5. operator AffineExprType is made explicit
6. uses the binary operators everywhere possible
7. drops the pointer usage everywhere outside of AffineExpr.cpp,
MLIRContext.cpp and AsmPrinter.cpp
PiperOrigin-RevId: 216207212
This CL makes AffineExprRef into a value type.
Notably:
1. drops llvm isa, cast, dyn_cast on pointer type and uses member functions on
the value type. It may be possible to still use classof (in a followup CL)
2. AffineBaseExprRef aggressively casts constness away: if we mean the type is
immutable then let's jump in with both feet;
3. Drop implicit casts to the underlying pointer type because that always
results in surprising behavior and is not needed in practice once enough
cleanup has been applied.
The remaining negative I see is that we still need to mix operator. and
operator->. There is an ugly solution that forwards the methods but that ends
up duplicating the class hierarchy which I tried to avoid as much as
possible. But maybe it's not that bad anymore since AffineExpr.h would still
contain a single class hierarchy (the duplication would be impl detail in.cpp)
PiperOrigin-RevId: 216188003
This CL starts by replacing AffineExpr* with value-type AffineExprRef in a few
places in the IR. By a domino effect that is pretty telling of the
inconsistencies in the codebase, const is removed where it makes sense.
The rationale is that the decision was concisously made that unique'd types
have pointer semantics without const specifier. This is fine but we should be
consistent. In the end, the only logical invariant is that there should never
be such a thing as a const AffineExpr*, const AffineMap* or const IntegerSet*
in our codebase.
This CL takes a number of shortcuts to killing const with fire, in particular
forcing const AffineExprRef to return the underlying non-const
AffineExpr*. This will be removed once AffineExpr* has disappeared in
containers but for now such shortcuts allow a bit of sanity in this long quest
for cleanups.
The **only** places where const AffineExpr*, const AffineMap* or const
IntegerSet* may still appear is by transitive needs from containers,
comparison operators etc.
There is still one major thing remaining here: figure out why cast/dyn_cast
return me a const AffineXXX*, which in turn requires a bunch of ugly
const_casts. I suspect this is due to the classof
taking const AffineXXXExpr*. I wonder whether this is a side effect of 1., if
it is coming from llvm itself (I'd doubt it) or something else (clattner@?)
In light of this, the whole discussion about const makes total sense to me now
and I would systematically apply the rule that in the end, we should never
have any const XXX in our codebase for unique'd types (assuming we can remove
them all in containers and no additional constness constraint is added on us
from the outside world).
PiperOrigin-RevId: 215811554
This CL implements AffineExprBaseRef as a templated type to allow LLVM-style
casts to work properly. This also allows making AffineExprBaseRef::expr
private.
To achieve this, it is necessary to use llvm::simplify_type and make
AffineConstExpr derive from both AffineExpr and llvm::simplify<AffineExprRef>.
Note that llvm::simplify_type is just an interface to enable the proper
template resolution of isa/cast/dyn_cast but it otherwise holds no value.
Lastly note that certain dyn_cast operations wanted the const AffineExpr* form
of AffineExprBaseRef so I made the implicit constructor take that by default
and documented the immutable behavior. I think this is consistent with the
decision to make unique'd type immutable by convention and never use const on
them.
PiperOrigin-RevId: 215642247
This CL uniformizes the uses of AffineExprWrap outside of IR.
The public API of AffineExpr builder is modified to only use AffineExprWrap.
A few places access AffineExprWrap.expr, this is only while the API is in
transition to easily keep track (i.e. make expr private and let the compiler
track the errors).
Parser.cpp exhibits patterns that are dependent on nullptr values so
converting it is left for another CL.
PiperOrigin-RevId: 215642005
- introduce mlir::{floorDiv, ceilDiv, mod} for constant inputs in
mlir/Support/MathExtras.h
- consistently use these everywhere in IR, Analysis, and Transforms.
PiperOrigin-RevId: 215580677
verifier. We get most of this infrastructure directly from LLVM, we just
need to adapt it to our CFG abstraction.
This has a few unrelated changes engangled in it:
- getFunction() in various classes was const incorrect, fix it.
- This moves Verifier.cpp to the analysis library, since Verifier depends on
dominance and these are both really analyses.
- IndexedAccessorIterator::reference was defined wrong, leading to really
exciting template errors that were fun to diagnose.
- This flips the boolean sense of the foldOperation() function in constant
folding pass in response to previous patch feedback.
PiperOrigin-RevId: 214046593
- extend loop unroll-jam similar to loop unroll for affine bounds
- extend both loop unroll/unroll-jam to deal with cleanup loop for non multiple
of unroll factor.
- extend promotion of single iteration loops to work with affine bounds
- fix typo bugs in loop unroll
- refactor common code b/w loop unroll and loop unroll-jam
- move prototypes of non-pass transforms to LoopUtils.h
- add additional builder methods.
- introduce loopUnrollUpTo(factor) to unroll by either factor or trip count,
whichever is less.
- remove Statement::isInnermost (not used for now - will come back at the right
place/in right form later)
PiperOrigin-RevId: 213471227
Use these methods to simplify existing code. Rename getConstantMap
getConstantAffineMap. Move declarations to group similar ones together.
PiperOrigin-RevId: 212814829
unroll/unroll-and-jam more powerful; add additional affine expr builder methods
- use previously added analysis/simplification to infer multiple of unroll
factor trip counts, making loop unroll/unroll-and-jam more general.
- for loop unroll, support bounds that are single result affine map's with the
same set of operands. For unknown loop bounds, loop unroll will now work as
long as trip count can be determined to be a multiple of unroll factor.
- extend getConstantTripCount to deal with single result affine map's with the
same operands. move it to mlir/Analysis/LoopAnalysis.cpp
- add additional builder utility methods for affine expr arithmetic
(difference, mod/floordiv/ceildiv w.r.t postitive constant). simplify code to
use the utility methods.
- move affine analysis routines to AffineAnalysis.cpp/.h from
AffineStructures.cpp/.h.
- Rename LoopUnrollJam to LoopUnrollAndJam to match class name.
- add an additional simplification for simplifyFloorDiv, simplifyCeilDiv
- Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does
not have operands. Operands are passed to it through affine_apply, from loop
bounds/if condition's, etc., operands are stored in the latter.
This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU
code generation, and can move to other analyses/transformations.
Loop nests like these are now unrolled without any cleanup loop being generated.
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
%x = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) {
%y = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
%x = "foo"() : () -> i32
}
}
TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor
changes).
PiperOrigin-RevId: 212661212
- handle floordiv/ceildiv in AffineExprFlattener; update the simplification to
work even if mod/floordiv/ceildiv expressions appearing in the tree can't be eliminated.
- refactor the flattening / analysis to move it out of lib/Transforms/
- fix MutableAffineMap::isMultipleOf
- add AffineBinaryOpExpr:getAdd/getMul/... utility methods
PiperOrigin-RevId: 211540536
Outside of IR/
- simplify a MutableAffineMap by flattening the affine expressions
- add a simplify affine expression pass that uses this analysis
- update the FlatAffineConstraints API (to be used in the next CL)
In IR:
- add isMultipleOf and getKnownGCD for AffineExpr, and make the in-IR
simplication of simplifyMod simpler and more powerful.
- rename the AffineExpr visitor methods to distinguish b/w visiting and
walking, and to simplify API names based on context.
The next CL will use some of these for the loop unrolling/unroll-jam to make
the detection for the need of cleanup loop powerful/non-trivial.
A future CL will finally move this simplification to FlatAffineConstraints to
make it more powerful. For eg., currently, even if a mod expr appearing in a
part of the expression tree can't be simplified, the whole thing won't be
simplified.
PiperOrigin-RevId: 211012256
- introduce hyper-rectangular set representation and API sketch for
analysis/code generation
- implement the 'intersect' and 'project out' operations.
The represention is lighter weight, and operations and other queries on it are
much faster on such domains when compared to general polyhedral domains.
PiperOrigin-RevId: 210245882
FlatAffineConstraints, and MutableAffineMap.
All four classes introduced reside in lib/Analysis and are not meant to be
used in the IR (from lib/IR or lib/Parser/). They are all mutable, alloc'ed,
dealloc'ed - although with their fields pointing to immutable affine
expressions (AffineExpr *).
While on this, update simplifyMod to fold mod to a zero when possible.
PiperOrigin-RevId: 209618437