This change does two main things:
1) An operation might have multiple dependences on the same
producer. Not tracking each of them can result in incorrect code
generation with fusion. To rectify this, the dependence tracking
also needs to record the operand number in the consumer.
2) Improve the logic used to find the fused loops, making it easier to
follow. The only constraint for fusion is that linalg ops (on
buffers) have update semantics for the result. Fusion must be
such that each iteration of the fused loop (which is also a
tiled loop) touches exactly one tile of the output, and no two
iterations touch the same tile (the tiles are disjoint). This
could be relaxed by allowing recomputation, which is the default
when operands are tensors, or could be made legal by promotion of
the fused view (in future).
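
The operand-number tracking described in item 1 can be sketched as
follows. This is a simplified, standalone illustration and not the code
in this change: the names Op, Dependence, findDependences and
dependentOperand are hypothetical, and views are modelled as plain
strings rather than MLIR values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an operation; not an MLIR type.
struct Op {
  std::string name;
  std::vector<std::string> operands; // names of the views the op reads
};

// One dependence edge. Storing the consumer operand index keeps two
// uses of the same producer result by one consumer as distinct edges.
struct Dependence {
  const Op *producer;
  const Op *consumer;
  std::size_t consumerOperand;
};

// Record one edge per consumer operand that reads `producedView`.
std::vector<Dependence> findDependences(const Op &producer,
                                        const std::string &producedView,
                                        const Op &consumer) {
  std::vector<Dependence> deps;
  for (std::size_t i = 0; i < consumer.operands.size(); ++i)
    if (consumer.operands[i] == producedView)
      deps.push_back({&producer, &consumer, i});
  return deps;
}

// Return the view name that a dependence edge refers to in the consumer.
const std::string &dependentOperand(const Dependence &d) {
  assert(d.consumerOperand < d.consumer->operands.size());
  return d.consumer->operands[d.consumerOperand];
}
```

With a consumer whose operands are {"A", "A", "C"} and a producer of
"A", this sketch yields two edges (operand 0 and operand 1) instead of
collapsing them into one.
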
Differential Revision: https://reviews.llvm.org/D90579