!18709 Add EPP-MVSNet to model zoo

Merge pull request !18709 from NewMesc/master
2021-06-23 02:40:04 +00:00 · 2021-06-23 02:40:04 +00:00 · 8a417aa456
parent 7cb7a23f9b 22aed66032
commit 8a417aa456
8 changed files with 2084 additions and 0 deletions
--- a/model_zoo/research/cv/eppmvsnet/README.md
+++ b/model_zoo/research/cv/eppmvsnet/README.md
@ -0,0 +1,148 @@
 # Contents
 - [EPP-MVSNet](#thinking-path-re-ranker)
 - [Model Architecture](#model-architecture)
 - [Dataset](#dataset)
 - [Features](#features)
    - [Mixed Precision](#mixed-precision)
 - [Environment Requirements](#environment-requirements)
 - [Quick Start](#quick-start)
 - [Script Description](#script-description)
    - [Script and Sample Code](#script-and-sample-code)
    - [Script Parameters](#script-parameters)
    - [Training Process](#training-process)
        - [Training](#training)
    - [Evaluation Process](#evaluation-process)
        - [Evaluation](#evaluation)
 - [Model Description](#model-description)
    - [Performance](#performance)
 - [Description of random situation](#description-of-random-situation)
 - [ModelZoo Homepage](#modelzoo-homepage)
 # [EPP-MVSNet](#contents)
 EPP-MVSNet was proposed in 2021 by Parallel Distributed Computing Lab & Huawei Riemann Lab. By aggregating features at high resolution to a
 limited cost volume with an optimal depth range, thus, EPP-MVSNet leads to effective and efﬁcient 3D construction. Moreover, EPP-MVSNet achieved
 highest F-Score on the online TNT intermediate benchmark. This is a example of evaluation of EPP-MVSNet with BlendedMVS dataset in MindSpore. More
 importantly, this is the first open source version for EPP-MVSNet.
 # [Model Architecture](#contents)
 Specially, EPP-MVSNet contains two main modules. The first part is feature extraction network, which extracts 2D features of a group of pictures(one reference
 view and some source views) iteratively. The second part which contains three stages, iteratively regularizes the 3D cost volume composed of 2D features
 using homography transformation, and finally predicts depth map.
 # [Dataset](#contents)
 The dataset used in this example is BlendedMVS, which is a large-scale MVS dataset for generalized multi-view stereo networks. The dataset contains
 17k MVS training samples covering a variety of 113 scenes, including architectures, sculptures and small objects.
 # [Features](#contents)
 ## [3D Feature](#contents)
 This implementation version of EPP-MVSNet utilizes the newest 3D features of MindSpore.
 # [Environment Requirements](#contents)
 - Hardware (GPU)
 - Framework
    - [MindSpore](https://www.mindspore.cn/install/en)
 - For more information, please check the resources below:
    - [MindSpore Tutorials](https://www.mindspore.cn/tutorial/training/en/master/index.html)
    - [MindSpore Python API](https://www.mindspore.cn/doc/api_python/en/master/index.html)
 # [Quick Start](#contents)
 After installing MindSpore via the official website and Dataset is correctly generated, you can start training and evaluation as follows.
 - running on GPU
  ```python
  # run evaluation example with BlendedMVS dataset
  sh eval.sh
  ```
 # [Script Description](#contents)
 ## [Script and Sample Code](#contents)
 ```shell
 .
 └─eppmvsnet
  ├─README.md
  ├─scripts
  | └─run_eval.sh                       # Launch evaluation in gpu
  |
  ├─src
  | ├─blendedmvs.py                     # build blendedmvs data
  | ├─eppmvsnet.py                      # main architecture of EPP-MVSNet
  | ├─modules.py                        # math operations used in EPP-MVSNet
  | ├─networks.py                       # sub-networks of EPP-MVSNet
  | └─utils.py                          # other operations used for evaluation
  |
  ├─validate.py                         # Evaluation process on blendedmvs
 ```
 ## [Script Parameters](#contents)
 Parameters for EPP-MVSNet evaluation can be set in validate.py.
 - config for EPP-MVSNet
  ```python
  "n_views": 5,                         # Num of views used in a depth prediction
  "depth_interval": 128,                # Init depth numbers
  "n_depths": [32, 16, 8],              # Depth numbers of three stages
  "interval_ratios": [4.0, 2.0, 1.0],   # Depth interval's expanding ratios of three stages
  "img_wh": [768, 512],                 # Image resolution of evaluation
  ```
  validate.py for more configuration.
 ## [Evaluation Process](#contents)
 ### Evaluation
 - EPP-MVSNet evaluation on GPU
  ```python
  sh eval.sh
  ```
  Evaluation result will be stored in "./results/blendedmvs/val/metrics.txt". You can find the result like the
  followings in log.
  ```python
  stage3_l1_loss:1.1738
  stage3_less1_acc:0.8734
  stage3_less3_acc:0.938
  mean forward time(s/pic):0.2697
  ```
 # [Model Description](#contents)
 ## [Performance](#contents)
 ### Inference Performance
 | Parameter                      | EPP-MVSNet GPU               |
 | ------------------------------ | ---------------------------- |
 | Model Version                  | Inception V1                 |
 | Resource                       | Tesla V100 16GB; Ubuntu16.04 |
 | uploaded Date                  | 06/22/2021(month/day/year)   |
 | MindSpore Version              | 1.3.0                        |
 | Dataset                        | BlendedMVS                   |
 | Batch_size                     | 1                            |
 | Output                         | ./results/blendedmvs/val     |
 | Acc_less_1mm                   | 0.8734                       |
 | Acc_less_3mm                   | 0.938                        |
 | mean_time(s/pic)               | 0.2697                       |
 # [Description of random situation](#contents)
 No random situation for evaluation.
 # [ModelZoo Homepage](#contents)
 Please check the official [homepage](http://gitee.com/mindspore/mindspore/tree/master/model_zoo).
--- a/model_zoo/research/cv/eppmvsnet/scripts/eval.sh
+++ b/model_zoo/research/cv/eppmvsnet/scripts/eval.sh
@ -0,0 +1,24 @@
 #!/bin/bash
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 # eval script
 SRC_NUM=4
 VIEW_NUM=$[${SRC_NUM}+1]
 DATAPATH="./data/blendedmvs/dataset_low_res"
 python -u validate.py --root_dir ${DATAPATH} --dataset_name blendedmvs --save_visual --img_wh 768 576 --n_views ${VIEW_NUM} --n_depths 32 16 8 --interval_ratios 4.0 2.0 1.0 --levels 3 --split val --gpu_id 0 > log.txt 2>&1 &
 cd ..
--- a/model_zoo/research/cv/eppmvsnet/src/blendedmvs.py
+++ b/model_zoo/research/cv/eppmvsnet/src/blendedmvs.py
@ -0,0 +1,276 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """blendedmvs dataset"""
 import os
 from collections import defaultdict
 from copy import deepcopy
 import cv2
 import numpy as np
 from PIL import Image
 import mindspore.dataset.vision.py_transforms as py_vision
 from src.utils import read_pfm
 class Compose:
    """Composes several transforms together.
    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.
    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img
    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        for t in self.transforms:
            format_string += '\n'
            format_string += '    {0}'.format(t)
        format_string += '\n)'
        return format_string
 class BlendedMVSDataset:
    """blendedmvs dataset"""
    def __init__(self, root_dir, split, n_views=3, levels=3, depth_interval=128.0, img_wh=(768, 576),
                 crop_wh=(640, 512), scale=False, scan=None, training_tag=False):
        """
        img_wh should be set to a tuple ex: (1152, 864) to enable test mode!
        """
        self.root_dir = root_dir
        self.split = split
        self.scale = scale
        self.training_tag = training_tag
        assert self.split in ['train', 'val', 'all'], \
            'split must be either "train", "val" or "all"!'
        self.img_wh = img_wh
        if img_wh is not None:
            assert img_wh[0] % 32 == 0 and img_wh[1] % 32 == 0, \
                'img_wh must both be multiples of 32!'
        self.single_scan = scan
        self.crop_wh = crop_wh
        if crop_wh is not None:
            assert crop_wh[0] % 32 == 0 and crop_wh[1] % 32 == 0, \
                'img_wh must both be multiples of 32!'
        self.n_views = n_views
        self.levels = levels  # FPN levels
        self.n_depths = depth_interval
        self.build_metas()
        self.cal_crop_factors()
        self.build_proj_mats()
        self.define_transforms()
    def cal_crop_factors(self):
        """"calculate crop factors"""
        self.start_w = (self.img_wh[0] - self.crop_wh[0]) // 2
        self.start_h = (self.img_wh[1] - self.crop_wh[1]) // 2
        self.finish_w = self.start_w + self.crop_wh[0]
        self.finish_h = self.start_h + self.crop_wh[1]
    def build_metas(self):
        """"build meta information"""
        self.metas = []
        self.ref_views_per_scan = defaultdict(list)
        if self.split == 'train':
            list_txt = os.path.join(self.root_dir, 'training_list.txt')
        elif self.split == 'val':
            list_txt = os.path.join(self.root_dir, 'validation_list.txt')
        else:
            list_txt = os.path.join(self.root_dir, 'all_list.txt')
        if self.single_scan is not None:
            self.scans = self.single_scan if isinstance(self.single_scan, list) else [self.single_scan]
        else:
            with open(list_txt) as f:
                self.scans = [line.rstrip() for line in f.readlines()]
        for scan in self.scans:
            with open(os.path.join(self.root_dir, scan, "cams/pair.txt")) as f:
                num_viewpoint = int(f.readline())
                for _ in range(num_viewpoint):
                    ref_view = int(f.readline().rstrip())
                    self.ref_views_per_scan[scan] += [ref_view]
                    line = f.readline().rstrip().split()
                    n_views_valid = int(line[0])  # valid views
                    if n_views_valid < self.n_views:  # skip no enough valid views
                        continue
                    src_views = [int(x) for x in line[1::2]]
                    self.metas += [(scan, -1, ref_view, src_views)]
    def build_proj_mats(self):
        """"build projection matrix"""
        self.proj_mats = {}  # proj mats for each scan
        if self.root_dir.endswith('dataset_low_res') \
                or self.root_dir.endswith('dataset_low_res/'):
            img_w, img_h = 768, 576
        else:
            img_w, img_h = 2048, 1536
        for scan in self.scans:
            self.proj_mats[scan] = {}
            for vid in self.ref_views_per_scan[scan]:
                proj_mat_filename = os.path.join(self.root_dir, scan,
                                                 f'cams/{vid:08d}_cam.txt')
                intrinsics, extrinsics, depth_min, depth_max = \
                    self.read_cam_file(scan, proj_mat_filename)
                intrinsics[0] *= self.img_wh[0] / img_w / 8
                intrinsics[1] *= self.img_wh[1] / img_h / 8
                # center crop
                if self.training_tag:
                    intrinsics[0, 2] = intrinsics[0, 2] - self.start_w / 8
                    intrinsics[1, 2] = intrinsics[1, 2] - self.start_h / 8
                # multiply intrinsics and extrinsics to get projection matrix
                proj_mat_ls = []
                for _ in reversed(range(self.levels)):
                    proj_mat_l = np.eye(4)
                    proj_mat_l[:3, :4] = intrinsics @ extrinsics[:3, :4]
                    intrinsics[:2] *= 2  # 1/8->1/4->1/2
                    proj_mat_ls += [proj_mat_l]
                proj_mat_ls = np.stack(proj_mat_ls[::-1]).astype(dtype=np.float32)
                self.proj_mats[scan][vid] = (proj_mat_ls, depth_min, depth_max)
    def read_cam_file(self, scan, filename):
        """"read camera file"""
        with open(filename) as f:
            lines = [line.rstrip() for line in f.readlines()]
        # extrinsics: line [1,5), 4x4 matrix
        extrinsics = np.fromstring(' '.join(lines[1:5]), dtype=np.float32, sep=' ')
        extrinsics = extrinsics.reshape((4, 4))
        # intrinsics: line [7-10), 3x3 matrix
        intrinsics = np.fromstring(' '.join(lines[7:10]), dtype=np.float32, sep=' ')
        intrinsics = intrinsics.reshape((3, 3))
        # depth_min & depth_interval: line 11
        depth_min = float(lines[11].split()[0])
        depth_max = float(lines[11].split()[3])
        return intrinsics, extrinsics, depth_min, depth_max
    def read_depth_and_mask(self, scan, filename, depth_min):
        """"read depth and mask"""
        depth = np.array(read_pfm(filename)[0], dtype=np.float32)
        h, w = depth.shape
        if (h, w) != self.img_wh:
            depth_0 = cv2.resize(depth, self.img_wh,
                                 interpolation=cv2.INTER_LINEAR)
            if self.training_tag:
                depth_0 = depth_0[self.start_h:self.finish_h, self.start_w:self.finish_w]
        depth_0 = cv2.resize(depth_0, None, fx=0.5, fy=0.5,
                             interpolation=cv2.INTER_LINEAR)
        depth_1 = cv2.resize(depth_0, None, fx=0.5, fy=0.5,
                             interpolation=cv2.INTER_LINEAR)
        depth_2 = cv2.resize(depth_1, None, fx=0.5, fy=0.5,
                             interpolation=cv2.INTER_LINEAR)
        depths = {"level_0": depth_0,
                  "level_1": depth_1,
                  "level_2": depth_2}
        masks = {"level_0": depth_0 > depth_min,
                 "level_1": depth_1 > depth_min,
                 "level_2": depth_2 > depth_min}
        depth_max = depth_0.max()
        return depths, masks, depth_max
    def define_transforms(self):
        if self.training_tag and self.split == 'train':  # you can add augmentation here
            self.transform = Compose([
                py_vision.ToTensor(),
                py_vision.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225]),
            ])
        else:
            self.transform = Compose([
                py_vision.ToTensor(),
                py_vision.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225]),
            ])
    def __len__(self):
        return len(self.metas)
    def __getitem__(self, idx):
        sample = {}
        scan, _, ref_view, src_views = self.metas[idx]
        # use only the reference view and first nviews-1 source views
        view_ids = [ref_view] + src_views[:self.n_views - 1]
        imgs = []
        proj_mats = []  # record proj mats between views
        for i, vid in enumerate(view_ids):
            img_filename = os.path.join(self.root_dir, f'{scan}/blended_images/{vid:08d}.jpg')
            depth_filename = os.path.join(self.root_dir, f'{scan}/rendered_depth_maps/{vid:08d}.pfm')
            img = Image.open(img_filename)
            w, h = img.size
            if (h, w) != self.img_wh:
                img = img.resize(self.img_wh, Image.BILINEAR)
            if self.training_tag:
                img = img.crop((self.start_w, self.start_h, self.finish_w, self.finish_h))
            img = self.transform(img)
            imgs += [img]
            proj_mat_ls, depth_min, depth_max = deepcopy(self.proj_mats[scan][vid])
            if i == 0:  # reference view
                if self.split == 'train':
                    depths, masks, depth_max = self.read_depth_and_mask(scan, depth_filename, depth_min)
                elif self.split == 'val':
                    if self.training_tag:
                        depths, masks, depth_max = self.read_depth_and_mask(scan, depth_filename, depth_min)
                    else:
                        depths, masks, _ = self.read_depth_and_mask(scan, depth_filename, depth_min)
                else:
                    raise ValueError
                fix_depth_interval = (depth_max - depth_min) / self.n_depths
                depth_interval = fix_depth_interval
                sample['init_depth_min'] = [depth_min]
                sample['depth_interval'] = [depth_interval]
                sample['fix_depth_interval'] = [fix_depth_interval]
                ref_proj_inv = np.asarray(proj_mat_ls)
                for j in range(proj_mat_ls.shape[0]):
                    ref_proj_inv[j] = np.mat(proj_mat_ls[j]).I
            else:
                proj_mats += [proj_mat_ls @ ref_proj_inv]
        imgs = np.stack(imgs)
        proj_mats = np.stack(proj_mats)[:, :, :3]  # (V-1, self.levels, 3, 4) from fine to coarse
        depth_0 = depths['level_0']
        mask_0 = masks["level_0"]
        sample['imgs'] = imgs
        sample['proj_mats'] = proj_mats
        sample['depths'] = depths
        sample['masks'] = masks
        sample['scan_vid'] = (scan, ref_view)
        return imgs, proj_mats, np.array(sample['init_depth_min'], dtype=np.float32), \
               np.array(sample['depth_interval'], dtype=np.float32), np.fromstring(scan, dtype=np.uint8), \
               np.array(ref_view), depth_0, mask_0, np.array(sample['fix_depth_interval'], dtype=np.float32)
--- a/model_zoo/research/cv/eppmvsnet/src/eppmvsnet.py
+++ b/model_zoo/research/cv/eppmvsnet/src/eppmvsnet.py
@ -0,0 +1,501 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """main architecture of EPP-MVSNet"""
 import mindspore.nn as nn
 import mindspore.common as mstype
 from mindspore import Tensor
 from mindspore.ops import operations as P
 from src.modules import get_depth_values, determine_center_pixel_interval, groupwise_correlation, entropy_num_based, \
    HomoWarp
 from src.networks import UNet2D, CostCompression, CoarseStageRegPair, CoarseStageRegFuse, StageRegFuse
 class SingleStage(nn.Cell):
    """single stage"""
    def __init__(self, fuse_reg, height, width, entropy_range=False):
        super(SingleStage, self).__init__()
        self.fuse_reg = fuse_reg
        self.entropy_range = entropy_range
        self.transpose = P.Transpose()
        self.zeros = P.Zeros()
        self.linspace = P.LinSpace()
        self.exp = P.Exp()
        self.expand_dims = P.ExpandDims()
        self.squeeze_1 = P.Squeeze(1)
        self.pow = P.Pow()
        self.tile = P.Tile()
        self.homo_warp = HomoWarp(height, width)
    def construct(self, sample, depth_num, depth_start_override=None, depth_interval_override=None,
                  uncertainty_maps=None):
        """construct function of single stage"""
        ref_feat, src_feats, proj_mats = sample
        depth_start = depth_start_override  # n111 or n1hw
        depth_interval = depth_interval_override  # n111
        D = depth_num
        B, C, H, W = ref_feat.shape
        depth_interval = depth_interval.view(B, 1, 1, 1)
        interim_scale = 1
        ref_ncdhw = self.expand_dims(ref_feat, 2).view(B, C, 1, -1)
        ref_ncdhw = self.tile(ref_ncdhw, (1, 1, D, 1)).view(B, C, D, H, W)
        pair_results = []  # MVS
        weight_sum = self.zeros((ref_ncdhw.shape[0], 1, 1, ref_ncdhw.shape[3] // interim_scale,
                                 ref_ncdhw.shape[4] // interim_scale), mstype.float32)
        fused_interim = self.zeros((ref_ncdhw.shape[0], 8, ref_ncdhw.shape[2] // interim_scale, ref_ncdhw.shape[3] //
                                    interim_scale, ref_ncdhw.shape[4] // interim_scale), mstype.float32)
        depth_values = get_depth_values(depth_start, D, depth_interval, False)
        for i in range(src_feats.shape[1]):
            src_feat = src_feats[:, i]
            proj_mat = proj_mats[:, i]
            uncertainty_map = uncertainty_maps[i]
            warped_src = self.homo_warp(src_feat, proj_mat, depth_values)
            cost_volume = groupwise_correlation(ref_ncdhw, warped_src, 8, 1)
            interim = cost_volume
            heads = [uncertainty_map]
            weight = self.expand_dims(self.exp(-heads[0]), 2)
            weight_sum = weight_sum + weight
            fused_interim = fused_interim + interim * weight
        fused_interim /= weight_sum
        est_depth, prob_map, prob_volume = self.fuse_reg(fused_interim, depth_values)
        if self.entropy_range:
            # mean entropy
            entropy_num = entropy_num_based(prob_volume, dim=1, depth_num=D, keepdim=True)
            conf_range = self.pow(D, entropy_num).view(B, -1).mean()
            return est_depth, prob_map, pair_results, conf_range  # MVS
        return est_depth, prob_map, pair_results  # MVS
 class SingleStageP1(nn.Cell):
    """part1 of single stage 1"""
    def __init__(self, depth_number=32):
        super(SingleStageP1, self).__init__()
        self.transpose = P.Transpose()
        self.zeros = P.Zeros()
        self.linspace = P.LinSpace()
        self.expand_dims = P.ExpandDims()
        self.squeeze_1 = P.Squeeze(1)
        self.squeeze_last = P.Squeeze(-1)
        self.tile = P.Tile()
        self.zero = Tensor(0, mstype.float32)
        self.compression_ratio = 1
        self.depth_number = depth_number - self.compression_ratio + 1
        self.D = Tensor(self.depth_number - 1, mstype.float32)
    def construct(self, sample, depth_num, depth_start_override=None, depth_interval_override=None, timing=True):
        """construct function of part1 of single stage 1"""
        ref_feat, _, _ = sample
        depth_start = depth_start_override  # n111 or n1hw
        depth_interval = depth_interval_override  # n111
        depth_num *= self.compression_ratio
        depth_interval /= self.compression_ratio
        D = depth_num
        B = ref_feat.shape[0]
        depth_interval = depth_interval.view(B, 1, 1, 1)
        depth_start = depth_start.reshape(B, 1, 1, 1)
        h, w = ref_feat.shape[-2:]
        depth_end = depth_start + (D - 1) * depth_interval
        inverse_depth_interval = (1 / depth_start - 1 / depth_end) / (D - self.compression_ratio)
        depth_values = 1 / depth_end + inverse_depth_interval * \
                       self.linspace(self.zero, self.D, self.depth_number)  # (D)
        depth_values = 1.0 / depth_values
        single_depth_value = depth_values.reshape(B, self.depth_number, 1, 1)
        depth_values = self.tile(single_depth_value, (1, 1, h, w))
        single_depth_value = single_depth_value[:, ::self.compression_ratio, :, :]
        return depth_values, single_depth_value
 class SingleStageP3(nn.Cell):
    """part3 of single stage 1"""
    def __init__(self, entropy_range=False, compression_ratio=5):
        super(SingleStageP3, self).__init__()
        self.pair_reg = CoarseStageRegPair()
        self.fuse_reg = CoarseStageRegFuse()
        self.entropy_range = entropy_range
        self.compression_ratio = compression_ratio
        self.transpose = P.Transpose()
        self.zeros = P.Zeros()
        self.linspace = P.LinSpace()
        self.expand_dims = P.ExpandDims()
        self.squeeze_1 = P.Squeeze(1)
        self.squeeze_last = P.Squeeze(-1)
        self.tile = P.Tile()
        self.exp = P.Exp()
        self.pow = P.Pow()
    def construct(self, cost_volume_list, depth_values, sample, depth_num):
        """construct function"""
        ref_feat, _, _ = sample
        B, _, _, _ = ref_feat.shape
        d_scale = 1
        interim_scale = 1
        ref_ncdhw = self.expand_dims(ref_feat, 2)
        pair_results = []  # MVS
        weight_sum = self.zeros((ref_ncdhw.shape[0], 1, 1, ref_ncdhw.shape[3] // interim_scale,
                                 ref_ncdhw.shape[4] // interim_scale), mstype.float32)
        fused_interim = self.zeros((ref_ncdhw.shape[0], 8, depth_num // d_scale,
                                    ref_ncdhw.shape[3] // interim_scale, ref_ncdhw.shape[4] // interim_scale),
                                   mstype.float32)
        for i in range(cost_volume_list.shape[1]):
            cost_volume = cost_volume_list[:, i]
            interim, est_depth, uncertainty_map, occ = self.pair_reg(cost_volume, depth_values)
            pair_results.append([est_depth, [uncertainty_map, occ]])
            weight = self.expand_dims(self.exp(-uncertainty_map), 2)
            weight_sum = weight_sum + weight
            fused_interim = fused_interim + interim * weight
        fused_interim /= weight_sum
        est_depth, prob_map, prob_volume = self.fuse_reg(fused_interim, depth_values)
        if self.entropy_range:
            # mean entropy
            entropy_num = entropy_num_based(prob_volume, dim=1, depth_num=depth_num, keepdim=True)
            conf_range = self.pow(depth_num, entropy_num).view(B, -1).mean()
            return est_depth, prob_map, pair_results, conf_range  # MVS
        return est_depth, prob_map, pair_results  # MVS
 class SingleStageP2_S1(nn.Cell):
    """0 interpolation, part2 of single stage 1"""
    def __init__(self, cost_compression, height=32, width=40):
        super(SingleStageP2_S1, self).__init__()
        self.cost_compression = cost_compression
        self.transpose = P.Transpose()
        self.zeros = P.Zeros()
        self.linspace = P.LinSpace()
        self.expand_dims = P.ExpandDims()
        self.squeeze_1 = P.Squeeze(1)
        self.squeeze_last = P.Squeeze(-1)
        self.tile = P.Tile()
        self.homo_warp = HomoWarp(height, width)
        self.stack = P.Stack(1)
    def construct(self, sample, depth_num, depth_start_override=None, depth_interval_override=None, depth_values=None,
                  idx=None):
        """construct function"""
        ref_feat, src_feats, proj_mats = sample
        compression_ratio = 1
        depth_num *= compression_ratio
        B, C, H, W = ref_feat.shape
        ref_ncdhw = self.expand_dims(ref_feat, 2).view(B, C, 1, -1)
        ref_ncdhw = self.tile(ref_ncdhw, (1, 1, 32, 1)).view(B, C, 32, H, W)
        src_feat = src_feats[:, idx]
        proj_mat = proj_mats[:, idx]
        src_depth_values = depth_values
        warped_src = self.homo_warp(src_feat, proj_mat, src_depth_values)
        cost_volume = groupwise_correlation(ref_ncdhw, warped_src, 8, 1)
        # dynamic max pool
        cost_volume = self.cost_compression(cost_volume)
        return cost_volume
 class SingleStageP2_S3(nn.Cell):
    """2 interpolation, part2 of single stage 1"""
    def __init__(self, cost_compression, height=64, width=80, depth_number=96, depth_ratio=3, compression_ratio=5):
        super(SingleStageP2_S3, self).__init__()
        self.cost_compression = cost_compression
        self.transpose = P.Transpose()
        self.zeros = P.Zeros()
        self.linspace = P.LinSpace()
        self.expand_dims = P.ExpandDims()
        self.squeeze_1 = P.Squeeze(1)
        self.squeeze_last = P.Squeeze(-1)
        self.tile = P.Tile()
        self.linspace = P.LinSpace()
        self.cat1 = P.Concat(axis=1)
        self.stack = P.Stack(1)
        self.homo_warp = HomoWarp(height, width)
        self.max_pool = P.MaxPool3D(kernel_size=(3, 1, 1), strides=(3, 1, 1))
        self.zero = Tensor(0, mstype.float32)
        self.src_compression_ratio = depth_ratio
        self.depth_number = depth_number - depth_ratio + 1
        self.src_D = Tensor(self.depth_number, mstype.float32)
        self.compression_ratio = compression_ratio
    def construct(self, sample, depth_num, depth_start_override=None, depth_interval_override=None, depth_values=None,
                  idx=None):
        """construct function"""
        ref_feat, src_feats, proj_mats = sample
        depth_start = depth_start_override  # n111 or n1hw
        depth_interval = depth_interval_override  # n111
        depth_num *= self.compression_ratio
        depth_interval /= self.compression_ratio
        D = depth_num
        B, C, H, W = ref_feat.shape
        depth_interval = depth_interval.view(B, 1, 1, 1)
        depth_start = depth_start.reshape(B, 1, 1, 1)
        depth_end = depth_start + (D - 1) * depth_interval
        ref_ncdhw = self.expand_dims(ref_feat, 2).view(B, C, 1, -1)
        ref_ncdhw = self.tile(ref_ncdhw, (1, 1, 96, 1)).view(B, C, 96, H, W)
        src_feat = src_feats[:, idx]
        proj_mat = proj_mats[:, idx]
        src_D = D // self.compression_ratio * self.src_compression_ratio
        src_inverse_depth_interval = (1 / depth_start - 1 / depth_end) / (src_D - self.src_compression_ratio)
        src_depth_values = 1 / depth_end + src_inverse_depth_interval * \
                           self.linspace(self.zero, self.src_D, self.depth_number)  # (D)
        src_depth_values = 1.0 / src_depth_values
        src_depth_values = src_depth_values.view(B, -1, 1, 1)
        end_interval = self.expand_dims(src_depth_values[:, 1, :, :], 1) - self.expand_dims(
            src_depth_values[:, 0, :, :], 1)
        end_interpolation = self.expand_dims(src_depth_values[:, 0, :, :], 1) - end_interval
        start_interval = self.expand_dims(src_depth_values[:, -2, :, :], 1) - self.expand_dims(
            src_depth_values[:, -1, :, :], 1)
        start_interpolation = self.expand_dims(src_depth_values[:, -1, :, :], 1) - start_interval
        src_depth_values = self.cat1((end_interpolation, src_depth_values, start_interpolation))
        src_depth_values = self.tile(src_depth_values, (1, 1, H, W))
        warped_src = self.homo_warp(src_feat, proj_mat, src_depth_values)
        cost_volume = groupwise_correlation(ref_ncdhw, warped_src, 8, 1)
        # dynamic max pool
        cost_volume = self.cost_compression(cost_volume)
        cost_volume = self.max_pool(cost_volume)
        return cost_volume
 class EPPMVSNet(nn.Cell):
    """EPP-MVSNet"""
    def __init__(self, n_depths, interval_ratios, entropy_range=False, shrink_ratio=1,
                 height=None, width=None, distance=0.5):
        super(EPPMVSNet, self).__init__()
        self.feat_ext = UNet2D()
        self.n_depths = n_depths
        self.interval_ratios = interval_ratios
        self.entropy_range = entropy_range
        # hyper parameter used in entropy-based adjustment
        self.shrink_ratio = shrink_ratio
        self.distance = distance
        self.stage1_p1 = SingleStageP1()
        self.stage1_p3 = SingleStageP3(entropy_range=self.entropy_range)
        self.cost_compression = CostCompression()
        self.stage1_p2_s1 = SingleStageP2_S1(self.cost_compression, height=height // 8, width=width // 8)
        self.stage1_p2_s3 = SingleStageP2_S3(self.cost_compression, height=height // 8, width=width // 8)
        self.fuse_reg_2 = StageRegFuse("./ckpts/stage2_reg_fuse.ckpt")
        self.fuse_reg_3 = StageRegFuse("./ckpts/stage3_reg_fuse.ckpt")
        self.stage2 = SingleStage(self.fuse_reg_2, height // 4, width // 4, entropy_range=self.entropy_range)
        self.stage3 = SingleStage(self.fuse_reg_3, height // 2, width // 2, entropy_range=self.entropy_range)
        self.eppmvsnet_p1 = EPPMVSNetP1(self.feat_ext, self.stage1_p1, n_depths=n_depths,
                                        interval_ratios=interval_ratios)
        self.eppmvsnet_p3 = EPPMVSNetP3(self.stage1_p3, self.stage2, self.stage3, n_depths=n_depths,
                                        interval_ratios=interval_ratios, height=height, width=width,
                                        entropy_range=self.entropy_range, shrink_ratio=self.shrink_ratio)
    def construct(self, imgs, proj_mats=None, depth_start=None, depth_interval=None):
        """construct function"""
        feat_pack_1, feat_pack_2, feat_pack_3, depth_values_stage1, pixel_distances \
            = self.eppmvsnet_p1(imgs, proj_mats, depth_start, depth_interval)
        cost_volume_list = []
        ref_feat_1, srcs_feat_1 = feat_pack_1[:, 0], feat_pack_1[:, 1:]
        for i in range(feat_pack_1.shape[1] - 1):
            pixel_distance = P.Squeeze()(pixel_distances[i])
            if pixel_distance < self.distance * 3:
                cost_volume_1 = self.stage1_p2_s1([ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]],
                                                  self.n_depths[0], depth_start,
                                                  depth_interval * self.interval_ratios[0],
                                                  depth_values_stage1, i)
                cost_volume_list.append(cost_volume_1)
            else:
                cost_volume_3 = self.stage1_p2_s3([ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]],
                                                  self.n_depths[0], depth_start,
                                                  depth_interval * self.interval_ratios[0],
                                                  depth_values_stage1, i)
                cost_volume_list.append(cost_volume_3)
        cost_volume_list = P.Stack(1)(cost_volume_list)
        results = self.eppmvsnet_p3(feat_pack_1, feat_pack_2, feat_pack_3, depth_values_stage1, cost_volume_list,
                                    proj_mats, depth_start, depth_interval)
        return results
 class EPPMVSNetP1(nn.Cell):
    """EPPMVSNet part1"""
    def __init__(self, feat_ext, stage1_p1, n_depths, interval_ratios, entropy_range=False):
        super(EPPMVSNetP1, self).__init__()
        self.n_depths = n_depths
        self.interval_ratios = interval_ratios
        self.entropy_range = entropy_range
        # hyper parameter used in entropy-based adjustment
        self.shrink_ratio = 1
        self.feat_ext = feat_ext
        self.stage1_p1 = stage1_p1
        self.tile = P.Tile()
        self.expand_dims = P.ExpandDims()
    def construct(self, imgs, proj_mats=None, depth_start=None, depth_interval=None):
        """construct function"""
        B, V, _, H, W = imgs.shape
        imgs = imgs.reshape(B * V, 3, H, W)
        feat_pack_1, feat_pack_2, feat_pack_3 = self.feat_ext(imgs)
        feat_pack_1 = feat_pack_1.view(B, V, *feat_pack_1.shape[1:])  # (B, V, C, h, w)
        feat_pack_2 = feat_pack_2.view(B, V, *feat_pack_2.shape[1:])  # (B, V, C, h, w)
        feat_pack_3 = feat_pack_3.view(B, V, *feat_pack_3.shape[1:])  # (B, V, C, h, w)
        ref_feat_1, srcs_feat_1 = feat_pack_1[:, 0], feat_pack_1[:, 1:]
        depth_values_stage1, single_depth_values_stage1 = self.stage1_p1([ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]],
                                                                         depth_num=self.n_depths[0],
                                                                         depth_start_override=depth_start,
                                                                         depth_interval_override=depth_interval *
                                                                         self.interval_ratios[0])
        _, src_feats, proj_mats = [ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]]
        pixel_distances = []
        for i in range(src_feats.shape[1]):
            src_feat = src_feats[:, i]
            proj_mat = proj_mats[:, i]
            pixel_distance = determine_center_pixel_interval(src_feat, proj_mat, single_depth_values_stage1)
            pixel_distances.append(pixel_distance)
        return feat_pack_1, feat_pack_2, feat_pack_3, depth_values_stage1, pixel_distances
 class EPPMVSNetP3(nn.Cell):
    """EPPMVSNet part3"""
    def __init__(self, stage1_p3, stage2, stage3, n_depths, interval_ratios, entropy_range=False,
                 shrink_ratio=1, height=None, width=None):
        super(EPPMVSNetP3, self).__init__()
        self.n_depths = n_depths
        self.interval_ratios = interval_ratios
        self.entropy_range = entropy_range
        self.shrink_ratio = shrink_ratio
        self.stage1_p3 = stage1_p3
        self.stage2 = stage2
        self.stage3 = stage3
        self.height = height
        self.width = width
    def construct(self, feat_pack_1, feat_pack_2, feat_pack_3, depth_values_stage1, cost_volume_list_stage1,
                  proj_mats=None, depth_start=None, depth_interval=None):
        """construct function"""
        H = self.height
        W = self.width
        ref_feat_1, srcs_feat_1 = feat_pack_1[:, 0], feat_pack_1[:, 1:]
        if self.entropy_range:
            est_depth_1, _, pair_results_1, conf_range_1 = self.stage1_p3(cost_volume_list_stage1, depth_values_stage1,
                                                                          [ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]],
                                                                          self.n_depths[0])
            stage2_conf_interval = self.shrink_ratio * conf_range_1 / self.n_depths[0] * (
                depth_interval * self.interval_ratios[0] * self.n_depths[0]) / self.n_depths[1]
        else:
            est_depth_1, _, pair_results_1 = self.stage1_p3(cost_volume_list_stage1, depth_values_stage1,
                                                            [ref_feat_1, srcs_feat_1, proj_mats[:, :, 2]],
                                                            self.n_depths[0])
            stage2_conf_interval = None
        uncertainty_maps_1, uncertainty_maps_2 = [], []
        for pair_result in pair_results_1:
            uncertainty_maps_1.append(pair_result[1][0])
        for uncertainty_map in uncertainty_maps_1:
            uncertainty_maps_2.append(P.ResizeBilinear((H // 4, W // 4), False)(uncertainty_map))
        ref_feat_2, srcs_feat_2 = feat_pack_2[:, 0], feat_pack_2[:, 1:]
        depth_start_2 = P.ResizeBilinear((H // 4, W // 4), False)(est_depth_1)
        if self.entropy_range:
            est_depth_2, _, _, conf_range_2 = self.stage2([ref_feat_2, srcs_feat_2, proj_mats[:, :, 1]],
                                                          depth_num=self.n_depths[1],
                                                          depth_start_override=depth_start_2,
                                                          depth_interval_override=stage2_conf_interval,
                                                          uncertainty_maps=uncertainty_maps_2)
            stage3_conf_interval = self.shrink_ratio * conf_range_2 / self.n_depths[1] * (
                stage2_conf_interval * self.n_depths[1]) / self.n_depths[2]
        else:
            est_depth_2, _, _ = self.stage2([ref_feat_2, srcs_feat_2, proj_mats[:, :, 1]],
                                            depth_num=self.n_depths[1], depth_start_override=depth_start_2,
                                            depth_interval_override=depth_interval * self.interval_ratios[1],
                                            uncertainty_maps=uncertainty_maps_2)
            stage3_conf_interval = None
        uncertainty_maps_3 = []
        for uncertainty_map in uncertainty_maps_2:
            uncertainty_maps_3.append(P.ResizeBilinear((H // 2, W // 2), False)(uncertainty_map))
        ref_feat_3, srcs_feat_3 = feat_pack_3[:, 0], feat_pack_3[:, 1:]
        depth_start_3 = P.ResizeBilinear((H // 2, W // 2), False)(est_depth_2)
        if self.entropy_range:
            est_depth_3, prob_map_3, _, _ = self.stage3([ref_feat_3, srcs_feat_3, proj_mats[:, :, 0]],
                                                        depth_num=self.n_depths[2],
                                                        depth_start_override=depth_start_3,
                                                        depth_interval_override=stage3_conf_interval,
                                                        uncertainty_maps=uncertainty_maps_3)
        else:
            est_depth_3, prob_map_3, _ = self.stage3([ref_feat_3, srcs_feat_3, proj_mats[:, :, 0]],
                                                     depth_num=self.n_depths[2],
                                                     depth_start_override=depth_start_3,
                                                     depth_interval_override=depth_interval * self.interval_ratios[2],
                                                     uncertainty_maps=uncertainty_maps_3)
        refined_depth = est_depth_3
        return refined_depth, prob_map_3
--- a/model_zoo/research/cv/eppmvsnet/src/modules.py
+++ b/model_zoo/research/cv/eppmvsnet/src/modules.py
@ -0,0 +1,437 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """math operations of EPP-MVSNet"""
 import math
 import numpy as np
 import mindspore
 import mindspore.nn as nn
 import mindspore.common as mstype
 from mindspore import Tensor
 from mindspore.ops import constexpr
 from mindspore.ops import operations as P
 from mindspore.ops import composite as C
@constexpr
 def generate_FloatTensor(x):
    return mindspore.Tensor(x, dtype=mstype.float32)
 def get_depth_values(current_depth, n_depths, depth_interval, inverse_depth=False):
    """
    get the depth values of each pixel : [depth_min, depth_max) step is depth_interval
    current_depth: (B, 1, H, W), current depth map
    n_depth: int, number of channels of depth
    depth_interval: (B) or float, interval between each depth channel
    return: (B, D, H, W)
    """
    linspace = P.LinSpace()
    if not isinstance(depth_interval, float) and depth_interval.shape != current_depth.shape:
        depth_interval = depth_interval.reshape(-1, 1, 1, 1)
    depth_min = C.clip_by_value(current_depth - n_depths / 2 * depth_interval, generate_FloatTensor(1e-7),
                                generate_FloatTensor(58682))
    if inverse_depth:
        depth_end = depth_min + (n_depths - 1) * depth_interval
        inverse_depth_interval = (1 / depth_min - 1 / depth_end) / (n_depths - 1)
        depth_values = 1 / depth_end + inverse_depth_interval * \
                       linspace(generate_FloatTensor(0), generate_FloatTensor(n_depths - 1), n_depths).reshape(1, -1, 1,
                                                                                                               1)
        depth_values = 1.0 / depth_values
    else:
        depth_values = depth_min + depth_interval * \
                       linspace(generate_FloatTensor(0), generate_FloatTensor(n_depths - 1), n_depths).reshape(1, -1, 1,
                                                                                                               1)
    return depth_values
 class HomoWarp(nn.Cell):
    '''STN'''
    def __init__(self, H, W):
        super(HomoWarp, self).__init__()
        # batch_size = 1
        x = np.linspace(0, W - 1, W)
        y = np.linspace(0, H - 1, H)
        x_t, y_t = np.meshgrid(x, y)
        x_t = Tensor(x_t, mstype.float32)
        y_t = Tensor(y_t, mstype.float32)
        expand_dims = P.ExpandDims()
        x_t = expand_dims(x_t, 0)
        y_t = expand_dims(y_t, 0)
        flatten = P.Flatten()
        x_t_flat = flatten(x_t)
        y_t_flat = flatten(y_t)
        oneslike = P.OnesLike()
        ones = oneslike(x_t_flat)
        concat = P.Concat()
        sampling_grid = concat((x_t_flat, y_t_flat, ones))
        self.sampling_grid = expand_dims(sampling_grid, 0)  # (1, 3, D*H*W)
        c = np.linspace(0, 31, 32)
        self.channel = Tensor(c, mstype.float32).view(1, 1, 1, 1, -1)
        batch_size = 128
        batch_idx = np.arange(batch_size)
        batch_idx = batch_idx.reshape((batch_size, 1, 1, 1))
        self.batch_idx = Tensor(batch_idx, mstype.float32)
        self.zero = Tensor(np.zeros([]), mstype.float32)
    def get_pixel_value(self, img, x, y):
        """
        Utility function to get pixel value for coordinate
        vectors x and y from a  4D tensor image.
        Input
        -----
        - img: tensor of shape (B, H, W, C)
        - x: flattened tensor of shape (B*D*H*W,)
        - y: flattened tensor of shape (B*D*H*W,)
        Returns
        -------
        - output: tensor of shape (B, D, H, W, C)
        """
        shape = P.Shape()
        img_shape = shape(x)
        batch_size = img_shape[0]
        D = img_shape[1]
        H = img_shape[2]
        W = img_shape[3]
        img[:, 0, :, :] = self.zero
        img[:, H - 1, :, :] = self.zero
        img[:, :, 0, :] = self.zero
        img[:, :, W - 1, :] = self.zero
        tile = P.Tile()
        batch_idx = P.Slice()(self.batch_idx, (0, 0, 0, 0), (batch_size, 1, 1, 1))
        b = tile(batch_idx, (1, D, H, W))
        expand_dims = P.ExpandDims()
        b = expand_dims(b, 4)
        x = expand_dims(x, 4)
        y = expand_dims(y, 4)
        concat = P.Concat(4)
        indices = concat((b, y, x))
        cast = P.Cast()
        indices = cast(indices, mstype.int32)
        gather_nd = P.GatherNd()
        return cast(gather_nd(img, indices), mstype.float32)
    def homo_warp(self, height, width, proj_mat, depth_values):
        """`
        This function returns a sampling grid, which when
        used with the bilinear sampler on the input feature
        map, will create an output feature map that is an
        affine transformation [1] of the input feature map.
        zero = Tensor(np.zeros([]), mstype.float32)
        Input
        -----
        - height: desired height of grid/output. Used
          to downsample or upsample.
        - width: desired width of grid/output. Used
          to downsample or upsample.
        - proj_mat: (B, 3, 4) equal to "src_proj @ ref_proj_inv"
        - depth_values: (B, D, H, W)
        Returns
        -------
        - normalized grid (-1, 1) of shape (num_batch, 2, H, W).
          The 2nd dimension has 2 components: (x, y) which are the
          sampling points of the original image for each point in the
          target image.
        Note
        ----
        [1]: the affine transformation allows cropping, translation,
             and isotropic scaling.
        """
        shape = P.Shape()
        B = shape(depth_values)[0]
        D = shape(depth_values)[1]
        H = height
        W = width
        R = proj_mat[:, :, :3]  # (B, 3, 3)
        T = proj_mat[:, :, 3:]  # (B, 3, 1)
        cast = P.Cast()
        depth_values = cast(depth_values, mstype.float32)
        # transform the sampling grid - batch multiply
        matmul = P.BatchMatMul()
        tile = P.Tile()
        ref_grid_d = tile(self.sampling_grid, (B, 1, 1))  # (B, 3, H*W)
        cast = P.Cast()
        ref_grid_d = cast(ref_grid_d, mstype.float32)
        # repeat_elements has problem, can not be used
        ref_grid_d = P.Tile()(ref_grid_d, (1, 1, D))
        src_grid_d = matmul(R, ref_grid_d) + T / depth_values.view(B, 1, D * H * W)
        # project negative depth pixels to somewhere outside the image
        negative_depth_mask = src_grid_d[:, 2:] <= 1e-7
        src_grid_d[:, 0:1][negative_depth_mask] = W
        src_grid_d[:, 1:2][negative_depth_mask] = H
        src_grid_d[:, 2:3][negative_depth_mask] = 1
        src_grid = src_grid_d[:, :2] / src_grid_d[:, 2:]  # divide by depth (B, 2, D*H*W)
        reshape = P.Reshape()
        src_grid = reshape(src_grid, (B, 2, D, H, W))
        return src_grid
    def bilinear_sampler(self, img, x, y):
        """
        Performs bilinear sampling of the input images according to the
        normalized coordinates provided by the sampling grid. Note that
        the sampling is done identically for each channel of the input.
        To test if the function works properly, output image should be
        identical to input image when theta is initialized to identity
        transform.
        Input
        -----
        - img: batch of images in (B, H, W, C) layout.
        - grid: x, y which is the output of affine_grid_generator.
        Returns
        -------
        - out: interpolated images according to grids. Same size as grid.
        """
        shape = P.Shape()
        H = shape(img)[1]
        W = shape(img)[2]
        cast = P.Cast()
        max_y = cast(H - 1, mstype.float32)
        max_x = cast(W - 1, mstype.float32)
        zero = self.zero
        # grab 4 nearest corner points for each (x_i, y_i)
        floor = P.Floor()
        x0 = floor(x)
        x1 = x0 + 1
        y0 = floor(y)
        y1 = y0 + 1
        # clip to range [0, H-1/W-1] to not violate img boundaries
        x0 = C.clip_by_value(x0, zero, max_x)
        x1 = C.clip_by_value(x1, zero, max_x)
        y0 = C.clip_by_value(y0, zero, max_y)
        y1 = C.clip_by_value(y1, zero, max_y)
        # get pixel value at corner coords
        Ia = self.get_pixel_value(img, x0, y0)
        Ib = self.get_pixel_value(img, x0, y1)
        Ic = self.get_pixel_value(img, x1, y0)
        Id = self.get_pixel_value(img, x1, y1)
        # recast as float for delta calculation
        x0 = cast(x0, mstype.float32)
        x1 = cast(x1, mstype.float32)
        y0 = cast(y0, mstype.float32)
        y1 = cast(y1, mstype.float32)
        # calculate deltas
        wa = (x1 - x) * (y1 - y)
        wb = (x1 - x) * (y - y0)
        wc = (x - x0) * (y1 - y)
        wd = (x - x0) * (y - y0)
        # add dimension for addition
        expand_dims = P.ExpandDims()
        wa = expand_dims(wa, 4)
        wb = expand_dims(wb, 4)
        wc = expand_dims(wc, 4)
        wd = expand_dims(wd, 4)
        # compute output
        add_n = P.AddN()
        out = add_n([wa * Ia, wb * Ib, wc * Ic, wd * Id])
        return out
    def construct(self, input_fmap, proj_mat, depth_values, out_dims=None, **kwargs):
        """
        Spatial Transformer Network layer implementation as described in [1].
        The layer is composed of 3 elements:
        - localization_net: takes the original image as input and outputs
          the parameters of the affine transformation that should be applied
          to the input image.
        - affine_grid_generator: generates a grid of (x,y) coordinates that
          correspond to a set of points where the input should be sampled
          to produce the transformed output.
        - bilinear_sampler: takes as input the original image and the grid
          and produces the output image using bilinear interpolation.
        Input
        -----
        - input_fmap: output of the previous layer. Can be input if spatial
          transformer layer is at the beginning of architecture. Should be
          a tensor of shape (B, H, W, C)->(B, C, H, W).
        - theta: affine transform tensor of shape (B, 6). Permits cropping,
          translation and isotropic scaling. Initialize to identity matrix.
          It is the output of the localization network.
        - proj_mat: (B, 3, 4) equal to "src_proj @ ref_proj_inv"
        - depth_values: (B, D, H, W)
        Returns
        -------
        - out_fmap: transformed input feature map. Tensor of size (B, C, H, W)-->(B, H, W, C).
        - out: (B, C, D, H, W)
        Notes
         -----
        [1]: 'Spatial Transformer Networks', Jaderberg et. al,
             (https://arxiv.org/abs/1506.02025)
        """
        # grab input dimensions
        trans = P.Transpose()
        input_fmap = trans(input_fmap, (0, 2, 3, 1))
        shape = P.Shape()
        input_size = shape(input_fmap)
        H = input_size[1]
        W = input_size[2]
        # generate grids of same size or upsample/downsample if specified
        if out_dims:
            out_H = out_dims[0]
            out_W = out_dims[1]
            batch_grids = self.homo_warp(out_H, out_W, proj_mat, depth_values)
        else:
            batch_grids = self.homo_warp(H, W, proj_mat, depth_values)
        x_s, y_s = P.Split(1, 2)(batch_grids)
        squeeze = P.Squeeze(1)
        x_s = squeeze(x_s)
        y_s = squeeze(y_s)
        out_fmap = self.bilinear_sampler(input_fmap, x_s, y_s)
        out_fmap = trans(out_fmap, (0, 4, 1, 2, 3))
        return out_fmap
 def determine_center_pixel_interval(src_feat, proj_mat, depth_values):
    """
    src_feat: (B, C, H, W)
    proj_mat: (B, 3, 4) equal to "src_proj @ ref_proj_inv"
    depth_values: (B, D, H, W)
    out: (B, C, D, H, W)
    """
    B, _, H, W = src_feat.shape
    D = depth_values.shape[1]
    R = proj_mat[:, :, :3]  # (B, 3, 3)
    T = proj_mat[:, :, 3:]  # (B, 3, 1)
    concat = P.Concat(axis=1)
    ref_center = generate_FloatTensor([H / 2, W / 2]).view(1, 2, 1, 1)
    ref_center = ref_center.reshape(1, 2, -1)
    ref_center = P.Tile()(ref_center, (B, 1, 1))
    ref_center = concat((ref_center, C.ones_like(ref_center[:, :1])))  # (B, 3, H*W)
    ref_center_d = C.repeat_elements(ref_center, rep=D, axis=2)  # (B, 3, D*H*W)
    src_center_d = C.matmul(R, ref_center_d) + T / depth_values.view(B, 1, D * 1 * 1)
    negative_depth_mask = src_center_d[:, 2:] <= 1e-7
    src_center_d[:, 0:1][negative_depth_mask] = W
    src_center_d[:, 1:2][negative_depth_mask] = H
    src_center_d[:, 2:3][negative_depth_mask] = 1
    transpose = P.Transpose()
    sqrt = P.Sqrt()
    pow_ms = P.Pow()
    src_center = src_center_d[:, :2] / src_center_d[:, 2:]  # divide by depth (B, 2, D*H*W)
    src_grid_valid = transpose(src_center, (0, 2, 1)).view(B, D, 1, 2)  # (B, D*H*W, 2)
    delta_p = src_grid_valid[:, 1:, :, :] - src_grid_valid[:, :-1, :, :]
    epipolar_pixel = sqrt(pow_ms(delta_p[:, :, :, 0], 2) + pow_ms(delta_p[:, :, :, 1], 2))
    epipolar_pixel = epipolar_pixel.mean(1)
    return epipolar_pixel
 def depth_regression(p, depth_values, keep_dim=False):
    """
    p: probability volume (B, D, H, W)
    depth_values: discrete depth values (B, D, H, W) or (D)
    inverse: depth_values is inverse depth or not
    """
    if depth_values.ndim <= 2:
        depth_values = depth_values.view(*depth_values.shape, 1, 1)
    cumsum = P.ReduceSum(keep_dim)
    depth = cumsum(p * depth_values, 1)
    return depth
 def soft_argmin(volume, dim, keepdim=False, window=None):
    """soft argmin"""
    softmax = nn.Softmax(1)
    prob_vol = softmax(volume)
    length = volume.shape[dim]
    index = nn.Range(0, length)()
    index_shape = []
    for i in range(len(volume.shape)):
        if i == dim:
            index_shape.append(length)
        else:
            index_shape.append(1)
    index = index.reshape(index_shape)
    out = P.ReduceSum(True)(index * prob_vol, dim)
    squeeze = P.Squeeze(axis=dim)
    out_sq = squeeze(out) if not keepdim else out
    if window is None:
        return prob_vol, out_sq
    # |depth hypothesis - predicted depth|, assemble to UCSNet
    #        1d11    n1hw
    mask = ((index - out).abs() <= window)
    mask = mask.astype(mstype.float32)
    prob_map = P.ReduceSum(keepdim)(prob_vol * mask, dim)
    return prob_vol, out_sq, prob_map
 def entropy(volume, dim, keepdim=False):
    return P.ReduceSum(keepdim)(
        -volume * P.Log()(C.clip_by_value(volume, generate_FloatTensor(1e-9), generate_FloatTensor(1.))), dim)
 def entropy_num_based(volume, dim, depth_num, keepdim=False):
    return P.ReduceSum(keepdim)(
        -volume * P.Log()(C.clip_by_value(volume, generate_FloatTensor(1e-9), generate_FloatTensor(1.))) / math.log(
            math.e, depth_num), dim)
 def groupwise_correlation(v1, v2, groups, dim):
    n, c, d, h, w = v1.shape
    reshaped_size = (n, groups, c // groups, d, h, w)
    v1_reshaped = v1.view(*reshaped_size)
    v2_reshaped = v2.view(*reshaped_size)
    vc = P.ReduceSum()(v1_reshaped * v2_reshaped, dim + 1)
    return vc
--- a/model_zoo/research/cv/eppmvsnet/src/networks.py
+++ b/model_zoo/research/cv/eppmvsnet/src/networks.py
@ -0,0 +1,402 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """sub-networks of EPP-MVSNet"""
 import numpy as np
 import mindspore
 import mindspore.ops as P
 from mindspore import nn
 from mindspore import Tensor, Parameter
 from src.modules import depth_regression, soft_argmin, entropy
 class BasicBlockA(nn.Cell):
    """BasicBlockA"""
    def __init__(self, in_channels, out_channels, stride):
        super(BasicBlockA, self).__init__()
        self.conv2d_0 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, pad_mode="pad")
        self.conv2d_1 = nn.Conv2d(in_channels, out_channels, 1, stride=stride, padding=0, pad_mode="valid")
        self.batchnorm2d_2 = nn.BatchNorm2d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.batchnorm2d_3 = nn.BatchNorm2d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_4 = nn.ReLU()
        self.conv2d_5 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=(1, 1, 1, 1), pad_mode="pad")
        self.batchnorm2d_6 = nn.BatchNorm2d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_8 = nn.ReLU()
    def construct(self, x):
        """construct"""
        x1 = self.conv2d_0(x)
        x1 = self.batchnorm2d_2(x1)
        x1 = self.relu_4(x1)
        x1 = self.conv2d_5(x1)
        x1 = self.batchnorm2d_6(x1)
        res = self.conv2d_1(x)
        res = self.batchnorm2d_3(res)
        out = P.Add()(x1, res)
        out = self.relu_8(out)
        return out
 class BasicBlockB(nn.Cell):
    """BasicBlockB"""
    def __init__(self, in_channels, out_channels):
        super(BasicBlockB, self).__init__()
        self.conv2d_0 = nn.Conv2d(in_channels, out_channels, 3, stride=1, padding=1, pad_mode="pad")
        self.batchnorm2d_1 = nn.BatchNorm2d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_2 = nn.ReLU()
        self.conv2d_3 = nn.Conv2d(in_channels, out_channels, 3, stride=1, padding=1, pad_mode="pad")
        self.batchnorm2d_4 = nn.BatchNorm2d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_6 = nn.ReLU()
    def construct(self, x):
        """construct"""
        x1 = self.conv2d_0(x)
        x1 = self.batchnorm2d_1(x1)
        x1 = self.relu_2(x1)
        x1 = self.conv2d_3(x1)
        x1 = self.batchnorm2d_4(x1)
        res = x
        out = P.Add()(x1, res)
        out = self.relu_6(out)
        return out
 class UNet2D(nn.Cell):
    """UNet2D"""
    def __init__(self):
        super(UNet2D, self).__init__()
        self.conv2d_0 = nn.Conv2d(3, 16, 5, stride=2, padding=2, pad_mode="pad")
        self.batchnorm2d_1 = nn.BatchNorm2d(16, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.leakyrelu_2 = nn.LeakyReLU(alpha=0.009999999776482582)
        self.convblocka_0 = BasicBlockA(16, 32, 1)
        self.convblockb_0 = BasicBlockB(32, 32)
        self.convblocka_1 = BasicBlockA(32, 64, 2)
        self.convblockb_1 = BasicBlockB(64, 64)
        self.convblocka_2 = BasicBlockA(64, 128, 2)
        self.convblockb_2 = BasicBlockB(128, 128)
        self.conv2dbackpropinput_51 = P.Conv2DBackpropInput(64, 3, stride=2, pad=1, pad_mode="pad")
        self.conv2dbackpropinput_51_weight = Parameter(Tensor(
            np.random.uniform(0, 1, (128, 64, 3, 3)).astype(np.float32)))
        self.conv2d_54 = nn.Conv2d(128, 64, 3, stride=1, padding=1, pad_mode="pad")
        self.convblockb_3 = BasicBlockB(64, 64)
        self.conv2dbackpropinput_62 = P.Conv2DBackpropInput(32, 3, stride=2, pad=1, pad_mode="pad")
        self.conv2dbackpropinput_62_weight = Parameter(Tensor(
            np.random.uniform(0, 1, (64, 32, 3, 3)).astype(np.float32)))
        self.conv2d_65 = nn.Conv2d(64, 32, 3, stride=1, padding=1, pad_mode="pad")
        self.convblockb_4 = BasicBlockB(32, 32)
        self.conv2d_52 = nn.Conv2d(128, 32, 3, stride=1, padding=1, pad_mode="pad")
        self.conv2d_63 = nn.Conv2d(64, 32, 3, stride=1, padding=1, pad_mode="pad")
        self.conv2d_73 = nn.Conv2d(32, 32, 3, stride=1, padding=1, pad_mode="pad")
        self.concat = P.Concat(axis=1)
        param_dict = mindspore.load_checkpoint("./ckpts/feat_ext.ckpt")
        params_not_loaded = mindspore.load_param_into_net(self, param_dict, strict_load=True)
        print(params_not_loaded)
    def construct(self, imgs):
        """construct"""
        _, _, h, w = imgs.shape
        x = self.conv2d_0(imgs)
        x = self.batchnorm2d_1(x)
        x = self.leakyrelu_2(x)
        x1 = self.convblocka_0(x)
        x1 = self.convblockb_0(x1)
        x2 = self.convblocka_1(x1)
        x2 = self.convblockb_1(x2)
        x3 = self.convblocka_2(x2)
        x3 = self.convblockb_2(x3)
        x2_upsample = self.conv2dbackpropinput_51(x3, self.conv2dbackpropinput_51_weight,
                                                  (x2.shape[0], x2.shape[1], h // 4, w // 4))
        x2_upsample = self.concat((x2_upsample, x2,))
        x2_upsample = self.conv2d_54(x2_upsample)
        x2_upsample = self.convblockb_3(x2_upsample)
        x1_upsample = self.conv2dbackpropinput_62(x2_upsample, self.conv2dbackpropinput_62_weight,
                                                  (x1.shape[0], x1.shape[1], h // 2, w // 2))
        x1_upsample = self.concat((x1_upsample, x1,))
        x1_upsample = self.conv2d_65(x1_upsample)
        x1_upsample = self.convblockb_4(x1_upsample)
        x3_final = self.conv2d_52(x3)
        x2_final = self.conv2d_63(x2_upsample)
        x1_final = self.conv2d_73(x1_upsample)
        return x3_final, x2_final, x1_final
 class ConvBnReLu(nn.Cell):
    """ConvBnReLu"""
    def __init__(self, in_channels, out_channels):
        super(ConvBnReLu, self).__init__()
        self.conv3d_0 = nn.Conv3d(in_channels, out_channels, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0),
                                  pad_mode="pad")
        self.batchnorm3d_1 = nn.BatchNorm3d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.leakyrelu_2 = nn.LeakyReLU(alpha=0.009999999776482582)
    def construct(self, x):
        """construct"""
        x = self.conv3d_0(x)
        x = self.batchnorm3d_1(x)
        x = self.leakyrelu_2(x)
        return x
 class CostCompression(nn.Cell):
    """CostCompression"""
    def __init__(self):
        super(CostCompression, self).__init__()
        self.basicblock_0 = ConvBnReLu(8, 64)
        self.basicblock_1 = ConvBnReLu(64, 64)
        self.basicblock_2 = ConvBnReLu(64, 8)
        param_dict = mindspore.load_checkpoint("./ckpts/stage1_cost_compression.ckpt")
        params_not_loaded = mindspore.load_param_into_net(self, param_dict, strict_load=True)
        print(params_not_loaded)
    def construct(self, x):
        """construct"""
        x = self.basicblock_0(x)
        x = self.basicblock_1(x)
        x = self.basicblock_2(x)
        return x
 class Pseudo3DBlock_A(nn.Cell):
    """Pseudo3DBlock_A"""
    def __init__(self, in_channels, out_channels):
        super(Pseudo3DBlock_A, self).__init__()
        self.conv3d_0 = nn.Conv3d(in_channels, out_channels, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1),
                                  pad_mode="pad")
        self.conv3d_1 = nn.Conv3d(out_channels, out_channels, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0),
                                  pad_mode="pad")
        self.batchnorm3d_2 = nn.BatchNorm3d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_3 = nn.ReLU()
        self.conv3d_4 = nn.Conv3d(out_channels, out_channels, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1),
                                  pad_mode="pad")
        self.conv3d_5 = nn.Conv3d(out_channels, out_channels, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0),
                                  pad_mode="pad")
        self.batchnorm3d_6 = nn.BatchNorm3d(out_channels, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_8 = nn.ReLU()
    def construct(self, x):
        """construct"""
        x1 = self.conv3d_0(x)
        x1 = self.conv3d_1(x1)
        x1 = self.batchnorm3d_2(x1)
        x1 = self.relu_3(x1)
        x1 = self.conv3d_4(x1)
        x1 = self.conv3d_5(x1)
        x1 = self.batchnorm3d_6(x1)
        res = x
        out = P.Add()(x1, res)
        out = self.relu_8(out)
        return out
 class Pseudo3DBlock_B(nn.Cell):
    """Pseudo3DBlock_B"""
    def __init__(self):
        super(Pseudo3DBlock_B, self).__init__()
        self.conv3d_0 = nn.Conv3d(8, 8, (1, 3, 3), stride=(1, 2, 2), padding=(0, 0, 1, 1, 1, 1), pad_mode="pad")
        self.conv3d_1 = nn.Conv3d(8, 16, (1, 1, 1), stride=2, padding=0, pad_mode="valid")
        self.conv3d_2 = nn.Conv3d(8, 16, (3, 1, 1), stride=(2, 1, 1), padding=(1, 1, 0, 0, 0, 0), pad_mode="pad")
        self.batchnorm3d_3 = nn.BatchNorm3d(16, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.batchnorm3d_4 = nn.BatchNorm3d(16, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_5 = nn.ReLU()
        self.conv3d_6 = nn.Conv3d(16, 16, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1), pad_mode="pad")
        self.conv3d_7 = nn.Conv3d(16, 16, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0), pad_mode="pad")
        self.batchnorm3d_8 = nn.BatchNorm3d(16, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.relu_10 = nn.ReLU()
    def construct(self, x):
        """construct"""
        x1 = self.conv3d_0(x)
        x1 = self.conv3d_2(x1)
        x1 = self.batchnorm3d_4(x1)
        x1 = self.relu_5(x1)
        x1 = self.conv3d_6(x1)
        x1 = self.conv3d_7(x1)
        x1 = self.batchnorm3d_8(x1)
        res = self.conv3d_1(x)
        res = self.batchnorm3d_3(res)
        out = P.Add()(x1, res)
        out = self.relu_10(out)
        return out
 class CoarseStageRegFuse(nn.Cell):
    """CoarseStageRegFuse"""
    def __init__(self):
        super(CoarseStageRegFuse, self).__init__()
        self.basicblocka_0 = Pseudo3DBlock_A(8, 8)
        self.basicblockb_0 = Pseudo3DBlock_B()
        self.conv3dtranspose_21 = nn.Conv3dTranspose(16, 8, 3, stride=2, padding=1, pad_mode="pad", output_padding=1)
        self.conv3d_23 = nn.Conv3d(16, 8, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1), pad_mode="pad")
        self.conv3d_24 = nn.Conv3d(8, 8, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0), pad_mode="pad")
        self.conv3d_25 = nn.Conv3d(8, 1, 3, stride=1, padding=1, pad_mode="pad")
        self.concat_1 = P.Concat(axis=1)
        self.squeeze_1 = P.Squeeze(axis=1)
        param_dict = mindspore.load_checkpoint("./ckpts/stage1_reg_fuse.ckpt")
        params_not_loaded = mindspore.load_param_into_net(self, param_dict, strict_load=True)
        print(params_not_loaded)
    def construct(self, fused_interim, depth_values):
        """construct"""
        x1 = self.basicblocka_0(fused_interim)
        x2 = self.basicblockb_0(x1)
        x1_upsample = self.conv3dtranspose_21(x2)
        cost_volume = self.concat_1((x1_upsample, x1))
        cost_volume = self.conv3d_23(cost_volume)
        cost_volume = self.conv3d_24(cost_volume)
        score_volume = self.conv3d_25(cost_volume)
        score_volume = self.squeeze_1(score_volume)
        prob_volume, _, prob_map = soft_argmin(score_volume, dim=1, keepdim=True, window=2)
        est_depth = depth_regression(prob_volume, depth_values, keep_dim=True)
        return est_depth, prob_map, prob_volume
 class CoarseStageRegPair(nn.Cell):
    """CoarseStageRegPair"""
    def __init__(self):
        super(CoarseStageRegPair, self).__init__()
        self.basicblocka_0 = Pseudo3DBlock_A(8, 8)
        self.basicblockb_0 = Pseudo3DBlock_B()
        self.conv3dtranspose_21 = nn.Conv3dTranspose(16, 8, 3, stride=2, padding=1, pad_mode="pad", output_padding=1)
        self.concat_22 = P.Concat(axis=1)
        self.conv3d_23 = nn.Conv3d(16, 8, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1), pad_mode="pad")
        self.conv3d_24 = nn.Conv3d(8, 8, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0), pad_mode="pad")
        self.conv3d_25 = nn.Conv3d(8, 1, 3, stride=1, padding=1, pad_mode="pad")
        self.conv2d_38 = nn.Conv2d(1, 8, 3, stride=1, padding=1, pad_mode="pad")
        self.batchnorm2d_39 = nn.BatchNorm2d(num_features=8, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.leakyrelu_40 = nn.LeakyReLU(alpha=0.009999999776482582)
        self.conv2d_41 = nn.Conv2d(8, 8, 3, stride=1, padding=1, pad_mode="pad")
        self.batchnorm2d_42 = nn.BatchNorm2d(num_features=8, eps=9.999999747378752e-06, momentum=0.8999999761581421)
        self.leakyrelu_43 = nn.LeakyReLU(alpha=0.009999999776482582)
        self.conv2d_45 = nn.Conv2d(8, 1, 3, stride=1, padding=1, pad_mode="pad")
        self.conv2d_46 = nn.Conv2d(8, 1, 3, stride=1, padding=1, pad_mode="pad")
        self.concat_1 = P.Concat(axis=1)
        self.squeeze_1 = P.Squeeze(axis=1)
        param_dict = mindspore.load_checkpoint("./ckpts/stage1_reg_pair.ckpt")
        params_not_loaded = mindspore.load_param_into_net(self, param_dict, strict_load=True)
        print(params_not_loaded)
    def construct(self, cost_volume, depth_values):
        """construct"""
        x1 = self.basicblocka_0(cost_volume)
        x2 = self.basicblockb_0(x1)
        x1_upsample = self.conv3dtranspose_21(x2)
        interim = self.concat_1((x1_upsample, x1))
        interim = self.conv3d_23(interim)
        interim = self.conv3d_24(interim)
        score_volume = self.conv3d_25(interim)
        score_volume = self.squeeze_1(score_volume)
        prob_volume, _ = soft_argmin(score_volume, dim=1, keepdim=True)
        est_depth = depth_regression(prob_volume, depth_values, keep_dim=True)
        entropy_ = entropy(prob_volume, dim=1, keepdim=True)
        x = self.conv2d_38(entropy_)
        x = self.batchnorm2d_39(x)
        x = self.leakyrelu_40(x)
        x = self.conv2d_41(x)
        x = self.batchnorm2d_42(x)
        x = self.leakyrelu_43(x)
        out = P.Add()(x, entropy_)
        uncertainty_map = self.conv2d_45(out)
        occ = self.conv2d_46(out)
        return interim, est_depth, uncertainty_map, occ
 class StageRegFuse(nn.Cell):
    """StageRegFuse"""
    def __init__(self, ckpt_path):
        super(StageRegFuse, self).__init__()
        self.basicblocka_0 = Pseudo3DBlock_A(8, 8)
        self.basicblocka_1 = Pseudo3DBlock_A(8, 8)
        self.basicblockb_0 = Pseudo3DBlock_B()
        self.basicblocka_2 = Pseudo3DBlock_A(16, 16)
        self.conv3dtranspose_38 = nn.Conv3dTranspose(16, 8, 3, stride=2, padding=1, pad_mode="pad", output_padding=1)
        self.concat_39 = P.Concat(axis=1)
        self.conv3d_40 = nn.Conv3d(16, 8, (1, 3, 3), stride=1, padding=(0, 0, 1, 1, 1, 1), pad_mode="pad")
        self.conv3d_41 = nn.Conv3d(8, 8, (3, 1, 1), stride=1, padding=(1, 1, 0, 0, 0, 0), pad_mode="pad")
        self.conv3d_42 = nn.Conv3d(8, 1, 3, stride=1, padding=1, pad_mode="pad")
        self.concat_1 = P.Concat(axis=1)
        self.squeeze_1 = P.Squeeze(axis=1)
        param_dict = mindspore.load_checkpoint(ckpt_path)
        params_not_loaded = mindspore.load_param_into_net(self, param_dict, strict_load=True)
        print(params_not_loaded)
    def construct(self, fused_interim, depth_values):
        """construct"""
        x1 = self.basicblocka_0(fused_interim)
        x1 = self.basicblocka_1(x1)
        x2 = self.basicblockb_0(x1)
        x2 = self.basicblocka_2(x2)
        x1_upsample = self.conv3dtranspose_38(x2)
        cost_volume = self.concat_1((x1_upsample, x1))
        cost_volume = self.conv3d_40(cost_volume)
        cost_volume = self.conv3d_41(cost_volume)
        score_volume = self.conv3d_42(cost_volume)
        score_volume = self.squeeze_1(score_volume)
        prob_volume, _, prob_map = soft_argmin(score_volume, dim=1, keepdim=True, window=2)
        est_depth = depth_regression(prob_volume, depth_values, keep_dim=True)
        return est_depth, prob_map, prob_volume
--- a/model_zoo/research/cv/eppmvsnet/src/utils.py
+++ b/model_zoo/research/cv/eppmvsnet/src/utils.py
@ -0,0 +1,116 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """other functions of EPP-MVSNet"""
 from __future__ import absolute_import
 from __future__ import division
 from __future__ import print_function
 import re
 import sys
 import numpy as np
 def read_pfm(filename):
    """read pfm files"""
    file = open(filename, 'rb')
    color = None
    width = None
    height = None
    scale = None
    endian = None
    header = file.readline().decode('utf-8').rstrip()
    if header == 'PF':
        color = True
    elif header == 'Pf':
        color = False
    else:
        raise Exception('Not a PFM file.')
    dim_match = re.match(r'^(\d+)\s(\d+)\s$', file.readline().decode('utf-8'))
    if dim_match:
        width, height = map(int, dim_match.groups())
    else:
        raise Exception('Malformed PFM header.')
    scale = float(file.readline().rstrip())
    if scale < 0:  # little-endian
        endian = '<'
        scale = -scale
    else:
        endian = '>'  # big-endian
    data = np.fromfile(file, endian + 'f')
    shape = (height, width, 3) if color else (height, width)
    data = np.reshape(data, shape)
    data = np.flipud(data)
    file.close()
    return data, scale
 def save_pfm(filename, image, scale=1):
    """save pfm files"""
    file = open(filename, "wb")
    color = None
    image = np.flipud(image)
    if image.dtype.name != 'float32':
        raise Exception('Image dtype must be float32.')
    if len(image.shape) == 3 and image.shape[2] == 3:  # color image
        color = True
    elif len(image.shape) == 2 or len(image.shape) == 3 and image.shape[2] == 1:  # greyscale
        color = False
    else:
        raise Exception('Image must have H x W x 3, H x W x 1 or H x W dimensions.')
    file.write('PF\n'.encode('utf-8') if color else 'Pf\n'.encode('utf-8'))
    file.write('{} {}\n'.format(image.shape[1], image.shape[0]).encode('utf-8'))
    endian = image.dtype.byteorder
    if endian == '<' or endian == '=' and sys.byteorder == 'little':
        scale = -scale
    file.write(('%f\n' % scale).encode('utf-8'))
    image.tofile(file)
    file.close()
 class AverageMeter:
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()
    def reset(self):
        """reset averagemeter"""
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
    def update(self, val, n=1):
        """update averagemeter"""
        self.val = val
        self.sum += val * n
        self.count += n
        if self.count > 0:
            self.avg = self.sum / self.count
--- a/model_zoo/research/cv/eppmvsnet/validate.py
+++ b/model_zoo/research/cv/eppmvsnet/validate.py
@ -0,0 +1,180 @@
 # Copyright 2021 Huawei Technologies Co., Ltd
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
 # ============================================================================
 """EPP-MVSNet's validation process on BlendedMVS dataset"""
 import os
 import time
 from argparse import ArgumentParser
 import cv2
 import numpy as np
 from tqdm import tqdm
 import mindspore.dataset as ds
 from mindspore import context
 from mindspore.ops import operations as P
 from src.eppmvsnet import EPPMVSNet
 from src.blendedmvs import BlendedMVSDataset
 from src.utils import save_pfm, AverageMeter
 def get_opts():
    """set options"""
    parser = ArgumentParser()
    parser.add_argument('--gpu_id', type=int, default=0, choices=[0, 1, 2, 3, 4, 5, 6, 7],
                        help='which gpu used to inference')
    ## vis
    parser.add_argument('--root_dir', type=str,
                        default='/home/ubuntu/data/DTU/mvs_training/dtu/',
                        help='root directory of dtu dataset')
    parser.add_argument('--dataset_name', type=str, default='blendedmvs',
                        choices=['blendedmvs'],
                        help='which dataset to train/val')
    parser.add_argument('--split', type=str, default=None,
                        help='which split to evaluate')
    parser.add_argument('--scan', type=str, default=None, nargs='+',
                        help='specify scan to evaluate (must be in the split)')
    # for depth prediction
    parser.add_argument('--n_views', type=int, default=5,
                        help='number of views (including ref) to be used in testing')
    parser.add_argument('--depth_interval', type=float, default=128,
                        help='depth interval unit in mm')
    parser.add_argument('--n_depths', nargs='+', type=int, default=[32, 16, 8],
                        help='number of depths in each level')
    parser.add_argument('--interval_ratios', nargs='+', type=float, default=[4.0, 2.0, 1.0],
                        help='depth interval ratio to multiply with --depth_interval in each level')
    parser.add_argument('--img_wh', nargs="+", type=int, default=[1152, 864],
                        help='resolution (img_w, img_h) of the image, must be multiples of 32')
    parser.add_argument('--ckpt_path', type=str, default='ckpts/exp2/_ckpt_epoch_10.ckpt',
                        help='pretrained checkpoint path to load')
    parser.add_argument('--save_visual', default=False, action='store_true',
                        help='save depth and proba visualization or not')
    parser.add_argument('--entropy_range', action='store_true', default=False,
                        help='whether to use entropy range method')
    parser.add_argument('--conf', type=float, default=0.9,
                        help='min confidence for pixel to be valid')
    parser.add_argument('--levels', type=int, default=3, choices=[3, 4, 5],
                        help='number of FPN levels (fixed to be 3!)')
    return parser.parse_args()
 if __name__ == "__main__":
    args = get_opts()
    context.set_context(mode=0, device_target='GPU', device_id=args.gpu_id, save_graphs=False)
    dataset = BlendedMVSDataset(args.root_dir, args.split, n_views=args.n_views, depth_interval=args.depth_interval,
                                img_wh=tuple(args.img_wh), levels=args.levels, scan=args.scan)
    img_wh = args.img_wh
    scans = dataset.scans
    print(args.n_depths)
    print(args.interval_ratios)
    # Step 1. Create depth estimation and probability for each scan
    EPPMVSNet_eval = EPPMVSNet(n_depths=args.n_depths, interval_ratios=args.interval_ratios,
                               entropy_range=args.entropy_range, height=args.img_wh[1], width=args.img_wh[0])
    EPPMVSNet_eval.set_train(False)
    depth_dir = f'results/{args.dataset_name}/{args.split}/depth'
    print('Creating depth and confidence predictions...')
    if args.scan:
        data_range = [i for i, x in enumerate(dataset.metas) if x[0] == args.scan]
    else:
        data_range = range(len(dataset))
    test_loader = ds.GeneratorDataset(dataset, column_names=["imgs", "proj_mats", "init_depth_min", "depth_interval",
                                                             "scan", "vid", "depth_0", "mask_0", "fix_depth_interval"],
                                      num_parallel_workers=1, shuffle=False)
    test_loader = test_loader.batch(batch_size=1)
    test_data_size = test_loader.get_dataset_size()
    print("train dataset length is:", test_data_size)
    pbar = tqdm(enumerate(test_loader.create_tuple_iterator()), dynamic_ncols=True, total=test_data_size)
    metrics = ['stage3_l1_loss', 'stage3_less1_acc', 'stage3_less3_acc']
    avg_metrics = {t: AverageMeter() for t in metrics}
    forward_time_avg = AverageMeter()
    scan_list, vid_list = [], []
    depth_folder = f'{img_wh[0]}_{img_wh[1]}_{args.n_views - 1}'
    for i, sample in pbar:
        imgs, proj_mats, init_depth_min, depth_interval, scan, vid, depth_0, mask_0, fix_depth_interval = sample
        scan = scan[0].asnumpy()
        scan_str = ""
        for num in scan:
            scan_str += chr(num)
        scan = scan_str
        vid = vid[0].asnumpy()
        depth_file_dir = os.path.join(depth_dir, scan, depth_folder)
        if not os.path.exists(depth_file_dir):
            os.makedirs(depth_file_dir, exist_ok=True)
        begin = time.time()
        results = EPPMVSNet_eval(imgs, proj_mats, init_depth_min, depth_interval)
        forward_time = time.time() - begin
        if i != 0:
            forward_time_avg.update(forward_time)
        depth, proba = results
        depth = P.Squeeze()(depth).asnumpy()
        depth = np.nan_to_num(depth)  # change nan to 0
        proba = P.Squeeze()(proba).asnumpy()
        proba = np.nan_to_num(proba)  # change nan to 0
        save_pfm(os.path.join(depth_dir, f'{scan}/{depth_folder}/depth_{vid:04d}.pfm'), depth)
        save_pfm(os.path.join(depth_dir, f'{scan}/{depth_folder}/proba_{vid:04d}.pfm'), proba)
        # record l1 loss of each image
        scan_list.append(scan)
        vid_list.append(vid)
        pred_depth = depth
        gt = P.Squeeze()(depth_0).asnumpy()
        mask = P.Squeeze()(mask_0).asnumpy()
        abs_err = np.abs(pred_depth - gt)
        abs_err_scaled = abs_err / fix_depth_interval.asnumpy()
        l1 = abs_err_scaled[mask].mean()
        less1 = (abs_err_scaled[mask] < 1.).astype(np.float32).mean()
        less3 = (abs_err_scaled[mask] < 3.).astype(np.float32).mean()
        avg_metrics[f'stage3_l1_loss'].update(l1)
        avg_metrics[f'stage3_less1_acc'].update(less1)
        avg_metrics[f'stage3_less3_acc'].update(less3)
        if args.save_visual:
            mi = np.min(depth[depth > 0])
            ma = np.max(depth)
            depth = (depth - mi) / (ma - mi + 1e-8)
            depth = (255 * depth).astype(np.uint8)
            depth_img = cv2.applyColorMap(depth, cv2.COLORMAP_JET)
            cv2.imwrite(os.path.join(depth_dir, f'{scan}/{depth_folder}/depth_visual_{vid:04d}.jpg'), depth_img)
            cv2.imwrite(os.path.join(depth_dir, f'{scan}/{depth_folder}/proba_visual_{vid:04d}.jpg'),
                        (255 * (proba > args.conf)).astype(np.uint8))
        print(f'step {i} time: {forward_time}s')
    print(f'mean forward time: {forward_time_avg.avg}')
    with open(f'results/{args.dataset_name}/{args.split}/metrics.txt', 'w') as f:
        for i in avg_metrics.items():
            f.writelines((i[0]) + ':' + str(np.round(i[1].avg, 4)) + '\n')
        f.writelines('mean forward time(s/pic):' + str(np.round(forward_time_avg.avg, 4)) + '\n')
        f.close()
    print('Done!')