forked from mindspore-Ecosystem/mindspore
!1586 dataset: fix some format problem in take and split
Merge pull request !1586 from ms_yan/take_split_format
commit fb78bb6ece
@@ -560,9 +560,9 @@ class Dataset:

 Note:
 1. If count is greater than the number of element in dataset or equal to -1,
-all the element in dataset will be taken.
+all the element in dataset will be taken.
 2. The order of using take and batch effects. If take before batch operation,
-then taken given number of rows, otherwise take given number of batches.
+then taken given number of rows, otherwise take given number of batches.

 Args:
 count (int, optional): Number of elements to be taken from the dataset (default=-1).
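The take/batch ordering note in this hunk can be illustrated with a minimal plain-Python sketch. The `take` and `batch` helpers below are hypothetical stand-ins written for this illustration; they are not the MindSpore implementation and only mirror the behaviour the docstring describes.

```python
def take(items, count=-1):
    """Return the first `count` items; -1 or an oversized count returns everything."""
    items = list(items)
    if count == -1 or count >= len(items):
        return items
    return items[:count]


def batch(items, batch_size):
    """Group items into consecutive lists of `batch_size` (the last may be shorter)."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


rows = list(range(10))

# take before batch: 4 rows are kept, then grouped into 2 batches.
take_then_batch = batch(take(rows, 4), 2)

# batch before take: 5 batches are built, then the first 4 batches are kept (8 rows).
batch_then_take = take(batch(rows, 2), 4)
```

Under these assumptions, the same `count` selects rows in the first pipeline and batches in the second, which is the distinction item 2 of the Note is drawing.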
@@ -590,7 +590,7 @@ class Dataset:
 # here again
 dataset_size = self.get_dataset_size()

-if(dataset_size is None or dataset_size <= 0):
+if dataset_size is None or dataset_size <= 0:
 raise RuntimeError("dataset size unknown, unable to split.")

 all_int = all(isinstance(item, int) for item in sizes)
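For readers without the surrounding method at hand, the guard touched by this hunk can be exercised in isolation. The sketch below wraps the two visible statements in a hypothetical function, `check_split_inputs`, which does not exist in the codebase; whatever the real method does after computing `all_int` is not shown in the diff and is not modelled here.

```python
def check_split_inputs(dataset_size, sizes):
    """Hypothetical wrapper around the guard shown in the hunk.

    Raises RuntimeError when the dataset size is unknown or non-positive,
    otherwise reports whether every requested split size is an int.
    """
    if dataset_size is None or dataset_size <= 0:
        raise RuntimeError("dataset size unknown, unable to split.")
    return all(isinstance(item, int) for item in sizes)


def raises_runtime_error(dataset_size):
    """Helper for the usage below: True if the guard rejects `dataset_size`."""
    try:
        check_split_inputs(dataset_size, [1])
    except RuntimeError:
        return True
    return False


absolute_sizes = check_split_inputs(10, [7, 3])      # integer row counts
fractional_sizes = check_split_inputs(10, [0.7, 0.3])  # percentages, not ints
```

The change in the hunk itself is purely stylistic: dropping the parentheses around the condition does not alter which inputs raise.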
@@ -640,8 +640,8 @@ class Dataset:
 Note:
 1. Dataset cannot be sharded if split is going to be called.
 2. It is strongly recommended to not shuffle the dataset, but use randomize=True instead.
-Shuffling the dataset may not be deterministic, which means the data in each split
-will be different in each epoch.
+Shuffling the dataset may not be deterministic, which means the data in each split
+will be different in each epoch.

 Raises:
 RuntimeError: If get_dataset_size returns None or is not supported for this dataset.
@@ -1173,6 +1173,7 @@ class SourceDataset(Dataset):
 def is_sharded(self):
 raise NotImplementedError("SourceDataset must implement is_sharded.")

+
 class MappableDataset(SourceDataset):
 """
 Abstract class to represent a source dataset which supports use of samplers.
@@ -1253,13 +1254,13 @@ class MappableDataset(SourceDataset):

 Note:
 1. Dataset should not be sharded if split is going to be called. Instead, create a
-DistributedSampler and specify a split to shard after splitting. If dataset is
-sharded after a split, it is strongly recommended to set the same seed in each instance
-of execution, otherwise each shard may not be part of the same split (see Examples)
+DistributedSampler and specify a split to shard after splitting. If dataset is
+sharded after a split, it is strongly recommended to set the same seed in each instance
+of execution, otherwise each shard may not be part of the same split (see Examples)
 2. It is strongly recommended to not shuffle the dataset, but use randomize=True instead.
-Shuffling the dataset may not be deterministic, which means the data in each split
-will be different in each epoch. Furthermore, if sharding occurs after split, each
-shard may not be part of the same split.
+Shuffling the dataset may not be deterministic, which means the data in each split
+will be different in each epoch. Furthermore, if sharding occurs after split, each
+shard may not be part of the same split.

 Raises:
 RuntimeError: If get_dataset_size returns None or is not supported for this dataset.
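The recommendation to use the same seed in every instance when sharding after a split can be sketched in plain Python. Both helpers below, `split_indices` and `shard`, are hypothetical and written only for this illustration; the strided sharding is an assumption and does not model how DistributedSampler actually assigns rows.

```python
import random


def split_indices(n, sizes, seed):
    """Hypothetical seeded split: permute range(n), then cut it into `sizes`."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    splits, start = [], 0
    for size in sizes:
        splits.append(order[start:start + size])
        start += size
    return splits


def shard(indices, num_shards, shard_id):
    """Hypothetical strided shard: every `num_shards`-th index starting at `shard_id`."""
    return indices[shard_id::num_shards]


# Two workers run independently but use the same seed, so both compute
# the same train split before taking their own shard of it.
train = split_indices(10, [8, 2], seed=42)[0]
worker0 = shard(split_indices(10, [8, 2], seed=42)[0], num_shards=2, shard_id=0)
worker1 = shard(split_indices(10, [8, 2], seed=42)[0], num_shards=2, shard_id=1)
```

With different seeds per worker, each worker would permute the rows differently, so the two shards would no longer be guaranteed to come from the same train split; that is the failure mode the Note warns about.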