Commit Graph

101 Commits

Author SHA1 Message Date
hesham df361d1d26 Change mem layout of string tensor
add support for MindRecord and TFRecord
----
optimize tensorshape

optimize tensorshape and FlatIndex

TFRecord and MindRecord support for string tensor

Modify mem layout
Add new constructor
Add method Allocate

Change some GetMutableBuffer usages to AllocateBuffer
2020-05-22 01:22:29 -04:00
mindspore-ci-bot 58e6d7d950 !1341 Added lookup and vocab to mindspore.dataset.text
Merge pull request !1341 from ZiruiWu/vocab_and_lookup
2020-05-22 10:19:42 +08:00
jonwe bb51bb88d7 add compress in mindrecord 2020-05-22 09:37:51 +08:00
mindspore-ci-bot 2e3d55ed87 !1281 Implementation of SplitOp
Merge pull request !1281 from Peilin/splitOp
2020-05-22 09:29:03 +08:00
mindspore-ci-bot 39b9aedf68 !1342 Bug fix on issue Core dump on GPU when train with lenet with AU
Merge pull request !1342 from Tinazhang/cc
2020-05-22 09:18:49 +08:00
Peilin Wang 71e8bb1960 general split case done, chaining sampler (basic case) is working
implementation 99% complete

everything and tested except for repeatable shuffling

tested most basic/typical split use cases

cleanup

some more cleanup

fix CI

more ci fix

more ci fixes

more ci fix

more ci fix

more ci fix

added more tests, fixed some bugs

some more clean up and test cases

added shard/shuffle before split warning/error

addressed code review comments and ci

fixed ci
2020-05-21 20:42:51 -04:00
Tinazhang e9e40b688b Bug fix 2020-05-21 18:20:00 -04:00
Zirui Wu 25ab2ef303 Implemented lookup and vocab 2020-05-21 17:17:24 -04:00
mindspore-ci-bot 46949fc327 !1307 Cleanup dataset UT: unskip and enhance TFRecord sharding tests
Merge pull request !1307 from cathwong/ckw_dataset_ut_unskip2
2020-05-22 03:21:45 +08:00
qianlong 451c20a6f5 Add UnicodeCharTokenizer for nlp 2020-05-21 09:22:45 +08:00
mindspore-ci-bot 93e7c97a96 !1272 [Dataset] MindData Tree Optimizer Infrastructure
Merge pull request !1272 from JunhanHu/minddata_opt
2020-05-21 05:29:00 +08:00
Cathy Wong b78894e02b Cleanup dataset UT: unskip and enhance TFRecord sharding tests 2020-05-20 17:05:38 -04:00
Junhan Hu f44d213503 MindData optimizer infrastructure. 2020-05-20 16:11:26 -04:00
xulei2020 163b6b7ea7 add jieba c++ code 2020-05-20 15:55:12 +08:00
Tinazhang 17cecf2cf5 Added TCs to RandomCrop and RandomCropAndResize and modified visualize() calling 2020-05-19 15:42:24 -04:00
jinyaohui 5a914994ba clean pylint 2020-05-18 16:42:35 +08:00
jinyaohui bcfaff97f9 clean pylint 2020-05-18 10:31:46 +08:00
hesham e8ca243364 - Add DE_STRING
- Replace switch-case with indexing

- Add test case
- Add constructors
- Add getItem string

- Fix bugs
- Add more tests

- Tensor iterator
- asNumpy
- TextFileDataset

- Tensor(Numpy)

- Super > 2D
- Add more test cases for GeneratorDataset

- Change StartAddr to GetBuffer and GetMutableBuffer

- Raise an error if batch is used with strings

Clean-up work
2020-05-15 20:33:28 -04:00
jiangzhiwen cb2814b498 flat_map first commit 2020-05-15 17:45:39 +08:00
mindspore-ci-bot c680cfbf27 !1157 dataset: add concat operation for dataset
Merge pull request !1157 from ms_yan/concat_dataset
2020-05-15 16:07:19 +08:00
mindspore-ci-bot ab031ee9ea !1126 VOCDataset support object detection function
Merge pull request !1126 from xiefangqi/voc_support_detection
2020-05-15 15:56:39 +08:00
xiefangqi c937bad53f minddata support voc 2020-05-15 13:24:03 +08:00
ms_yan c0fa7b4b19 init commit of concat dataset
change to use __add__ operation instead of ds.concat
2020-05-15 13:14:13 +08:00
jonyguo be2e7531ca fix: MindDataset parameter shard_id & num_shards check 2020-05-14 17:18:11 +08:00
Cathy Wong 913074e656 Cleanup dataset UT: resolve skipped test units 2020-05-13 14:41:57 -04:00
liyong aa3f89e74f mindrecord support read file list 2020-05-13 14:11:59 +08:00
Cathy Wong 49ef53f164 Cleanup dataset UT: util.py internals 2020-05-11 14:44:24 -04:00
mindspore-ci-bot 2860fd9338 !984 Add unit test case for HWC2CHW.
Merge pull request !984 from Tinazhang/hwc2chw
2020-05-09 05:02:41 +08:00
Tinazhang c8b5586c7f add unit test for HWC2CHW 2020-05-08 13:17:20 -04:00
Cathy Wong 58226addd6 Cleanup dataset UT: use md5 npz in test_zip for images 2020-05-08 11:25:48 -04:00
mindspore-ci-bot 47f5abceb4 !960 Adding example for grayscale
Merge pull request !960 from EricZ/grayscale_fix
2020-05-08 05:08:15 +08:00
mindspore-ci-bot 078dd86cfe !507 Implemented padded_batch
Merge pull request !507 from ZiruiWu/batch_with_padding
2020-05-08 04:54:05 +08:00
mindspore-ci-bot de7625777f !951 fix: MindDataset with columns_name parameter cause errors in some scenes
Merge pull request !951 from guozhijian/fix_read_by_columns
2020-05-08 04:51:03 +08:00
eric 0f0548f21b Added test case for grayscale support 2020-05-07 15:09:57 -04:00
Zirui Wu c2d364a573 batch with padding implemented
support for 1 specific dimension to be None, added validator

fix various CI complaints

another round of CI fixes

ci

refactor parts of the code

code refactor

ci fix

comments added, fix bugs

address review comments

address review comments

review cmts

added simple perf test script

update pad code

perf imprv
2020-05-07 11:18:42 -04:00
jonyguo d4d236bcce fix: use MindDataset by column_names get data error in some situation 2020-05-07 18:12:36 +08:00
liyong b520ca9087 fix pk sampler in mindrecord 2020-05-07 14:54:23 +08:00
Cathy Wong 772e6c1461 Cleanup dataset UT: test_batch, save_and_check support 2020-05-05 15:35:09 -04:00
eric 36fffb7706 Added example md5 generation
Comparison example

Added md5 and comparison example for py_transforms

Added md5 check for images
2020-05-04 21:15:33 -04:00
Junhan Hu 83c68ca2ef Skip pyfunc test case 2020-05-01 15:14:07 -04:00
eric 26cb3e8a5f Added test function to show that seed doesn't work.
Added test case to show that C image aug doesn't use seed properly

Added passing test cases

Added working testcases for using seed

Added additional test cases to show seed use

Added test case for seed
2020-04-30 18:57:59 -04:00
ms_yan c56fe3aa2d modify take op with an operator 2020-04-30 10:16:36 +08:00
mindspore-ci-bot 8af10eb51e !875 Reject python OP in operations argument for C++ uniform augmentation OP
Merge pull request !875 from AdelShafiei/ua_py
2020-04-30 06:11:17 +08:00
Adel Shafiei d15bd04bfe added input validation to reject python op in C++ uniform augmentation operations list 2020-04-29 16:48:10 -04:00
mindspore-ci-bot a606c2e4da !872 [Dataset] Add schema support for GeneratorDataset
Merge pull request !872 from JunhanHu/generator_schema
2020-04-30 02:58:29 +08:00
mindspore-ci-bot 2303453753 !869 Random data op
Merge pull request !869 from JesseKLee/random_data_op
2020-04-30 02:56:39 +08:00
Junhan Hu c5a8ffe4f4 Add schema support for GeneratorDataset 2020-04-29 13:50:51 -04:00
Jesse Lee 5236d0c3c0 Replace print with logger.info 2020-04-29 12:44:04 -04:00
mindspore-ci-bot 8d3695f666 !672 Added UT for uniform augmentation C++ OP
Merge pull request !672 from AdelShafiei/ua_ut
2020-04-29 23:56:09 +08:00
Jesse Lee 270bf831a9 Random Data Op 2020-04-29 10:26:00 -04:00