增加language选项的说明 (#1810)

* Update README.md

* Update README_ch.md

* Update README.md
This commit is contained in:
Wei Shengyu 2021-01-27 19:04:54 +08:00 committed by GitHub
parent b22ee4dd5f
commit 25bf92295f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 5 additions and 5 deletions

View File

@ -72,7 +72,7 @@ fusion_generator:
python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
``` ```
* Note 1: The language options is correspond to the corpus. Currently, the tool only supports English, Simplified Chinese and Korean. * Note 1: The language options is correspond to the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko).
* Note 2: Synth-Text is mainly used to generate images for OCR recognition models. * Note 2: Synth-Text is mainly used to generate images for OCR recognition models.
So the height of style images should be around 32 pixels. Images in other sizes may behave poorly. So the height of style images should be around 32 pixels. Images in other sizes may behave poorly.
* Note 3: You can modify `use_gpu` in `configs/config.yml` to determine whether to use GPU for prediction. * Note 3: You can modify `use_gpu` in `configs/config.yml` to determine whether to use GPU for prediction.
@ -120,7 +120,7 @@ In actual application scenarios, it is often necessary to synthesize pictures in
* `with_label`Whether the `label_file` is label file list. * `with_label`Whether the `label_file` is label file list.
* `CorpusGenerator` * `CorpusGenerator`
* `method`Method of CorpusGeneratorsupports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is usedNo other configuration is neededotherwise you need to set `corpus_file` and `language`. * `method`Method of CorpusGeneratorsupports `FileCorpus` and `EnNumCorpus`. If `EnNumCorpus` is usedNo other configuration is neededotherwise you need to set `corpus_file` and `language`.
* `language`Language of the corpus. * `language`Language of the corpus. Currently, the tool only supports English(en), Simplified Chinese(ch) and Korean(ko).
* `corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings'\n'. Corpus generator samples one line each time. * `corpus_file`: Filepath of the corpus. Corpus file should be a text file which will be split by line-endings'\n'. Corpus generator samples one line each time.

View File

@ -63,10 +63,10 @@ fusion_generator:
```python ```python
python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_images/2.jpg --text_corpus PaddleOCR --language en
``` ```
* 注1语言选项和语料相对应目前该工具只支持英文、简体中文和韩语。 * 注1语言选项和语料相对应目前支持英文(en)、简体中文(ch)和韩语(ko)
* 注2Style-Text生成的数据主要应用于OCR识别场景。基于当前PaddleOCR识别模型的设计我们主要支持高度在32左右的风格图像。 * 注2Style-Text生成的数据主要应用于OCR识别场景。基于当前PaddleOCR识别模型的设计我们主要支持高度在32左右的风格图像。
如果输入图像尺寸相差过多,效果可能不佳。 如果输入图像尺寸相差过多,效果可能不佳。
* 注3可以通过修改配置文件中的`use_gpu`(true或者false)参数来决定是否使用GPU进行预测。 * 注3可以通过修改配置文件`configs/config.yml`中的`use_gpu`(true或者false)参数来决定是否使用GPU进行预测。
例如,输入如下图片和语料"PaddleOCR": 例如,输入如下图片和语料"PaddleOCR":
@ -105,7 +105,7 @@ python3 tools/synth_image.py -c configs/config.yml --style_image examples/style_
* `with_label`:标志`label_file`是否为label文件。 * `with_label`:标志`label_file`是否为label文件。
* `CorpusGenerator` * `CorpusGenerator`
* `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language` * `method`:语料生成方法,目前有`FileCorpus`和`EnNumCorpus`可选。如果使用`EnNumCorpus`,则不需要填写其他配置,否则需要修改`corpus_file`和`language`
* `language`:语料的语种; * `language`:语料的语种,目前支持英文(en)、简体中文(ch)和韩语(ko)
* `corpus_file`: 语料文件路径。语料文件应使用文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。 * `corpus_file`: 语料文件路径。语料文件应使用文本文件。语料生成器首先会将语料按行切分,之后每次随机选取一行。
语料文件格式示例: 语料文件格式示例: