
DSTC11 SIMMC2.1 DAMO-ConvAI

DSTC11 Track 1: The Third Situated Interactive MultiModal Conversations (SIMMC 2.1) Challenge 2022

Team: DAMO-ConvAI

Participants: Yuxing Long, Huibin Zhang, Binyuan Hui

🏴 Overview

For tasks 1, 2 and 3, we design discriminative models based on a transformer-encoder architecture (🤗 Longformer). We encode the dialogue history and attach task-specific heads to the encoder output to identify ambiguous candidates, resolve coreferences and track belief states. We also align the item vectors (bbox position embeddings) with their respective attribute tokens in the input token embeddings. Auxiliary heads that predict object attributes provide additional supervision signals.
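The minimal pure-Python sketch below illustrates how item vectors can be lined up with attribute tokens. It is not the repository's implementation. The following are all hypothetical: the 5-d box feature [x1, y1, x2, y2, area] normalized by image size, the bias-free linear projection, the token spans, and the helper names `bbox_features`, `project` and `add_item_vectors`. Each object's projected box vector is added to the embeddings of the tokens that describe that object's attributes, so position and attribute information meet at the same input positions.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def bbox_features(bbox: Tuple[float, float, float, float], img_w: float, img_h: float) -> Vector:
    """Normalize an (x, y, w, h) box into [x1, y1, x2, y2, area] (assumed feature scheme)."""
    x, y, w, h = bbox
    return [x / img_w, y / img_h, (x + w) / img_w, (y + h) / img_h, (w * h) / (img_w * img_h)]


def project(feat: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical bias-free linear layer; weight has shape [hidden_dim][len(feat)]."""
    return [sum(w_i * f_i for w_i, f_i in zip(row, feat)) for row in weight]


def add_item_vectors(
    token_embeds: List[Vector],
    attribute_spans: Dict[int, Tuple[int, int]],
    item_vectors: Dict[int, Vector],
) -> List[Vector]:
    """Add each object's item vector to every token in its [start, end) attribute span."""
    out = [list(e) for e in token_embeds]  # copy so the input embeddings are not mutated
    for obj_id, (start, end) in attribute_spans.items():
        vec = item_vectors[obj_id]
        for pos in range(start, end):
            out[pos] = [a + b for a, b in zip(out[pos], vec)]
    return out


# Toy example: 6 tokens with hidden size 2; object 7's attribute tokens are positions 2 and 3.
embeds = [[0.0, 0.0] for _ in range(6)]
feat = bbox_features((100, 50, 200, 100), img_w=1000, img_h=500)
W = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]  # toy projection that keeps x1 and y1
result = add_item_vectors(embeds, {7: (2, 4)}, {7: project(feat, W)})
```

In a real model, the projection would be a learned layer and the embeddings would be tensors. Plain lists are used here only to keep the sketch dependency-free.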

For task 4, we propose a generative multimodal model. It takes the dialogue history and non-visual attributes as textual input and the corresponding scene images as visual input, then generates the system response autoregressively.

🔥 News

2022.11.5: We were officially announced as the 🏆 Winner of DSTC11 Track 1 Subtasks 2, 3 and 4 and the 🥈 Runner-up of DSTC11 Track 1 Subtask 1.
2022.10.28: We submitted our test-std prediction results to SIMMC and made our repository publicly available.
2022.10.13: The repository dstc11-simmc2.1-damo-comvai for DSTC11 Track 1 was created.

🌏 Environment

First, download the fairseq files and put them under the task4 directory. Then, create the conda virtual environment and install the task 4 requirements:

conda env create -f simmc.yaml
pip install -r ./task4/requirements.txt

👐 Data Preparation

For tasks 1, 2 and 3, download the SIMMC 2.1 data and arrange the data_dstc11 folder in the following format:

|-- images                                                # scene images
|   |-- cloth_store_1_1_1.png
|   |-- cloth_store_1_1_2.png
|   `-- ...
|-- jsons                                                 # bbox and scene jsons
|   |-- cloth_store_1_1_1_bbox.json
|   |-- cloth_store_1_1_1_scene.json
|   `-- ...
|-- fashion_prefab_metadata_all.json                      # metadata (fashion)
|-- furniture_prefab_metadata_all.json                    # metadata (furniture)
|-- simmc2.1_dials_dstc11_dev.json                        # dialogue data (dev)
|-- simmc2.1_dials_dstc11_devtest.json                    # dialogue data (devtest)
|-- simmc2.1_dials_dstc11_teststd_public.json             # dialogue data (teststd)
`-- simmc2.1_dials_dstc11_train.json                      # dialogue data (train)
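Before running the preprocessing scripts, it can help to check that the folder matches the layout above. The sketch below is a hypothetical helper, not part of the repository. Only the file and folder names come from the layout. It assumes the folder is called data_dstc11 and is resolved relative to the current working directory. It reports missing entries and any scene whose *_scene.json has no matching *_bbox.json.

```python
from pathlib import Path
from typing import List

# Folders and files expected by the task 1-3 layout shown above.
REQUIRED_DIRS = ["images", "jsons"]
REQUIRED_FILES = [
    "fashion_prefab_metadata_all.json",
    "furniture_prefab_metadata_all.json",
    "simmc2.1_dials_dstc11_dev.json",
    "simmc2.1_dials_dstc11_devtest.json",
    "simmc2.1_dials_dstc11_teststd_public.json",
    "simmc2.1_dials_dstc11_train.json",
]


def missing_entries(root: str = "data_dstc11") -> List[str]:
    """Return the expected folders (with a trailing '/') and files that are absent under root."""
    base = Path(root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (base / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (base / f).is_file()]
    return missing


def scenes_without_bbox(root: str = "data_dstc11") -> List[str]:
    """Scene names that have a *_scene.json in jsons/ but no matching *_bbox.json."""
    jsons = Path(root) / "jsons"
    if not jsons.is_dir():
        return []
    scenes = {p.name[: -len("_scene.json")] for p in jsons.glob("*_scene.json")}
    boxes = {p.name[: -len("_bbox.json")] for p in jsons.glob("*_bbox.json")}
    return sorted(scenes - boxes)


if __name__ == "__main__":
    gaps = missing_entries()
    print("Layout OK" if not gaps else f"Missing: {gaps}")
    unpaired = scenes_without_bbox()
    if unpaired:
        print(f"{len(unpaired)} scene(s) without a bbox json, e.g. {unpaired[0]}")
```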

For task 4, you can put the SIMMC 2.1 data directly into the data_dstc11 folder without rearrangement.

NOTE: The following scene images are corrupted and are therefore ignored:

cloth_store_1416238_woman_4_8.png
cloth_store_1416238_woman_19_0.png
cloth_store_1416238_woman_20_6.png
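If you write your own data loader, a small filter like the one below can skip the same three files. This is a minimal sketch with hypothetical names (`CORRUPTED_IMAGES`, `usable_images`). Only the file names come from the list above. It compares base names, so it works with bare file names as well as paths under images/.

```python
from pathlib import Path
from typing import Iterable, List

# Corrupted scene images listed in this README.
CORRUPTED_IMAGES = frozenset({
    "cloth_store_1416238_woman_4_8.png",
    "cloth_store_1416238_woman_19_0.png",
    "cloth_store_1416238_woman_20_6.png",
})


def usable_images(image_paths: Iterable[str]) -> List[str]:
    """Keep only images whose file name is not in the corrupted list, preserving order."""
    return [p for p in image_paths if Path(p).name not in CORRUPTED_IMAGES]
```

For example, `usable_images(["images/cloth_store_1_1_1.png", "images/cloth_store_1416238_woman_4_8.png"])` keeps only the first path.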

🌟 Inference

For each task, we provide our model parameters and runnable code. Inference can be performed by running the corresponding bash script.

(Subtask 1) Ambiguous Candidate Identification

cd task1/bash
bash run_dstc11_task1.sh

(Subtask 2) Multimodal Coreference Resolution

cd task2/bash
bash run_dstc11_task2.sh

(Subtask 3) Multimodal Dialog State Tracking (MM-DST)

cd task3/bash
bash run_dstc11_task3.sh

NOTE: For tasks 1, 2 and 3, run the preprocessing script taskN/scripts/process_for_dstc11_taskN.py in advance; the preprocessed dataset can then be found under the taskN/data directory. Put downloaded checkpoints into the 'taskN/save_model' directory. Each script prints the results (Precision/Recall/F1-score) and creates a line-by-line *.json prediction file with one entry per turn of the preprocessed dataset.
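For tasks 1 and 2, precision, recall and F1 compare predicted object IDs against the ground-truth IDs. The minimal sketch below computes micro-averaged scores. It assumes the data are available as one (predicted IDs, gold IDs) pair per turn, and it assumes counts are pooled over all turns before dividing. The function name `micro_prf` is hypothetical, and the official SIMMC 2.1 evaluation scripts should be used for reportable numbers. The same counting could be applied to slot F1 by giving each turn's slots as a set of (slot, value) pairs.

```python
from typing import AbstractSet, Hashable, Iterable, Tuple

Turn = Tuple[AbstractSet[Hashable], AbstractSet[Hashable]]


def micro_prf(turns: Iterable[Turn]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over (predicted, gold) ID sets, one pair per turn."""
    n_correct = n_pred = n_gold = 0
    for pred, gold in turns:
        n_correct += len(pred & gold)  # IDs that are both predicted and in the ground truth
        n_pred += len(pred)
        n_gold += len(gold)
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Two toy turns: one partially correct, one exactly correct.
    print(micro_prf([({1, 2}, {2, 3}), ({5}, {5})]))
```

Pooling counts across turns means turns with many referenced objects weigh more than turns with one. This is why micro and macro averages can differ on the same predictions.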

(Subtask 4) Multimodal Dialog Response Generation

cd task4/run_scripts/simmc2.1
bash evaluate_one.sh task4_para 0 1

NOTE: For task 4, run the preprocessing script task4/dataset/gen_simmc2.1.py in advance. The preprocessed TSV dataset file teststd_public_withlast.tsv, together with teststd_public_withlast.tsv.index, can then be found under the task4/dataset/simmc2.1 directory. Place the downloaded checkpoint at 'task4/run_scripts/simmc2.1/task4_para/task4_para.pt'.

🐣 Model Parameter

Since our models are trained separately for each task, download the model parameters for each task from the checkpoint links below:

Sub-Task #1	Ambiguous Candidate Identification (New)
Goal	Given ambiguous object mentions, to resolve referent objects to their canonical ID(s).
Input	Current user utterance, Dialog context, Multimodal context
Output	Canonical object IDs
Metrics	Object Identification F1
Devtest Performance	70.31
Teststd Performance	67.26
Checkpoint	Checkpoint Link

Sub-Task #2	Multimodal Coreference Resolution
Goal	To resolve referent objects to their canonical ID(s) as defined by the catalog.
Input	Current user utterance, Dialog context, Multimodal context
Output	Canonical object IDs
Metrics	Coref F1
Devtest Performance	94.40
Teststd Performance	94.29
Checkpoint	Checkpoint Link

Sub-Task #3	Multimodal Dialog State Tracking (MM-DST)
Goal	To track user belief states across multiple turns
Input	Current user utterance, Dialog context, Multimodal context
Output	Belief state for current user utterance
Metrics	Slot F1, Intent F1
Devtest Performance	93.32 / 99.19
Teststd Performance	94.24 / 95.98
Checkpoint	Checkpoint Link

Sub-Task #4	Multimodal Dialog Response Generation
Goal	To generate Assistant responses
Input	Current user utterance, Dialog context, Multimodal context, (Ground-truth API Calls)
Output	Assistant response utterance
Metrics	BLEU-4
Devtest Performance	42.55
Teststd Performance	40.93
Checkpoint	Checkpoint Link
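To sanity-check task 4 outputs locally, corpus-level BLEU-4 can be computed as in the sketch below. It is a simplified, unsmoothed implementation with assumptions of our own: one reference per response and pre-tokenized input. The official SIMMC 2.1 evaluation may use different tokenization and smoothing, so scores from this sketch will not necessarily match the devtest/teststd numbers above. The sketch returns a value in [0, 1]; multiply by 100 to compare with the table. `corpus_bleu4` and `ngram_counts` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses: List[List[str]], references: List[List[str]]) -> float:
    """Unsmoothed corpus BLEU-4: one reference per hypothesis, uniform n-gram weights."""
    matches = [0] * 4  # clipped n-gram matches for n = 1..4
    totals = [0] * 4   # hypothesis n-gram counts for n = 1..4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clip: a hypothesis n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero n-gram precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalize corpora whose hypotheses are shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the red jacket is on the left rack".split()
    ref = "the red jacket is on the right rack".split()
    print(round(corpus_bleu4([hyp], [ref]), 4))
```

Counts are accumulated over the whole corpus before taking the geometric mean. Corpus BLEU is therefore not the average of per-response BLEU scores.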

📜 Result

The test-std prediction results for each task are available at the test_result link.

💬 References

@inproceedings{kottur-etal-2021-simmc,
    title = "{SIMMC} 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations",
    author = "Kottur, Satwik  and
      Moon, Seungwhan  and
      Geramifard, Alborz  and
      Damavandi, Babak",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.401",
    doi = "10.18653/v1/2021.emnlp-main.401",
    pages = "4903--4912",
}

📝 License

Our repository is released under the MIT License; see LICENSE for details.