DSTC11 SIMMC2.1 DAMO-ConvAI

🤗 transformers support

DSTC11-Track 1 : The Third Situated Interactive MultiModal Conversations (SIMMC 2.1) Challenge 2022

Team: DAMO-ConvAI

Participant: Yuxing Long, Huibin Zhang, Binyuan Hui

🏴 Overview

For tasks 1, 2, and 3, we design discriminative models based on a Transformer-encoder architecture (🤗 Longformer). To predict ambiguous candidates, coreference resolution, and the belief state, we encode the dialogue history and attach task-specific heads to the encoder output. Additionally, we line up the item vectors (bounding-box position embeddings) with their respective attribute tokens in the input token embeddings. Auxiliary heads that predict item attributes serve as additional supervision signals. A minimal sketch of this idea is shown below.
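
The following is an illustrative sketch (not our released code) of the discriminative setup: a Longformer encodes the dialogue, each object's projected bounding-box embedding is added to its attribute-token positions, and per-object heads score candidates and predict attributes. Names such as `ObjectScorer`, the 4-d bbox format, and `num_attrs` are assumptions for illustration.

```python
import torch
import torch.nn as nn
from transformers import LongformerModel

class ObjectScorer(nn.Module):
    """Longformer encoder + per-object heads (illustrative sketch)."""

    def __init__(self, model_name="allenai/longformer-base-4096", num_attrs=64):
        super().__init__()
        self.encoder = LongformerModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.bbox_proj = nn.Linear(4, hidden)          # (x, y, w, h) -> hidden
        self.cls_head = nn.Linear(hidden, 1)           # candidate / coref score
        self.attr_head = nn.Linear(hidden, num_attrs)  # auxiliary attribute logits

    def forward(self, input_ids, attention_mask, bboxes, object_token_idx):
        # object_token_idx: (B, num_obj) token position of each object's
        # attribute span; bboxes: (B, num_obj, 4), both assumed precomputed.
        embeds = self.encoder.get_input_embeddings()(input_ids)
        idx = object_token_idx.unsqueeze(-1).expand(-1, -1, embeds.size(-1))
        # Line up each item's bbox position embedding with its attribute tokens.
        embeds = embeds.scatter_add(1, idx, self.bbox_proj(bboxes))
        states = self.encoder(inputs_embeds=embeds,
                              attention_mask=attention_mask).last_hidden_state
        obj_states = states.gather(1, idx)             # (B, num_obj, hidden)
        return self.cls_head(obj_states).squeeze(-1), self.attr_head(obj_states)
```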

For task 4, we propose a generative multimodal model that takes the dialogue history and non-visual attributes as textual input and the corresponding scene images as visual input, and generates the system response autoregressively; see the sketch below.
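
A minimal sketch of the task-4 idea, with assumptions throughout: scene-image features are projected into the word-embedding space and prepended to the textual input of a seq2seq model, which then decodes the response autoregressively. The ResNet feature extractor and BART backbone are illustrative stand-ins, not our released fairseq-based model.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import BartForConditionalGeneration

class MultimodalResponder(nn.Module):
    """Image features prefixed to text embeddings, decoded with BART (sketch)."""

    def __init__(self):
        super().__init__()
        cnn = resnet50(weights=None)
        self.visual = nn.Sequential(*list(cnn.children())[:-2])  # (B, 2048, 7, 7)
        self.bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
        self.proj = nn.Linear(2048, self.bart.config.d_model)

    def forward(self, pixel_values, input_ids, labels=None):
        feats = self.visual(pixel_values).flatten(2).transpose(1, 2)  # (B, 49, 2048)
        vis = self.proj(feats)                                        # (B, 49, d)
        txt = self.bart.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([vis, txt], dim=1)
        mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long,
                          device=inputs_embeds.device)
        return self.bart(inputs_embeds=inputs_embeds,
                         attention_mask=mask, labels=labels)
```

At inference time the same concatenated embeddings would be fed to the decoder's generation loop to produce the system response token by token.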

🔥 News

  • 2022.11.5: We were officially announced as the 🏆 Winner of DSTC11 Track 1 Subtasks 2, 3, and 4 and the 🥈 Runner-up of DSTC11 Track 1 Subtask 1.
  • 2022.10.28: We submitted our test-std prediction results to SIMMC and made our repository publicly available.
  • 2022.10.13: The repository dstc11-simmc2.1-damo-comvai for DSTC11 Track 1 was created.

🌏 Environment

First, download the fairseq files and put them under the task4 directory. Then create the conda virtual environment:

```bash
conda env create -f simmc.yaml
pip install -r ./task4/requirements.txt
```

👐 Data Preparation

For tasks 1, 2, and 3, download the SIMMC 2.1 data and rearrange it in the data_dstc11 folder in the following format.

```
|-- images                                                # scene images
|   |-- cloth_store_1_1_1.png
|   |-- cloth_store_1_1_2.png
|   `-- ...
|-- jsons                                                 # bbox and scene jsons
|   |-- cloth_store_1_1_1_bbox.json
|   |-- cloth_store_1_1_1_scene.json
|   `-- ...
|-- fashion_prefab_metadata_all.json                      # metadata (fashion)
|-- furniture_prefab_metadata_all.json                    # metadata (furniture)
|-- simmc2.1_dials_dstc11_dev.json                        # dialogue data (dev)
|-- simmc2.1_dials_dstc11_devtest.json                    # dialogue data (devtest)
|-- simmc2.1_dials_dstc11_teststd_public.json             # dialogue data (teststd)
`-- simmc2.1_dials_dstc11_train.json                      # dialogue data (train)
```

For task 4, you can put the SIMMC 2.1 data into the data_dstc11 folder directly, without rearrangement.
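
A minimal sketch for rearranging a downloaded SIMMC 2.1 dump into the layout above. The source directory name `simmc2.1_raw` is an assumption; adjust the globs to however you unpacked the official archives.

```python
import shutil
from pathlib import Path

src = Path("simmc2.1_raw")          # hypothetical unpack location
dst = Path("data_dstc11")
(dst / "images").mkdir(parents=True, exist_ok=True)
(dst / "jsons").mkdir(parents=True, exist_ok=True)

for png in src.rglob("*.png"):                 # scene images
    shutil.copy(png, dst / "images" / png.name)
for pattern in ("*_bbox.json", "*_scene.json"):  # bbox and scene jsons
    for js in src.rglob(pattern):
        shutil.copy(js, dst / "jsons" / js.name)
for name in ["fashion_prefab_metadata_all.json",
             "furniture_prefab_metadata_all.json",
             "simmc2.1_dials_dstc11_train.json",
             "simmc2.1_dials_dstc11_dev.json",
             "simmc2.1_dials_dstc11_devtest.json",
             "simmc2.1_dials_dstc11_teststd_public.json"]:
    matches = list(src.rglob(name))            # metadata and dialogue files
    if matches:
        shutil.copy(matches[0], dst / name)
```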

NOTE: Some of the scene images are corrupted and are therefore ignored:

```
cloth_store_1416238_woman_4_8.png
cloth_store_1416238_woman_19_0.png
cloth_store_1416238_woman_20_6.png
```
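
If you want to check for corrupted images yourself, a quick sketch with Pillow (assuming "corrupted" means the file fails to parse):

```python
from pathlib import Path
from PIL import Image

for path in sorted(Path("data_dstc11/images").glob("*.png")):
    try:
        with Image.open(path) as img:
            img.verify()   # raises on truncated/corrupted files
    except Exception as err:
        print(f"skip {path.name}: {err}")
```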

🌟 Inference

For each task, we provide our trained model parameters and runnable code. Inference can be performed by running the corresponding bash file.

(Subtask 1) Ambiguous Candidate Identification

```bash
cd task1/bash
bash run_dstc11_task1.sh
```

(Subtask 2) Multimodal Coreference Resolution

```bash
cd task2/bash
bash run_dstc11_task2.sh
```

(Subtask 3) Multimodal Dialog State Tracking (MM-DST)

```bash
cd task3/bash
bash run_dstc11_task3.sh
```

NOTE: For tasks 1, 2, and 3, the preprocessing script taskN/scripts/process_for_dstc11_taskN.py needs to be run in advance; the preprocessed dataset can then be found under the taskN/data directory. Downloaded checkpoints should be placed in the taskN/save_model directory. Each script prints the results (Precision/Recall/F1-score) and writes a line-by-line *.json prediction for each turn of the preprocessed dataset.
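
For reference, set-based object F1 over predicted vs. ground-truth canonical object IDs can be computed per turn roughly as follows (a sketch of the metric, not the challenge's official scorer, which micro-averages across turns):

```python
# Precision/recall/F1 over canonical object ID sets for a single turn.
def object_f1(pred_ids, gold_ids):
    pred, gold = set(pred_ids), set(gold_ids)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(object_f1([3, 7, 12], [3, 12, 15]))  # -> (0.667, 0.667, 0.667)
```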

(Subtask 4) Multimodal Dialog Response Generation

```bash
cd task4/run_scripts/simmc2.1
bash evaluate_one.sh task4_para 0 1
```

NOTE: For task 4, the preprocessing script task4/dataset/gen_simmc2.1.py needs to be run in advance; the preprocessed TSV-format dataset files teststd_public_withlast.tsv and teststd_public_withlast.tsv.index can then be found under the task4/dataset/simmc2.1 directory. The downloaded checkpoint should be placed at task4/run_scripts/simmc2.1/task4_para/task4_para.pt.
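
Task 4 is scored with BLEU-4. A rough sanity check on generated responses might look like the following (NLTK smoothing and whitespace tokenization are assumptions; the official challenge scorer may differ):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis has a list of reference token lists.
references = [[["how", "about", "the", "red", "jacket", "?"]]]
hypotheses = [["what", "about", "the", "red", "jacket", "?"]]
score = corpus_bleu(references, hypotheses,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {score:.4f}")
```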

🐣 Model Parameter

Since our model is trained separately for each task, download the per-task model parameters via the checkpoint links below:

| Sub-Task #1 | Ambiguous Candidate Identification (New) |
| :--- | :--- |
| Goal | Given ambiguous object mentions, to resolve referent objects to their canonical ID(s). |
| Input | Current user utterance, dialog context, multimodal context |
| Output | Canonical object IDs |
| Metrics | Object Identification F1 |
| Devtest Performance | 70.31 |
| Teststd Performance | 67.26 |
| Checkpoint | Checkpoint Link |

| Sub-Task #2 | Multimodal Coreference Resolution |
| :--- | :--- |
| Goal | To resolve referent objects to their canonical ID(s) as defined by the catalog. |
| Input | Current user utterance, dialog context, multimodal context |
| Output | Canonical object IDs |
| Metrics | Coref F1 |
| Devtest Performance | 94.40 |
| Teststd Performance | 94.29 |
| Checkpoint | Checkpoint Link |

| Sub-Task #3 | Multimodal Dialog State Tracking (MM-DST) |
| :--- | :--- |
| Goal | To track user belief states across multiple turns |
| Input | Current user utterance, dialog context, multimodal context |
| Output | Belief state for the current user utterance |
| Metrics | Slot F1 / Intent F1 |
| Devtest Performance | 93.32 / 99.19 |
| Teststd Performance | 94.24 / 95.98 |
| Checkpoint | Checkpoint Link |

| Sub-Task #4 | Multimodal Dialog Response Generation |
| :--- | :--- |
| Goal | To generate Assistant responses |
| Input | Current user utterance, dialog context, multimodal context, (ground-truth API calls) |
| Output | Assistant response utterance |
| Metrics | BLEU-4 |
| Devtest Performance | 42.55 |
| Teststd Performance | 40.93 |
| Checkpoint | Checkpoint Link |

📜 Result

For each task, our prediction results on the test-std set are provided via the test_result link.

💬 References

```bibtex
@inproceedings{kottur-etal-2021-simmc,
    title = "{SIMMC} 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations",
    author = "Kottur, Satwik  and
      Moon, Seungwhan  and
      Geramifard, Alborz  and
      Damavandi, Babak",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.401",
    doi = "10.18653/v1/2021.emnlp-main.401",
    pages = "4903--4912",
}
```

📝 License

Our repository is released under the MIT License; see LICENSE for details.