PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

🔥 Updates

  • [2023-05-02]: PaCE was accepted to ACL 2023.

🏴 Overview

PaCE is a unified, structured, compositional multi-modal dialogue pre-training framework. It combines several fundamental experts to accommodate multiple dialogue-related tasks, and it can be pre-trained with limited dialogue data together with extensive non-dialogue multi-modal data. Furthermore, we propose a progressive training strategy in which experts learned in earlier stages assist newly added experts, facilitating the expansion of the model's capabilities.

(Figure: overview of the PaCE model architecture.)

Quickstart

Prepare Environment

First, set up a Python environment. This codebase has been tested under Python 3.x, and we officially support Python 3.7.

$ conda create -n pace python=3.7
$ cd PaCE # the directory containing 'requirements.txt'
$ pip install -r requirements.txt
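
If the installation succeeds, a quick import check should work (torch is used here only as a representative dependency):

$ python -c "import torch; print(torch.__version__)"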

Pre-trained model

The pretrained models of PaCE can be found here.
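
Before finetuning, you can sanity-check a downloaded checkpoint with plain PyTorch. This is only an illustrative sketch: the filename below is a placeholder for whichever checkpoint you downloaded, and the 'state_dict' key is an assumption based on PyTorch Lightning-style checkpoints.

import torch

# Load the checkpoint on CPU and peek at the stored weights.
ckpt = torch.load("pace_pretrained.ckpt", map_location="cpu")  # placeholder filename
state_dict = ckpt.get("state_dict", ckpt)  # Lightning checkpoints nest weights under 'state_dict'
print(f"{len(state_dict)} parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))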

Finetuning

$ python run.py with env_8 task_finetune_irtr_photochat_randaug per_gpu_batchsize=4 load_path=[pretrained.ckpt] data_root=[datasets] log_dir=[log_dir]
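
The bracketed arguments are placeholders. For example (the paths below are hypothetical, not files shipped with this repo):

$ python run.py with env_8 task_finetune_irtr_photochat_randaug per_gpu_batchsize=4 load_path=checkpoints/pace_pretrained.ckpt data_root=datasets log_dir=logs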

Evaluation

$ python run.py with env_debug task_finetune_irtr_photochat_randaug per_gpu_batchsize=4 test_only=True precision=32 load_path=[pretrained.ckpt] data_root=[datasets] log_dir=[log_dir]

💿 Resource

Corpus

We do not distribute the datasets due to licensing issues; please download them yourself using the links below. We process all data into the same format as in ViLT (see the conversion sketch after the table below).

Multi-Modal Non-Dialogue Data: VG, SBU, MSCOCO, GCC
Multi-Modal Dialogue Data: VisDial, ImageChat, MMDialog, MMConv, SIMMC 2.0, PhotoChat
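
The write_* scripts under utils/ perform this conversion. As a rough illustration of the ViLT-style target format, the sketch below serializes image-text pairs into an arrow file with pyarrow; the column names and file names are illustrative assumptions, not the exact schema used by PaCE.

import pandas as pd
import pyarrow as pa

# Each row pairs a raw image (bytes) with its associated texts.
rows = [
    {
        "image": open("example.jpg", "rb").read(),   # raw image bytes
        "caption": ["a dog running on the grass"],   # texts attached to this image
        "image_id": "example_0001",
        "split": "train",
    },
]

table = pa.Table.from_pandas(pd.DataFrame(rows))

# Write a single .arrow file that the dataset classes can memory-map at training time.
with pa.OSFile("photochat_train.arrow", "wb") as sink:
    with pa.RecordBatchFileWriter(sink, table.schema) as writer:
        writer.write_table(table)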

Project

  • An overview of the pace source directory:
|-- datamodules
    |-- vocabs # Specialized vocabularies required by the downstream datasets.
    |-- datamodule_base.py # Defines the high-level interface for the datasets.
    |-- photochat_datamodule.py
    |-- ... # One datamodule per dataset.
|-- datasets
    |-- base_dataset.py # Specifies how individual items are loaded and collated into batches.
    |-- photochat_dataset.py # Dataset-specific functions.
    |-- ... # One file per dataset.
|-- gadgets
    |-- metrics.py # Evaluation metrics.
|-- modules
    |-- heads.py # Model heads (e.g. MLMHead).
    |-- objectives.py # Loss computation for the various tasks.
    |-- vision_transformer.py # ViT model.
    |-- pace_module.py # Model architecture.
    |-- pace_utils.py # Sets up metrics and schedules / wraps up epochs.
|-- transforms # Image transforms, e.g. resize, RandAugment, and so on.
|-- utils # Scripts that convert raw data into the arrow format, plus other helper functions.
    |-- write_photochat.py
    |-- ...
    |-- format_simmc_dst_generation.py
    |-- ...
|-- config.py # Global settings such as the number of GPUs, the output directory, and the dataset location; per-task hyperparameters are also configured here.
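
Since run.py is invoked with sacred-style named configs (e.g. `with env_8 task_finetune_irtr_photochat_randaug`), config.py typically looks like the sketch below. This follows the ViLT convention that PaCE builds on; the concrete option names and values are illustrative assumptions, not the exact contents of config.py.

from sacred import Experiment

ex = Experiment("PaCE")

@ex.config
def config():
    # Global defaults shared by every run.
    data_root = ""
    log_dir = "result"
    num_gpus = 8
    per_gpu_batchsize = 4
    precision = 16
    learning_rate = 1e-4
    max_epoch = 10

@ex.named_config
def env_8():
    # Hardware profile selected on the command line via `with env_8`.
    num_gpus = 8

@ex.named_config
def task_finetune_irtr_photochat_randaug():
    # Task-specific overrides for PhotoChat image-text retrieval finetuning.
    per_gpu_batchsize = 4
    max_epoch = 10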

💬 Citation

@article{li2023pace,
  title={PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts},
  author={Li, Yunshui and Hui, Binyuan and Yin, ZhiChao and Yang, Min and Huang, Fei and Li, Yongbin},
  journal={arXiv preprint arXiv:2305.14839},
  year={2023}
}

👍 Acknowledgements

This repository is built on the ViLT repository and the timm library.

📝 License

Please note that there are TWO LICENSES, one for the code and one for the pre-training corpus. The code and pre-trained models are released under the MIT License, while the pre-training corpus is released under CC BY-SA 4.0.