The Emotion Transcription in Conversation (ETC) Dataset is a Japanese dialogue dataset of approximately 1,000 conversations. Each utterance is paired with an emotion transcription, a natural language description of the speaker's internal emotional state at the time of the utterance. The dataset also includes emotion labels corresponding to the emotion transcriptions, as well as speakers' personality traits (TIPI-J).
This dataset was constructed as a benchmark for the task of Emotion Transcription in Conversation (ETC): describing the emotional states behind speakers' utterances in natural language.
Note
A Japanese version of this README is available here.
Note
The published data has been quality-checked, and dialogues considered ethically problematic have been excluded. Please note that the analysis reported in the paper is based on the dataset prior to the exclusion of such dialogues and may differ from the statistics of the published version. Additionally, speaker names have been replaced with anonymous IDs assigned by the dataset creators.
Caution
The dialogue content in this dataset was collected via crowdsourcing and does not represent the beliefs or opinions of the dataset creators or their affiliated institutions.
| ETC Dataset | |
|---|---|
| # Dialogues | 997 |
| # Speakers | 198 |
| # Utterances / emotion transcriptions | 9,970 |
| Utterances per dialogue | 10 |
| Avg. utterance length (characters) | 42.72 |
| └ Speaker | 44.65 |
| └ Listener | 40.79 |
| Avg. emotion transcription length (characters) | 28.88 |
| └ Speaker | 28.91 |
| └ Listener | 28.85 |
| # Emotion categories | 7 (Ekman's 6 basic emotions + Neutral) |
| Language | Japanese |
The etc/ directory contains the dialogue data (dialogues/*.json), the speaker personality trait data based on TIPI-J [1] (personality_traits.json), and the data split information (split.json).
etc/
├── dialogues/                // Dialogue data (one file per dialogue)
│   ├── 0001.json
│   ├── 0002.json
│   ├── ...
│   └── 0997.json
├── personality_traits.json   // Speaker personality traits data
└── split.json                // Train/Valid/Test split information
The dialogue data includes participant IDs, utterances, emotion transcriptions, and emotion labels. Each dialogue begins with an utterance from the Speaker, after which the Speaker and Listener alternate, for 10 utterances per dialogue in total.
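The turn structure described above can be checked mechanically. The following is a minimal sketch, not part of the dataset's own tooling: `check_turn_structure` is a hypothetical helper name, and it assumes only the `dialogue` and `role` keys of the dialogue files.

```python
def check_turn_structure(dialogue: dict, expected_len: int = 10) -> list:
    """Return a list of problems found; an empty list means the structure matches."""
    problems = []
    utterances = dialogue.get("dialogue", [])
    if len(utterances) != expected_len:
        problems.append(f"expected {expected_len} utterances, found {len(utterances)}")
    for index, utterance in enumerate(utterances):
        # Even positions (0, 2, ...) should be the Speaker, odd ones the Listener.
        expected_role = "speaker" if index % 2 == 0 else "listener"
        role = utterance.get("role")
        if role != expected_role:
            problems.append(f"position {index}: expected {expected_role}, found {role}")
    return problems


# Illustrative in-memory dialogue (not real data): 10 alternating utterances.
sample = {
    "dialogue": [
        {"role": "speaker" if i % 2 == 0 else "listener"} for i in range(10)
    ]
}
print(check_turn_structure(sample))  # prints []
```

Returning a list of messages rather than raising makes it easy to run the check over all files and collect every deviation in one pass.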
For dialogue collection, we adopted the dialogue setup from EmpatheticDialogues [2]. Each dialogue was assigned one of 32 specific emotion labels (e.g., "impressed," "disappointed," "confident"). The Speaker talks about an experience related to that emotion, while the Listener responds to the Speaker's utterances.
Emotion labels consist of 7 categories: Ekman's 6 basic emotions [3] (joy, sadness, fear, anger, surprise, and disgust) plus "Neutral." Each emotion transcription was annotated by 3 annotators in a multi-label format.
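Because each emotion transcription carries one label list per annotator, many uses need a single aggregated label set. This README does not prescribe an aggregation rule; the sketch below shows one common option, a majority vote that keeps labels chosen by at least 2 of the 3 annotators. The function name `majority_labels` and the English label strings are illustrative assumptions (the files store labels as Japanese strings).

```python
from collections import Counter


def majority_labels(emotions, min_votes=2):
    """Keep labels selected by at least `min_votes` annotators.

    `emotions` is a list with one list of labels per annotator.
    """
    counts = Counter()
    for annotator_labels in emotions:
        # set() guards against one annotator listing the same label twice.
        counts.update(set(annotator_labels))
    return sorted(label for label, votes in counts.items() if votes >= min_votes)


# Illustrative annotations (not real data).
example = [["fear"], ["fear"], ["neutral"]]
print(majority_labels(example))  # prints ['fear']
```

When all three annotators disagree, the result is an empty list, so downstream code should decide how to treat utterances without a majority label.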
| Key | Type | Description |
|---|---|---|
| dialogue_id | int | Dialogue ID |
| dialogue_emotion | str | Emotion label assigned to the participant pair for the dialogue |
| participants | dict | Dictionary of speaker IDs |
| participants.speaker | str | Speaker ID |
| participants.listener | str | Listener ID |
| dialogue | list (dict) | List of utterance information |
| dialogue.turn | int | Turn number (1-indexed) |
| dialogue.role | str | Role: speaker or listener |
| dialogue.utterance | str | Utterance text |
| dialogue.emotion_transcription | str | The participant's emotion transcription for the utterance |
| dialogue.emotions | list (list (str)) | List of emotion labels for the emotion transcription (multi-label format by 3 annotators) |
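The keys in the table above map directly onto a small loader. This is a minimal sketch under stated assumptions: the function names (`load_dialogue`, `iter_utterances`, `load_all`) are hypothetical, the default path follows the directory layout shown earlier, and the published files are assumed to be plain UTF-8 JSON.

```python
import json
from pathlib import Path


def load_dialogue(path):
    """Read one dialogue file; assumes the file is plain UTF-8 JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_utterances(dialogue):
    """Yield one flat record per utterance, resolving the role to a participant ID."""
    participants = dialogue["participants"]
    for utt in dialogue["dialogue"]:
        yield {
            "dialogue_id": dialogue["dialogue_id"],
            "participant_id": participants[utt["role"]],
            "role": utt["role"],
            "turn": utt["turn"],
            "utterance": utt["utterance"],
            "emotion_transcription": utt["emotion_transcription"],
            "emotions": utt["emotions"],
        }


def load_all(root="etc/dialogues"):
    """Load every *.json file under `root`, sorted by file name."""
    return [load_dialogue(p) for p in sorted(Path(root).glob("*.json"))]
```

A flat list of utterance records can then be built with `[r for d in load_all() for r in iter_utterances(d)]`.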
Example: etc/dialogues/0945.json
{
"dialogue_id": 945,
"dialogue_emotion": "信頼する",
"participants": {
"speaker": "FQ",
"listener": "BN"
},
"dialogue": [
{
"turn": 1,
"role": "speaker",
"utterance": "ไฟก้ ผใใชใใจใไบบ้้ขไฟใฃใฆๆง็ฏใงใใชใใใฎใใชใจๆใใพใใใใใใฏ่จใฃใฆใ่ฃๅใใใใใจใใใใใ้ฃใใใงใใใญใ",
"emotion_transcription": "ใใใชใๆทฑใ่ณชๅใใใ็ธๆใฏๅฐใใใชใจๆใใคใคใใไบบๆใ็ฅใใใใซ่ใใฆใฟใใใชใใพใใใ",
"emotions": [
["恐怖"],
["恐怖"],
["該当なし"]
]
},
{
"turn": 1,
"role": "listener",
"utterance": "ไบบใจใฎ้ขไฟใฃใฆๆฌๅฝใซ้ฃใใใงใใใญใ่ฏใใใจๆใฃใฆใใไบใ็ธๆใใใใใฐ่ฟทๆใ ใฃใใไปฒใ่ฏใใจๆใฃใฆใใใฎใซ่ฃใงๆชๅฃใ่จใใใฆใใใๆญฃ่งฃใใชใใฆๆๆขใใงๆง็ฏใใฆใใใใใใใพใใใใญใ",
"emotion_transcription": "่ชๅใฏไบบ้้ขไฟใฎ่ค้ใใซๅคงใใฆๆทฑใๅฑๆใใ้ฃใใไบใๅคใใใใใ่ช ๅฎใซๅใๅใฃใฆไฟก้ ผ้ขไฟใ็ฏใใใจใๅคงๅใ ใจไผใใใใฃใใงใใ",
"emotions": [
["悲しみ"],
["悲しみ"],
["該当なし"]
]
}
// ...
]
}
The personality trait data includes the questionnaire items of TIPI-J (the Japanese version of the Ten-Item Personality Inventory) [1], speaker responses, and Big Five scores computed from those responses.
| Key | Type | Description |
|---|---|---|
| item | dict | Questionnaire items (i01–i10) |
| personality | dict | Personality trait data keyed by speaker ID |
| personality.*.participant_id | str | Participant ID |
| personality.*.response | dict | Responses to each questionnaire item |
| personality.*.score | dict | Scores for each Big Five dimension |
| personality.*.score.openness | int | Openness (2–14) |
| personality.*.score.conscientiousness | int | Conscientiousness (2–14) |
| personality.*.score.extraversion | int | Extraversion (2–14) |
| personality.*.score.agreeableness | int | Agreeableness (2–14) |
| personality.*.score.neuroticism | int | Neuroticism (2–14) |
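The structure in the table above can be read with a few lines of code. The sketch below is illustrative only: `response_value` and `out_of_range_scores` are hypothetical helper names, and the parsing assumes that each response string begins with the selected option number, as the entries in the example file appear to. The range check simply applies the 2–14 bounds listed in the table.

```python
import re


def response_value(response: str) -> int:
    """Extract the leading option number from a response string such as '2. ...'."""
    match = re.match(r"\s*(\d+)", response)
    if match is None:
        raise ValueError(f"no leading number in response: {response!r}")
    return int(match.group(1))


def out_of_range_scores(traits: dict, low: int = 2, high: int = 14) -> list:
    """Return (speaker_id, dimension, value) triples whose score is outside [low, high]."""
    bad = []
    for speaker_id, entry in traits["personality"].items():
        for dimension, value in entry["score"].items():
            if not low <= value <= high:
                bad.append((speaker_id, dimension, value))
    return bad


# Illustrative in-memory data shaped like personality_traits.json.
traits = {
    "personality": {
        "AA": {
            "participant_id": "AA",
            "response": {"i01": "2. (response text)"},
            "score": {"openness": 10, "conscientiousness": 2, "extraversion": 7},
        }
    }
}
print(response_value(traits["personality"]["AA"]["response"]["i01"]))  # prints 2
print(out_of_range_scores(traits))  # prints []
```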
{
"item": {
"i01": "活発で，外向的だと思う",
"i02": "他人に不満をもち，もめごとを起こしやすいと思う",
"i03": "しっかりしていて，自分に厳しいと思う",
// ...
},
"personality": {
"AA": {
"participant_id": "AA",
"response": {
"i01": "2. おおよそ違うと思う",
"i02": "2. おおよそ違うと思う",
// ...
},
"score": {
"openness": 10,
"conscientiousness": 2,
"extraversion": 7,
"agreeableness": 11,
"neuroticism": 9
}
}
// ...
}
}
split.json contains the Train / Valid / Test split information used in the experiments reported in the paper. Note that the dataset used in the paper's experiments includes dialogues that were later excluded from this published dataset due to ethical concerns.
Caution
Please observe the following guidelines when using this dataset:
- Do not attempt to identify individuals from the data in this dataset.
- Do not use this dataset to impersonate any specific speaker.
- When using this dataset for purposes such as predicting speakers' personality traits, be mindful of the rights of speakers who may not wish to have their personal information inferred.
@inproceedings{tanaka-etal-2026-etcdataset,
title = "Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language",
author = "Tanaka, Yoshiki and
Uehara, Ryuichi and
Inoue, Koji and
Inaba, Michimasa",
booktitle = "Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)",
year = "2026",
pages = "9692--9709",
publisher = "European Language Resources Association (ELRA)"
}
@inproceedings{tanaka-etal-2026-etcdataset-ja,
title = "対話における心情記述: 自然言語による機微かつ複雑な心情理解のためのベンチマーク",
author = "田中 義規 and 上原 隆一 and 井上 昂治 and 稲葉 通将",
booktitle = "言語処理学会第32回年次大会発表論文集",
year = "2026",
pages = "1328--1333"
}
This work was supported by JSPS KAKENHI Grant Number 25H01382.
This dataset is licensed under CC BY-NC 4.0.
Footnotes
1. Atsushi Oshio, Shingo Abe, and Pino Cutrone. Development, reliability, and validity of the Japanese version of Ten Item Personality Inventory (TIPI-J). Japanese Journal of Personality, Vol. 21, No. 1, 2012.
2. Hannah Rashkin, Eric Michael Smith, Margaret Li, and Y-Lan Boureau. Towards empathetic open-domain conversation models: A new benchmark and dataset. In Anna Korhonen, David Traum, and Lluís Màrquez, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5370–5381, Florence, Italy, July 2019. Association for Computational Linguistics.
3. P. Ekman, W. V. Friesen, M. J. O'Sullivan, A. K. Chan, I. Diacoyanni-Tarlatzis, K. G. Heider, R. Krause, W. A. LeCompte, T. K. Pitcairn, P. E. Ricci-Bitti, K. R. Scherer, M. Tomita, and A. Tzavaras. Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology, Vol. 53, pp. 712–717, 1987.
