PEGCONV dataset

The following is a set of datasets recorded using an OpenBCI EEG soft cap, a Shimmer3 sensor, a RealSense camera, and an audio recorder during emotional tasks such as video/image watching and human-human or human-machine conversations about emotional subjects. It includes 4 datasets:

note : To access any of these datasets

Video-Watching Dataset (PEGVideo)

This dataset includes EEG, PPG, GSR, and video data collected during a video-watching task.

raw_data : This directory includes video, OpenBCI (EEG), and Shimmer3 (PPG and GSR) data.

Each raw EEG and Shimmer file includes the continuous data from all trials for each participant. The beginning and end of each trial have been marked in the trigger column. The fixation cross time is included in each trial.
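As a sketch, splitting such a continuous recording into trials using the trigger column might look like the following. The marker values `1` (trial start) and `2` (trial end) are assumptions for illustration; check the actual files for the codes used:

```python
def split_trials(samples, triggers, start=1, end=2):
    """Cut a continuous recording into per-trial segments.

    `samples` and `triggers` are parallel sequences; `start`/`end`
    are hypothetical marker codes -- replace with the real ones.
    """
    trials, open_idx = [], None
    for i, code in enumerate(triggers):
        if code == start:
            open_idx = i
        elif code == end and open_idx is not None:
            trials.append(samples[open_idx:i + 1])
            open_idx = None
    return trials

# Synthetic demo: two trials embedded in a fake trigger stream.
samples = list(range(10))
triggers = [0, 1, 0, 2, 0, 1, 0, 0, 2, 0]
trials = split_trials(samples, triggers)
print(len(trials))  # 2
```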

preprocessed_data : EEG, PPG, and GSR data split according to the markers (triggers), with the first 3 seconds (fixation cross) removed. The name of each file contains three numbers: the first is the participant's ID and the third is the trial's ID. These can be used to look up the reported ratings for each trial in the labels.csv file.
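Assuming the three numbers can be pulled out of a file name with a regular expression (the separator and the meaning of the second number are not specified here, so the example name is hypothetical), the participant and trial IDs could be recovered like this:

```python
import re

def ids_from_name(filename):
    """Extract participant and trial IDs from a preprocessed file name.

    Per the dataset description, each name carries three numbers:
    the first is the participant ID and the third is the trial ID.
    The example file name below is an assumption -- adapt the call
    to the real naming scheme.
    """
    nums = re.findall(r"\d+", filename)
    return int(nums[0]), int(nums[2])

print(ids_from_name("p12_05_03.csv"))  # (12, 3)
```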

Labels : includes all ratings for all trials of all participants.

dataset-info.csv : includes participants' information and their permission to share their data.

To cite the dataset: Saffaryazdi, N., Wasim, S. T., Dileep, K., Nia, A. F., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022). Using facial micro-expressions in combination with EEG and physiological signals for emotion recognition. _Frontiers in Psychology_, _13_, 864047.

Human-Human Conversation Dataset (PEGConv)

This dataset includes EEG, PPG, GSR, audio, and video data collected during emotional human-human conversations.

raw_data : This directory includes audio, video, OpenBCI (EEG), and Shimmer3 (PPG and GSR) data.

Each raw EEG and Shimmer file includes the continuous data from all trials for each participant. The beginning and end of each trial have been marked in the trigger column. The fixation cross time is included in each trial.

preprocessed_data : EEG, PPG, and GSR data split according to the markers (triggers), with the first 3 seconds (fixation cross) removed. The name of each file contains three numbers: the first is the participant's ID and the third is the trial's ID. These can be used to look up the reported ratings for each trial in the labels.csv file.

Labels.csv : includes all ratings for all trials of all participants. For each trial, participants rated their emotions at the beginning, in the middle, and at the end of the conversation. The first column encodes the participant ID, trial number, and part. For example, p20_1_0 means the participant ID is p20, the trial ID is 1, and the rating relates to the beginning of the conversation; p20_3_1 means the participant ID is p20, the trial ID is 3, and the rating relates to the middle of the conversation in trial 3. The sensor data and labels can be mapped together using the participant ID and trial ID.
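The p20_1_0-style keys described above can be decoded with a small helper. The mapping of part code 0 to the beginning and 1 to the middle follows from the examples given; the assumption that 2 marks the end rating is inferred and should be verified against Labels.csv:

```python
def parse_label_key(key):
    """Split a Labels.csv row key like 'p20_3_1' into its parts.

    Part codes: 0 = beginning, 1 = middle (both stated above);
    2 = end is an assumption -- only codes 0 and 1 appear in the
    documented examples.
    """
    pid, trial, part = key.split("_")
    part_names = {0: "beginning", 1: "middle", 2: "end"}
    return pid, int(trial), part_names[int(part)]

print(parse_label_key("p20_3_1"))  # ('p20', 3, 'middle')
```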

dataset-info.csv : includes participants' information and their permission to share their data.

To cite the dataset and learn more about it: Saffaryazdi, N., Goonesekera, Y., Saffaryazdi, N., Hailemariam, N. D., Temesgen, E. G., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022, March). Emotion recognition in conversations using brain and physiological signals. In _27th International Conference on Intelligent User Interfaces_ (pp. 229-242).

Face-to-Face vs Remote Conversation Dataset (pegFRConv)

This dataset includes EEG, PPG, GSR, audio, and video data collected during emotional human-human face-to-face and remote conversations.

raw_data : This directory includes audio, video, EEG, and Shimmer data from the face-to-face and remote conversations.

preprocessed_data : includes EEG, PPG, and EDA data split according to the trials' markers, as well as facial action units extracted from the facial video.

labels : includes all ratings for all trials of all participants in both conditions.

empathy_ratings.csv : all empathy ratings for all participants after the face-to-face session and after the remote session.

dataset-info.csv : includes participants' information, their permission to share their data, and information about missing data.

To cite the dataset: Saffaryazdi, N., Kirkcaldy, N., Lee, G., Loveys, K., Broadbent, E., & Billinghurst, M. (2024). Exploring the impact of computer-mediated emotional interactions on human facial and physiological responses. Telematics and Informatics Reports, 14, 100131.

Human-Machine Conversation Dataset (PEGHMConv)

This dataset includes a folder for each modality. Each modality's folder contains the participant's raw audio, video, OpenBCI, and Shimmer3 data recorded during interaction with a neutral/non-empathetic conversational agent and with an empathetic agent.

It includes:

raw_data : This directory includes audio, video, EEG, and Shimmer3 (PPG and GSR) data.

preprocessed_data : includes trimmed EEG, GSR, and PPG files. The first 9 seconds of each file correspond to the fixation cross and image viewing; the rest corresponds to the conversation with the digital human.
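Separating the 9-second baseline from the conversation segment is a matter of converting seconds to samples. The sampling rate below is an assumption for illustration; use the actual rate of each sensor:

```python
def drop_baseline(samples, sampling_rate_hz, baseline_s=9):
    """Remove the fixation-cross/image segment from a trimmed file.

    The first 9 seconds cover the fixation cross and image viewing;
    `sampling_rate_hz` must match the sensor that produced the file
    (250 Hz here is a placeholder, not a documented value).
    """
    return samples[int(baseline_s * sampling_rate_hz):]

# Demo: a fake 10-second recording at 250 Hz leaves 1 second (250
# samples) of conversation data after the baseline is dropped.
conversation = drop_baseline(list(range(2500)), sampling_rate_hz=250)
print(len(conversation))  # 250
```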

self_report_labels.csv : includes all participants' ratings for all trials of both conditions. It also includes the binary arousal and valence values of the image stimuli (conversation topics).

note: The name of each file contains two numbers. The first is the participant's ID and the second is the trial's ID. These can be used to look up the reported ratings for each trial in the self_report_labels.csv file.

dataset-info.csv: includes participants' information and their permission to share their data.

README.md : important information about the dataset and missing data.

To cite the dataset: Saffaryazdi, N., Gharibnavaz, A., & Billinghurst, M. (2022). Octopus Sensing: A Python library for human behavior studies. Journal of Open Source Software, 7(71), 4045.