PEGCONV dataset

The following is a set of datasets recorded using an OpenBCI EEG soft cap, a Shimmer3 sensor, a RealSense camera, and an audio recorder during emotional tasks such as video/image watching and human-human or human-machine conversations about emotional subjects. It includes 4 datasets:

note : To access any of these datasets

Video-Watching Dataset (PEGVideo)

This dataset includes EEG, PPG, GSR, and video data collected during a video-watching task.

raw_data : This directory includes video, OpenBCI (EEG), and Shimmer3 (PPG and GSR) data.

Each raw EEG and Shimmer file includes the continuous data from all trials for each participant. The beginning and end of each trial have been marked in the trigger column. The fixation cross time is included in each trial.
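As a sketch, splitting such a continuous recording into trials using the trigger column might look like the following. The marker values `1` (trial start) and `2` (trial end) are assumptions for illustration; check the actual files for the codes used:

```python
def split_trials(samples, triggers, start=1, end=2):
    """Cut a continuous recording into per-trial segments.

    `samples` and `triggers` are parallel sequences; `start`/`end`
    are hypothetical marker codes -- replace with the real ones.
    """
    trials, open_idx = [], None
    for i, code in enumerate(triggers):
        if code == start:
            open_idx = i
        elif code == end and open_idx is not None:
            trials.append(samples[open_idx:i + 1])
            open_idx = None
    return trials

# Synthetic demo: two trials embedded in a fake trigger stream.
samples = list(range(10))
triggers = [0, 1, 0, 2, 0, 1, 0, 0, 2, 0]
trials = split_trials(samples, triggers)
print(len(trials))  # 2
```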

preprocessed_data : EEG, PPG, and GSR data split according to the markers (triggers), with the first 3 seconds (fixation cross) removed. The name of each file contains three numbers: the first is the participant's ID and the third is the trial's ID. These can be used to look up the reported ratings for each trial in the labels.csv file.
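Assuming the three numbers can be pulled out of a file name with a regular expression (the separator and the meaning of the second number are not specified here, so the example name is hypothetical), the participant and trial IDs could be recovered like this:

```python
import re

def ids_from_name(filename):
    """Extract participant and trial IDs from a preprocessed file name.

    Per the dataset description, each name carries three numbers:
    the first is the participant ID and the third is the trial ID.
    The example file name below is an assumption -- adapt the call
    to the real naming scheme.
    """
    nums = re.findall(r"\d+", filename)
    return int(nums[0]), int(nums[2])

print(ids_from_name("p12_05_03.csv"))  # (12, 3)
```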

Labels : includes all ratings for all trials of all participants.

dataset-info.csv : includes participants' information and their permission to share their data.

To cite the dataset: Saffaryazdi, N., Wasim, S. T., Dileep, K., Nia, A. F., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022). Using facial micro-expressions in combination with EEG and physiological signals for emotion recognition. _Frontiers in Psychology_, _13_, 864047.

Human-Human Conversation Dataset (PEGConv)

This dataset includes EEG, PPG, GSR, audio, and video data collected during emotional human-human conversations.

raw_data : This directory includes audio, video, OpenBCI (EEG), and Shimmer3 (PPG and GSR) data.

Each raw EEG and Shimmer file includes the continuous data from all trials for each participant. The beginning and end of each trial have been marked in the trigger column. The fixation cross time is included in each trial.

preprocessed_data : EEG, PPG, and GSR data split according to the markers (triggers), with the first 3 seconds (fixation cross) removed. The name of each file contains three numbers: the first is the participant's ID and the third is the trial's ID. These can be used to look up the reported ratings for each trial in the labels.csv file.

Labels.csv : includes all ratings for all trials of all participants. For each trial, participants rated their emotions at the beginning, in the middle, and at the end of the conversation. The first column encodes the participant ID, trial number, and part. For example, p20_1_0 means the participant ID is p20, the trial ID is 1, and the rating relates to the beginning of the conversation; p20_3_1 means the participant ID is p20, the trial ID is 3, and the rating relates to the middle of the conversation in trial 3. The sensor data and labels can be mapped together using the participant ID and trial ID.
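The p20_1_0-style keys described above can be decoded with a small helper. The mapping of part code 0 to the beginning and 1 to the middle follows from the examples given; the assumption that 2 marks the end rating is inferred and should be verified against Labels.csv:

```python
def parse_label_key(key):
    """Split a Labels.csv row key like 'p20_3_1' into its parts.

    Part codes: 0 = beginning, 1 = middle (both stated above);
    2 = end is an assumption -- only codes 0 and 1 appear in the
    documented examples.
    """
    pid, trial, part = key.split("_")
    part_names = {0: "beginning", 1: "middle", 2: "end"}
    return pid, int(trial), part_names[int(part)]

print(parse_label_key("p20_3_1"))  # ('p20', 3, 'middle')
```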

dataset-info.csv : includes participants' information and their permission to share their data.

To cite the dataset and learn more about it: Saffaryazdi, N., Goonesekera, Y., Saffaryazdi, N., Hailemariam, N. D., Temesgen, E. G., Nanayakkara, S., Broadbent, E., & Billinghurst, M. (2022, March). Emotion recognition in conversations using brain and physiological signals. In _27th International Conference on Intelligent User Interfaces_ (pp. 229-242).

Face-to-Face vs Remote Conversation Dataset (pegFRConv)

This dataset includes EEG, PPG, GSR, audio, and video data collected during emotional human-human face-to-face and remote conversations.

raw_data : This directory includes audio, video, EEG, and Shimmer data from the face-to-face and remote conversations.

preprocessed_data : includes EEG, PPG, and EDA data split according to the trials' markers, as well as facial action units extracted from the facial video.

labels : includes all ratings for all trials of all participants in both conditions.

empathy_ratings.csv : all empathy ratings for all participants after the face-to-face session and after the remote session.

dataset-info.csv : includes participants' information, their permission to share their data, and information about missing data.

To cite the dataset: Saffaryazdi, N., Kirkcaldy, N., Lee, G., Loveys, K., Broadbent, E., & Billinghurst, M. (2024). Exploring the impact of computer-mediated emotional interactions on human facial and physiological responses. Telematics and Informatics Reports, 14, 100131.

Human-Machine Conversation Dataset (PEGHMConv)

This dataset includes a folder for each modality. Each modality's folder contains the participant's raw audio, video, OpenBCI, and Shimmer3 data recorded during interaction with a neutral/non-empathetic conversational agent and with an empathetic agent.

It includes:

raw_data : This directory includes audio, video, EEG, and Shimmer3 (PPG and GSR) data.

preprocessed_data : includes trimmed EEG, GSR, and PPG files. The first 9 seconds of each file correspond to the fixation cross and image viewing; the rest corresponds to the conversation with the digital human.
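Separating the 9-second baseline from the conversation segment is a matter of converting seconds to samples. The sampling rate below is an assumption for illustration; use the actual rate of each sensor:

```python
def drop_baseline(samples, sampling_rate_hz, baseline_s=9):
    """Remove the fixation-cross/image segment from a trimmed file.

    The first 9 seconds cover the fixation cross and image viewing;
    `sampling_rate_hz` must match the sensor that produced the file
    (250 Hz here is a placeholder, not a documented value).
    """
    return samples[int(baseline_s * sampling_rate_hz):]

# Demo: a fake 10-second recording at 250 Hz leaves 1 second (250
# samples) of conversation data after the baseline is dropped.
conversation = drop_baseline(list(range(2500)), sampling_rate_hz=250)
print(len(conversation))  # 250
```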

self_report_labels.csv : includes all participants' ratings for all trials of both conditions. It also includes the binary arousal and valence values of the image stimuli (conversation topics).

note: The name of each file contains two numbers. The first is the participant's ID and the second is the trial's ID. These can be used to look up the reported ratings for each trial in the self_report_labels.csv file.

dataset-info.csv: includes participants' information and their permission to share their data.

README.md : important information about the dataset and missing data.

To cite the dataset: Saffaryazdi, N., Gharibnavaz, A., & Billinghurst, M. (2022). Octopus Sensing: A Python library for human behavior studies. Journal of Open Source Software, 7(71), 4045.