
Instructions for Accessing the Dataset

The Multimodal Dyadic Behavior (MMDB) dataset is a unique collection of multimodal (video, audio, and physiological) recordings of the social and communicative behavior of children aged 15-30 months, gathered during a semi-structured, tabletop play interaction with an adult. The sessions were recorded in the Child Study Lab (CSL) at the Georgia Institute of Technology under a protocol approved by the university's Institutional Review Board. Our overall goal for the dataset is to develop novel computational methods for measuring and analyzing the behavior of children and adults during face-to-face social interactions.

How can I get access to the dataset?

To protect the rights and privacy of our research participants, we require formal approval from an authorized, independent Ethics Committee before you can access the data. In the US, you can send us a protocol approved by your university's Institutional Review Board (IRB). Outside the US, we suggest you contact your national Ethics Committee; the list of Europe's national Ethics Committees is available via EUREC. For researchers outside the US and Europe, approval from their university's research ethics committee may be sufficient. Please contact us if you have any problems obtaining approval.

Guidelines for setting up an IRB protocol and answers to common application questions

To receive and analyze data collected at Georgia Tech, you will first need to file an IRB application with your university. The basic process for setting up a new IRB protocol is the same across US universities, though the specific questions on the application may vary from one university to another. Once you submit the protocol, it will be reviewed by your university's IRB. Because you will not be interacting directly with research participants or collecting any data, your protocol will usually qualify for expedited review. In most cases, the IRB will immediately approve your application to use our dataset. In some cases, the IRB may ask you to clarify a few details before approving your application. Feel free to contact us if you encounter any problems during this process.

Below, we list some of the common questions you may be asked when filling out the application, with brief guidance on how to answer them. Do not be intimidated by the lengthy application; many of the questions will not pertain to you because you are not collecting any data yourself. Make it very clear that your protocol is for data analysis only: you will not have any direct interaction with human subjects and will only analyze data previously collected under an IRB-approved protocol at Georgia Tech.

Review Type

When asked to indicate what Review Type you are requesting, make sure to indicate Expedited Review and choose expedited review sub-category 6 (Collection of data from voice, video, digital, or image recordings made for research purposes). Asking for Expedited Review of your protocol will significantly reduce the processing and approval time.

Protocol Summary/Description

Sample text: This protocol will involve the analysis of an existing dataset, which includes video, audio, and physiological recordings of children aged 15-30 months during a brief play interaction with an adult. All data was previously collected at the Georgia Institute of Technology, under a university-approved IRB protocol. The parents of the participants consented to sharing their child's video, audio, and physiological data with the research community. The data does not contain any personal identifiers that can be linked back to the subjects, except the participants' images. The goal of our analysis will be to {insert a brief description of your planned analysis of the data}.

Questions about HIPAA, DSMB and Risk

When asked, indicate that this research does NOT involve the collection of health information, so HIPAA does not apply. If asked, indicate that the data will NOT be reviewed by a Data Safety Monitoring Board (DSMB). If asked, indicate that the study involves Minimal Risk to human subjects.

Human Subjects Training

All study personnel who will have access to the data must pass a short training course on human subjects research offered by the Collaborative Institutional Training Initiative (CITI training) or an equivalent course. If you have any questions about how to complete this training, contact your university's IRB. Study personnel must complete this training before you can submit your protocol application to your IRB.


Sample text: All data was previously collected by researchers at the Georgia Institute of Technology, under a university IRB-approved protocol. The parents of the participants consented to sharing their child's video, audio, and physiological data with the research community. The data is labeled with a unique participant code only and does not contain any identifying information, except the participants' images. The child's age (in months) is also available.

You will also need to describe your procedures for maintaining the confidentiality of the data to be received: how you will safeguard the data from access by those not authorized to do so, and how the data will be transmitted among your research personnel. We include some sample text below, but of course you should modify it with the specifics of how you intend to store and control access to the data.

Sample text: All data will be saved in a unique directory on a password-protected server maintained by {insert name of your institution}. Only study personnel listed on this IRB protocol will be granted login credentials to access the directory on the server where the data is stored. The data may also be saved on encrypted external hard drives. No copies of the data will be distributed to anyone not listed on this IRB protocol.

Sample Size

The current dataset includes data from 121 participants, for a total of 160 individual sessions (some participants completed two sessions).

Sample Size Justification

This number of participants is needed to generate enough data points to develop the computational models that are at the heart of the study.

Any Other Questions about Human Subject Interaction and Data Collection (e.g., Subject Recruitment, Compensation, Informed Consent, Data Collection Plan/Duration, Inclusion/Exclusion Criteria, Potential Risk)

Suggested text: Not applicable. There will be no direct collection of human subject data at {insert name of your institution}. All data was previously collected by researchers at the Georgia Institute of Technology, under a university IRB-approved protocol (see attached). The parents of the subjects have consented to sharing their child's video, audio, and physiological data with the research community.

Associated Documents

The application will include a section where you can upload any relevant documents. You should upload the following two documents with your application:

The Last Step after the Approval

Once your IRB has reviewed and approved your protocol, they will send you a letter to that effect. Please email us a copy of your protocol along with the approval letter. Once we receive and verify your approval, we will send you a Data Use Agreement (DUA), which is required by our IRB protocol. A sample DUA can be found here. Once you sign the agreement and return it to us, we will create an account for you on our website and send you the data you request on an encrypted hard drive.
Your effort in following these procedures is highly appreciated!

Permitted Data Use

  • Data received shall be saved on a secure, password-protected server or workstation maintained by your home institution, as specified in the relevant protocol approved by the Ethics Committee.
  • Storing data on personal laptops/desktops is NOT permitted, with the exception of data embedded in talks/presentations.
  • Storing data on a third-party provider such as Dropbox is NOT permitted, with the exception of data embedded in talks/presentations.
  • Only study personnel listed on your Ethics Committee protocol shall be granted access to the data.
  • Data may be shown in public venues (e.g., talks, conference presentations), and still images of data may be included in publications.
  • More details can be found in the sample Data Use Agreement.