Georgia Tech Egocentric Activity Datasets

This is our repository of egocentric activity datasets.
This page includes the GTEA, GTEA Gaze, and GTEA Gaze+ datasets.
We are now working on expanding the GTEA Gaze+ dataset. Stay tuned!

GTEA

This dataset contains 7 types of daily activities, each performed by 4 different subjects. The camera is mounted on a cap worn by the subject.

Please consider citing the following papers when using this dataset:

Alireza Fathi, Xiaofeng Ren, James M. Rehg,
Learning to Recognize Objects in Egocentric Activities, CVPR, 2011

Yin Li, Zhefan Ye, James M. Rehg,
Delving into Egocentric Actions, CVPR, 2015

GTEA Gaze

This dataset is collected using Tobii eye-tracking glasses. It consists of 17 sequences, performed by 14 different subjects.

To record the sequences, we stocked a table with various kinds of food, dishes, and snacks. We asked each subject to wear the Tobii glasses, calibrated the gaze tracker, and then asked the subject to take a seat and prepare whatever food they felt like having. The beginning and ending times of the actions are annotated. Each action consists of a verb and a set of nouns, for example "pouring milk into cup". In our experiments we extract images from the video at 15 frames per second, and action annotations are given in frame numbers. Sequences 1, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 21, and 22 are used for training; sequences 2, 3, 5, and 20 are used for testing.
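The frame-based annotation scheme and the train/test split above can be sketched in code. This is a minimal illustration, not part of the dataset release: the function names and the assumption that annotations start at time zero are ours.

```python
# Illustrative helpers for the GTEA Gaze protocol described above.
# Names and conventions here are hypothetical, not an official API.
FPS = 15  # frames are extracted from the video at 15 frames per second

TRAIN_SEQS = {1, 6, 7, 8, 10, 12, 13, 14, 16, 17, 18, 21, 22}
TEST_SEQS = {2, 3, 5, 20}

def time_to_frame(seconds):
    """Map a time offset in seconds to the index of the extracted frame."""
    return int(seconds * FPS)

def split_for(seq_id):
    """Return which split a sequence belongs to under the paper's protocol."""
    if seq_id in TRAIN_SEQS:
        return "train"
    if seq_id in TEST_SEQS:
        return "test"
    return "unused"
```

For instance, an action annotated as starting 2 seconds into sequence 1 would begin at extracted frame 30 of a training sequence.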

Please consider citing the following paper when using this dataset:

Alireza Fathi, Yin Li, James M. Rehg,
Learning to Recognize Daily Actions using Gaze, ECCV, 2012

GTEA Gaze+

We collected this dataset using SMI eye-tracking glasses. Annotation is more than halfway complete, and we have made the data collected and annotated so far available here. The current version contains 37 videos with gaze tracking and action annotations. Audio files are also available upon request.

We collected this dataset at Georgia Tech's AwareHome. It consists of seven meal-preparation activities, performed by 26 subjects. Subjects perform the activities based on the given cooking recipes (get the recipes here).
The activities are: American Breakfast, Pizza, Snack, Greek Salad, Pasta Salad, Turkey Sandwich, and Cheese Burger. The SMI glasses record an HD video of the subject's activities at 24 frames per second, and the subject's gaze at 30 Hz.
For each activity, we used ELAN to annotate its actions. An activity is a meal-preparation task such as making pizza, and an action is a short temporal segment such as putting sauce on the pizza crust, dicing the green peppers, or washing the mushrooms.
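Because the scene video (24 fps) and the gaze stream (30 Hz) run at different rates, users typically need to resample one stream against the other. A minimal sketch, assuming both streams start at the same instant (the function name and this alignment assumption are ours, not part of the dataset tools):

```python
# Hypothetical alignment helper; assumes the video and gaze streams
# share a common start time, which may not hold for raw recordings.
VIDEO_FPS = 24.0  # scene video frame rate
GAZE_HZ = 30.0    # gaze sampling rate

def gaze_index_for_frame(frame_idx):
    """Return the index of the gaze sample nearest to a given video frame."""
    t = frame_idx / VIDEO_FPS       # timestamp of the frame in seconds
    return round(t * GAZE_HZ)       # nearest gaze sample at 30 Hz
```

For example, video frame 24 (the 1-second mark) corresponds to gaze sample 30.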

Videos (per activity and subject):

American Breakfast: P1 P2 P3 P4 P5 P6
Pizza (Special): P1 P2 P3 P4 P5 P6
Afternoon Snack: P1 P2 P3 P4 P5 P6
Greek Salad: P1 P2 P3 P4 P6
Pasta Salad: P1 P2 P3 P4
Turkey Sandwich: P1 P2 P3 P4 P6
Cheese Burger: P1 P2 P3 P4 P6

Gaze & Action Labels

In Jan. 2016 we mistakenly posted the raw labels. Please re-download the cleaned action labels if you got the incorrect version.

Gaze Labels Hand Masks


Please consider citing the following papers when using this dataset:

Alireza Fathi, Yin Li, James M. Rehg,
Learning to Recognize Daily Actions using Gaze, ECCV, 2012
Yin Li, Zhefan Ye, James M. Rehg,
Delving into Egocentric Actions, CVPR, 2015

Contact

For general questions or bug reports, please contact

Yin Li (yli440@gatech.edu).