I am a Computer Science PhD student at the Georgia Institute of Technology. I work in Dr. James M. Rehg's lab, and my research focuses on using natural language to aid computer vision tasks. My projects thus far have focused on Visual Question Answering and human activity understanding. Before graduate school, I attended Emory University and graduated with my BS in Computer Science in May 2016. At Emory I worked in a Natural Language Processing lab for two years under Dr. Jinho Choi. During my undergraduate career I also worked at the University of Central Florida's computer vision lab under Dr. Mubarak Shah.
Action2Vec: A Crossmodal Embedding Approach to Action Learning
Accepted to the CVPR 2018 Deep Vision and Language and Vision workshops; submitted to BMVC 2018.
This work learns a joint cross-modal embedding space for actions in videos and verbs.
Learning to Localize and Align Fine-Grained Actions to Sparse Instructions
This work has been submitted to ECCV 2018.
This paper addresses the task of automatically generating an alignment between a recipe and a first-person video demonstrating how to prepare the dish.
Situated Bayesian Reasoning Framework for Robots Operating in Diverse Everyday Environments
Accepted to ISRR 2017 and AAAI 2018.
In this paper, we present an approach for automatically generating a compact semantic knowledge base, relevant to a robot’s particular operating environment, given only a small number of object labels obtained from object recognition or a robot’s task description.
Advances in Methods and Evaluations for Distributional Semantic Models
The goal of this work was to create more semantically rich embeddings for verbs. The approach modifies the word embedding architecture to incorporate semantic role labels and dependencies. Additionally, this work introduces novel quantitative evaluations for embeddings across all parts of speech. This was my undergraduate thesis at Emory University, completed under Dr. Jinho Choi.
Deep Tracking: Visual Tracking Using Deep Convolutional Networks
This abstract presents a novel approach to object tracking using convolutional neural networks. The abstract was accepted to the Grace Hopper Celebration 2015, where I presented the poster for this work, and to the ACM Student Research Competition in 2015. This work was done in a Research Experience for Undergraduates (REU) program at the University of Central Florida under Dr. Mubarak Shah.
Localizing and Aligning Fine-Grained Actions to Sparse Instructions
This project addresses the task of automatically generating an alignment between a recipe and a first-person video demonstrating how to prepare the dish. The sparse descriptions and ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object detections and computational linguistic techniques.
We introduce new augmented versions of the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset. We clean up the Extended GTEA Gaze+ recipes and create recipes based on the narrations in the Bristol Egocentric Object Interactions Dataset. Additionally, for each video in both datasets, we annotate every ground-truth action segment with the recipe step number it belongs to. Below we provide links to the new labels and recipes, as well as links to download the videos.
AUGMENTED BRISTOL EGOCENTRIC OBJECT INTERACTIONS DATASET
Contains 58 daily activity videos recorded by 8 participants, with an average length of 1 minute, captured at 30 fps. The videos are divided among 6 different indoor daily tasks. Using the descriptions given by the participants, we create an instruction set (recipe) for each task and, in identical fashion to the EGTEA dataset, label each action segment with the recipe step it corresponds to.
Links to Raw Videos
AUGMENTED EXTENDED GTEA GAZE+ DATASET
Contains 86 unique cooking session videos recorded by 32 subjects, averaging 15 minutes in length and captured at 24 fps. The videos are divided among 7 different recipes. In order to evaluate our video-to-text alignment system, we add annotations to the dataset: for each ground-truth action segment, we add the label of the recipe step it belongs to. Actions that do not correspond to any recipe step are labeled as such.
Links to Raw Videos
More information on the datasets:
We evaluate on the EGTEA dataset because its videos are long and complex, and the density of recipe text relative to video length is quite low compared with other instructional video datasets. While EGTEA videos are all recorded in the same indoor kitchen setting, the manipulated objects differ greatly between recipes. Additionally, most recipes in the dataset require cooking multiple dishes: some subjects fully finish cooking one dish before moving on to the next, while others cook multiple dishes at the same time, jumping back and forth between recipe steps. BEOID videos are significantly shorter and have a higher word density than EGTEA. On the other hand, each BEOID recipe is executed in a different location with very different objects. We evaluate on BEOID to analyze how our system performs across multiple types of indoor scenes and non-cooking activities.
Awards and Honors
Presidential PhD Fellowship
Georgia Institute of Technology: 2016 - 2021
Highest Honors on Undergraduate Thesis
Emory University: 2016
Emory Honors Program: nominated and selected
The program requires a written thesis, a thesis defense, a GPA above 3.5, and graduate coursework: 2015 - 2016
Computing Research Association: Women Graduate Workshop travel grant recipient
Travel Grant: 2017
Anita Borg Scholarship recipient
Travel grant to attend the Grace Hopper Celebration: 2015
Emory Honor List
Awarded for maintaining a GPA above 3.5: 2013 - 2016
For any questions or for more information just send me an email.
Email: meerahahn [at] gatech.edu