I am a Computer Science PhD student at the Georgia Institute of Technology. I work in Dr. James M. Rehg's lab, and my research focuses on using natural language to aid computer vision tasks. My projects thus far have focused on Visual Question Answering and human activity understanding. Before graduate school, I attended Emory University and graduated with my BS in Computer Science in May 2016. At Emory, I worked in a Natural Language Processing lab for two years under Dr. Jinho Choi. During my undergraduate career, I also worked at the University of Central Florida's computer vision lab under Dr. Mubarak Shah.


Research Publications

Action2Vec: A Crossmodal Embedding Approach to Action Learning

Accepted to the CVPR 2018 Deep Vision and Language and Vision workshops; submitted to BMVC 2018.
This abstract presents a cross-modal embedding space that links actions in videos with verbs.
Read Paper

Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

This work has been submitted to ECCV 2018.
This paper addresses the task of automatically generating an alignment between a recipe and a first-person video demonstrating how to prepare the dish.
Read Paper
Project Page

Situated Bayesian Reasoning Framework for Robots Operating in Diverse Everyday Environments

Accepted to ISRR 2017 and AAAI 2018.
In this paper, we present an approach for automatically generating a compact semantic knowledge base, relevant to a robot’s particular operating environment, given only a small number of object labels obtained from object recognition or a robot’s task description.
Read Paper

Advances in Methods and Evaluations for Distributional Semantic Models

This was my undergraduate thesis at Emory University, completed under Dr. Jinho Choi. The goal of this work was to create more semantically rich embeddings for verbs by modifying the word embedding architecture to incorporate semantic role labels and dependencies. Additionally, this work introduces novel quantitative evaluation methods for embeddings of all parts of speech.
Read Thesis

Deep Tracking: Visual Tracking Using Deep Convolutional Networks

This abstract presents a novel approach to object tracking using convolutional neural networks. It was accepted to the Grace Hopper Celebration 2015, where I presented the poster for this work, and to the ACM Student Research Competition in 2015. This work was done during a Research Experience for Undergraduates (REU) program at the University of Central Florida under Dr. Mubarak Shah.
Read Abstract

Localizing and Aligning Fine-Grained Actions to Sparse Instructions

This project addresses the task of automatically generating an alignment between a recipe and a first-person video demonstrating how to prepare the dish. The sparse descriptions and ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object detections and computational linguistic techniques.
Read Paper

We introduce new augmented versions of the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset (BEOID). We clean up the Extended GTEA Gaze+ recipes and create recipes based on the narrations for BEOID. Additionally, for each video in both datasets, we annotate every ground-truth action segment with the number of the recipe step it belongs to. Below we provide links to the new labels and recipes, as well as links to download the videos.

Contains 58 daily-activity videos from 8 participants, with an average length of 1 minute, recorded at 30 fps. The videos are divided among 6 different indoor daily tasks. Using the descriptions given by the participants, we create an instruction set (recipe) for each task and, in identical fashion to the EGTEA dataset, label which recipe step each action segment corresponds to.
Action Annotations
Links to Raw Videos
Original Dataset

Contains 86 unique cooking-session videos from 32 subjects, averaging 15 minutes in length and recorded at 24 fps. The videos are divided among 7 different recipes. To evaluate our video-to-text alignment system, we annotate each ground-truth action segment with the number of the recipe step it belongs to; actions that do not correspond to any recipe step are labeled as such.
Action Annotations
Links to Raw Videos

More information on the datasets:
We evaluate on the EGTEA dataset because its videos are long and complex and its text-to-video density is quite low compared with other instructional video datasets. While EGTEA videos are all recorded in the same indoor kitchen setting, the manipulated objects differ greatly between recipes. Additionally, most of the recipes in the dataset require cooking multiple dishes: some subjects fully finish cooking one dish before moving on to the next, while others cook multiple dishes at the same time, jumping back and forth between recipe steps. BEOID videos are significantly shorter and have a higher word density than EGTEA; on the other hand, each BEOID recipe is executed in a different location with very different objects. We evaluate on BEOID to analyze how our system performs across multiple types of indoor scenes and non-cooking activities.

Awards and Honors

Presidential PhD Fellowship

Georgia Institute of Technology: 2016 - 2021

Highest Honors on Undergraduate Thesis

Emory University: 2016

Emory Honors Program: nominated and selected

The program requires a written thesis, a thesis defense, a GPA above 3.5, and graduate coursework: 2015 - 2016

Computing Research Association: Women Graduate Workshop travel grant recipient

Travel Grant: 2017

Anita Borg Scholarship recipient

Travel grant to attend the Grace Hopper Celebration: 2015

Emory Honor List

Maintained a GPA above 3.5: 2013 - 2016


For any questions or for more information, just send me an email.
Email: meerahahn [at] gatech.edu