The overarching goal of our research is to develop machine learning models for complex, unstructured, and heterogeneous data. Some of our ongoing projects are described below.

Generative Models with Structured Priors

Structured prediction models enhance modeling power by using structured representations. However, most structured prediction models are discriminative, which limits the ability to perform full Bayesian inference in them. In this project, our goal is to develop structured generative models that can encode structured priors on the parameters, rather than flat Dirichlet priors, which cannot capture how the different parameters are constrained. Hinge-loss Markov random fields (HL-MRFs) are a recently developed structured prediction framework. We aim to develop custom generative models with HL-MRF priors, along with algorithms for inference and learning in these models.
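For readers unfamiliar with HL-MRFs: an HL-MRF defines a density over continuous variables y in [0, 1]^n of the form P(y) ∝ exp(-Σ_j w_j φ_j(y)), where each potential φ_j(y) = max(ℓ_j(y), 0)^p_j is a hinge of a linear function ℓ_j with p_j ∈ {1, 2}, and w_j ≥ 0 is its weight. The minimal sketch below evaluates this unnormalized density, which is the quantity a structured prior of this kind would contribute to a generative model. It is not our implementation: the function names, the tuple format for potentials, and the example rule are hypothetical, and linear constraints on y (which HL-MRFs also allow) are omitted.

```python
import math

def hinge_potential(linear_value, p):
    # phi = max(l, 0)^p, with p = 1 (linear hinge) or p = 2 (squared hinge).
    return max(linear_value, 0.0) ** p

def hlmrf_energy(y, potentials):
    # Weighted sum of hinge-loss potentials at a point y in [0, 1]^n.
    # Each potential is (weight, coeffs, offset, p) with l(y) = coeffs . y + offset.
    if any(v < 0.0 or v > 1.0 for v in y):
        raise ValueError("HL-MRF variables must lie in [0, 1]")
    total = 0.0
    for weight, coeffs, offset, p in potentials:
        linear_value = sum(c * v for c, v in zip(coeffs, y)) + offset
        total += weight * hinge_potential(linear_value, p)
    return total

def unnormalized_density(y, potentials):
    # exp(-energy); the normalizer would require integrating over [0, 1]^n.
    return math.exp(-hlmrf_energy(y, potentials))

# Hypothetical soft rule "A implies B": its distance to satisfaction is
# max(y_A - y_B, 0), i.e. coeffs [1, -1], offset 0, weight 2, linear hinge.
rule_a_implies_b = [(2.0, [1.0, -1.0], 0.0, 1)]
```

For example, `hlmrf_energy([0.9, 0.3], rule_a_implies_b)` penalizes the assignment in proportion to how far it is from satisfying the rule, while any assignment with y_B ≥ y_A incurs zero energy. Because every potential is convex in y, MAP inference in an HL-MRF is a convex problem, which is part of what makes it appealing as a prior.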

People: Yue Zhang

Deep Latent Variable Models

In this project, we focus on extending deep generative models such as variational recurrent neural networks (VRNNs), variational autoencoders (VAEs), and generative adversarial networks (GANs), and on adapting them to applications where data generation is an important requirement. One such problem for which we are actively developing models is energy disaggregation: decomposing an aggregate energy signal into its component appliance-level signals. These deep latent variable generative models combine the predictive power of deep learning models with the representational power of structured prediction and graphical models. Existing work on energy disaggregation does not focus on accurately generating new signals, which we plan to achieve in this work. Accurately generated signals can serve as new data instances, which can help reduce sensor deployment and data collection costs for this important energy problem. We eventually plan to develop more advanced variants of these models and extend them to other application domains.
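VAE-style latent variable models are trained by maximizing the evidence lower bound (ELBO), log p(x) ≥ E_q(z|x)[log p(x|z)] - KL(q(z|x) || p(z)); VRNNs apply a similar bound at every time step, with a learned prior that depends on the history. The minimal sketch below computes a single-sample estimate of the negative ELBO for a diagonal-Gaussian encoder output and a standard-normal prior, using the reparameterization trick. It is plain Python for readability, not our implementation: `toy_decoder` is a hypothetical stand-in for a neural decoder, and treating x as, say, a window of aggregate power readings is an illustrative assumption.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # so z is a deterministic, differentiable function of (mu, log_var) given eps.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)).
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def gaussian_nll(x, x_mean, x_log_var):
    # Negative log-likelihood of x under a diagonal Gaussian decoder output.
    return 0.5 * sum(
        math.log(2.0 * math.pi) + lv + (xi - mi) ** 2 / math.exp(lv)
        for xi, mi, lv in zip(x, x_mean, x_log_var)
    )

def toy_decoder(z):
    # Hypothetical stand-in for a neural decoder: returns (mean, log-variance) of p(x|z).
    return [2.0 * zi for zi in z], [0.0 for _ in z]

def negative_elbo(x, mu, log_var, decode, rng):
    # Single-sample estimate of -ELBO = -E_q[log p(x|z)] + KL(q(z|x) || p(z)).
    z = reparameterize(mu, log_var, rng)
    x_mean, x_log_var = decode(z)
    return gaussian_nll(x, x_mean, x_log_var) + kl_to_standard_normal(mu, log_var)
```

A call such as `negative_elbo([0.5, -0.2], [0.1, 0.0], [-1.0, -1.0], toy_decoder, random.Random(0))` returns the loss that a training loop would minimize with respect to encoder and decoder parameters; in practice this would be written in an autodiff framework so gradients flow through `reparameterize`.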

People: Gissella Bejarano, David Defazio

Latent Variable Models for Computational Social Science

This project focuses on developing structured latent variable models that learn the relevant latent factors in computational social science problems. One such problem that we are actively working on is understanding the development of addiction. Linguistic, psychological, and structural attributes, and the relationships among them, play an important role in the development of addiction and in subsequent recovery from it. Latent variables learned from these interdependencies abstract the roles played by the different attributes, helping us understand more accurately how addiction develops and how people recover from it.

People: Yue Zhang


Cyberbullying is a serious problem on social media and can lead to a range of psychological problems. In this work, our goal is to develop semi-supervised and weakly supervised models that detect the presence and severity of bullying, as well as its coarse- and fine-grained categories, in online interaction data. The sensitive nature of this data makes labeled examples difficult to obtain, which makes unsupervised and weakly supervised models attractive for this problem.
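One common form of weak supervision replaces hand-labeled examples with several noisy heuristics ("labeling functions") that each vote on a label or abstain, and then combines their votes into a soft training label. The minimal sketch below uses a plain vote average as the combiner. It is only an illustration of the general idea, not our model: the keyword list, the three labeling functions, and `weak_label` are all hypothetical, and a realistic system would learn the reliability of each heuristic rather than weight them equally.

```python
ABSTAIN = None

# Hypothetical keyword list, for illustration only.
HYPOTHETICAL_INSULTS = {"loser", "idiot"}

def lf_insult_keyword(message):
    # Votes 1 (bullying) if any whitespace-separated token is in the keyword list.
    words = message.lower().split()
    return 1 if any(w in HYPOTHETICAL_INSULTS for w in words) else ABSTAIN

def lf_directed_insult(message):
    # Votes 1 if an insult appears together with a second-person phrase.
    text = message.lower()
    directed = "you are" in text
    return 1 if directed and any(w in text for w in HYPOTHETICAL_INSULTS) else ABSTAIN

def lf_friendly_opening(message):
    # Votes 0 (not bullying) for messages that open with a friendly phrase.
    return 0 if message.lower().startswith(("thanks", "congrats")) else ABSTAIN

def weak_label(message, labeling_functions):
    # Average of non-abstaining votes: a soft estimate of P(bullying),
    # or None when every labeling function abstains.
    votes = [lf(message) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)

LABELING_FUNCTIONS = [lf_insult_keyword, lf_directed_insult, lf_friendly_opening]
```

Messages on which all heuristics abstain receive no label and could be left to a semi-supervised component, which is one way the weakly and semi-supervised settings can be combined.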

People: Yue Zhang, David Defazio

Temporal Models for Urban Informatics

Resource allocation is an important problem in urban environments. In this work, we build structured models that jointly predict demand and response times in urban settings. To do so, we are leveraging and extending Gaussian conditional random fields (GCRFs), a continuous variant of conditional random fields. One application in which we are actively testing our models is 311 service calls. We plan to construct online variants of these models and to apply them to other urban data settings.
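To make the GCRF idea concrete, consider a basic GCRF with one unstructured and one pairwise term: P(y|x) ∝ exp(-α Σ_i (y_i - r_i(x))^2 - β Σ_{i<j} S_ij (y_i - y_j)^2), where r_i(x) is an unstructured prediction for output i (for example, from a regression model), S is a symmetric, nonnegative similarity matrix between outputs, and α > 0, β ≥ 0. Because the exponent is quadratic in y, P(y|x) is a multivariate Gaussian whose mean is μ = (αI + βL)^{-1} α r, where L = D - S is the graph Laplacian of S, so prediction reduces to solving a single linear system. The minimal sketch below computes μ. The single α and β are a simplification (fuller formulations use several weighted terms of each kind), the function names are hypothetical, and interpreting outputs as demand or response-time targets for related areas or time periods is an assumption for illustration.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting on an augmented copy of [a | b].
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x

def gcrf_mean(r, s, alpha, beta):
    # Mean (and MAP estimate) of a basic GCRF: solve (alpha*I + beta*L) mu = alpha*r,
    # where L = D - S is the Laplacian of the symmetric similarity matrix s.
    n = len(r)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(s[i])
        for j in range(n):
            laplacian_ij = (degree if i == j else 0.0) - s[i][j]
            a[i][j] = beta * laplacian_ij + (alpha if i == j else 0.0)
    return solve_linear(a, [alpha * r_i for r_i in r])
```

For two similar outputs with unstructured predictions `r = [1.0, 3.0]`, `gcrf_mean(r, [[0.0, 1.0], [1.0, 0.0]], 1.0, 1.0)` pulls the two predictions toward each other, while setting `beta = 0.0` recovers `r` unchanged. This smoothing of related predictions is what lets a GCRF couple outputs such as demand and response times.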

People: David Defazio