I am interested in the broad area of Action Recognition. In particular:
- I have worked on combining probability and logic to answer queries about possible events in an inherently stochastic and uncertain world. Logic lets us define events precisely using domain knowledge, while probability lets us quantify how uncertain the observations and the resulting conclusions are. I have looked into the fields of Probabilistic Logic Programming, Bayesian Learning, Statistical Relational Learning and Probabilistic Graphical Models.
- In the era of Big Data, events arrive in dense, high-frequency streams. In order to perform online event recognition and gather actionable knowledge to improve our systems, we have to deploy our event recognition pipelines on distributed, massively parallel architectures. Our code bases have to be moved to technologies such as the Scala language, the Akka actor toolkit and the Kafka streaming platform in order to meet today’s demands. Offline event recognition remains useful for data aggregation, visualization and report generation, but if we want to perform event forecasting in real time, we need to update our technologies. Combining these with probabilistic inference as described above is a problem that interests me greatly.
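To make the combination of logic and probability in the first point concrete, here is a minimal sketch in Python. It assumes independent probabilistic facts and computes the probability of a query by enumerating all possible worlds, which is the usual semantics in probabilistic logic programming. The fact names, their probabilities and the `attack` rule are hypothetical, chosen only for illustration, and brute-force enumeration is exponential in the number of facts, so this is not how a real system would perform inference.

```python
from itertools import product

# Hypothetical probabilistic facts: each one holds independently
# with the given probability.
facts = {"high_packet_rate": 0.7, "many_sources": 0.6, "known_bad_ip": 0.2}


def attack(world):
    # Hypothetical rules with the event of interest in the head:
    #   attack :- high_packet_rate, many_sources.
    #   attack :- known_bad_ip.
    return (world["high_packet_rate"] and world["many_sources"]) or world["known_bad_ip"]


def query_probability(facts, rule):
    """Sum the weights of all possible worlds in which the rule holds."""
    names = list(facts)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # Weight of a world = product of the probabilities of its fact choices.
        weight = 1.0
        for name in names:
            weight *= facts[name] if world[name] else 1.0 - facts[name]
        if rule(world):
            total += weight
    return total


p_attack = query_probability(facts, attack)
print(f"P(attack) = {p_attack:.3f}")
```

The logical part (the rule) stays crisp and readable, while all uncertainty lives in the facts; the open problem mentioned above is doing this kind of inference online, over a distributed stream.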
Below are some concrete questions that I would like to answer:
- Given a large stream of time-stamped observations and a prior model for an activity of interest, how can we best explain the observation stream? For example, in network traffic data, operators are interested in automatically detecting patterns of DDoS or other attacks. The major problem here is the abundance of noisy observations with respect to the activity we want to detect, so we need to build models that can handle huge streams of data with close to perfect recall.
- Discriminative models achieve high classification accuracy, with well-understood statistical properties, when sufficient training data is available. Unfortunately, the intra-class variation present in high-level scenarios is such that we can never hope to gather enough training data for them. For instance, we simply cannot hope to ever create a good feature descriptor for a soccer match or for cooking a steak. In such highly structured yet heavily variable scenarios, mining spatio-temporal structure is essential for recognizing and summarizing activities. To that end, we need to go beyond frequent itemset mining, which leverages only co-occurrence information; we need probabilistic models that relate low-level observations in a spatio-temporal sense, while still keeping inference tractable. Graphical models such as (Semi-)Markov Models or more general Dynamic Bayesian Networks could be used to build those hierarchical models.
- How should activity recognition be regularized? What constitutes a fine or coarse characterization of an activity? Which one should we prefer for a given task? For example, when observing multiple videos of weddings, we will likely observe a prevalence of wedding cakes. But in another video we might not observe one, for several possible reasons (maybe it was occluded by a participant, or the video simply ended before the cake cutting). Does this mean that we should immediately remove “cutting a wedding cake” as a constituent of wedding videos, or is this an anomaly that should be ignored? From a logical perspective, if we assume that we have to build a set of rules with the event of interest in the head, how many different rules would we need, and how long should each rule be? Which atoms should be placed first so that we unify as efficiently as possible during grounding? From a more theoretical perspective, what should be the structure of the Conjunctive Normal Form of the theory that we build for the event?
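As a simple baseline for the first question, "explaining" an observation stream can be read as finding the most likely sequence of hidden activity states given a prior model. The sketch below assumes a two-state hidden Markov model and decodes it with the Viterbi algorithm in log space. All state names, observation symbols and probabilities are hypothetical numbers made up for illustration; a realistic model would be learned from data and would need to run incrementally over the stream.

```python
import math

# Hypothetical prior model for network traffic (all numbers are illustrative).
states = ["normal", "attack"]
start = {"normal": 0.9, "attack": 0.1}
trans = {
    "normal": {"normal": 0.9, "attack": 0.1},
    "attack": {"normal": 0.2, "attack": 0.8},
}
# Noisy observations: either state can emit either symbol.
emit = {
    "normal": {"low": 0.8, "high": 0.2},
    "attack": {"low": 0.3, "high": 0.7},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for a non-empty observation list."""
    # best[s] = log-probability of the best path that ends in state s.
    best = {s: math.log(start[s]) + math.log(emit[s][observations[0]]) for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for obs in observations[1:]:
        prev = best
        best, pointers = {}, {}
        for s in states:
            # Best predecessor of s at this time step.
            p = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers[s] = p
            best[s] = prev[p] + math.log(trans[p][s]) + math.log(emit[s][obs])
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


print(viterbi(["low", "low", "high", "high", "high", "low"]))
```

The emission probabilities are what absorb the noise: a single "high" reading does not by itself force the explanation into the attack state. Tuning such a model toward near-perfect recall, and scaling it to very large streams, is exactly the part this sketch leaves open.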