Research — Edward Kim

Overview of My Research Vision

An Envisioned General Interactive Co-Design Framework for Safe and Scalable Deployment of Autonomous Systems

This framework equips frontline workers with interactive tools that elicit domain knowledge via multimodal inputs and encode it in a formal language. For example, they can capture a therapist’s instructions on how to assist a particular patient with impaired vision and lower-body function in performing desired tasks. Such tasks require onsite expertise that is not available in public data. This knowledge representation drives simulation engines that procedurally generate varied task-specific environments and embodied synthetic data for training and testing autonomous systems. The framework then validates whether behaviors learned in simulation reliably transfer to real-world deployment. Resulting analyses are reported back to workers, highlighting inconsistencies in the instructions and soliciting feedback to iteratively refine the domain knowledge. The following are select research thrusts that directly relate to this framework.

Thrust 1: A Formal Language to Encode Domain Knowledge

Scenic: A Language for Scenario Specification and Data Generation

I co-developed a domain-specific probabilistic programming language (PPL) for modeling and generating stochastic and reactive multi-agent environments, task specifications and policies in simulation. This language has been utilized across fields, such as computer vision, reinforcement learning, machine learning, and robotics, in various domains as shown on the right.

Daniel J. Fremont, Edward Kim, Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Alberto
Sangiovanni-Vincentelli, Sanjit A Seshia, “SCENIC: A language for scenario specification and data generation,” A Special Issue of Machine Learning Journal on Robust ML, 2022, pg. 1-45
Eric Vin, Shun Kashiwa, Matthew Rhea, Daniel Fremont, Edward Kim, Tommaso Dreossi, Shormona Ghosh, Xiangyu Yue, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, “3D Environment Modeling for Falsification and Beyond with Scenic 3.0,” International Conference on Computer-Aided Verification (CAV), 2023

Thrust 2: Interactive Domain Knowledge Elicitation

Interactive Program Generation for Modeling Collaborative Physical Activities from Narrated Demonstrations

This work interactively elicits the domain knowledge of an expert with no programming knowledge via a multi-modal interface coupled with a probabilistic program generation algorithm. We conducted a study with 23 users in manufacturing and sports domains. Here is a demo video of our study.

Edward Kim*, Daniel He*, Jorge Diaz Chao*, Wiktor Rajca, Mohammed Amin, Nishant Malpani, Ruta Desai, Antti Oulasvirta, Bjoern Hartmann, Sanjit Seshia, https://arxiv.org/abs/2509.24250 (* Equal Contributions) [Pending Review]

Clinician-Directed Large Language Model Software Generation for Therapeutic Interventions in Physical Rehabilitation

This work is an application of domain knowledge elicitation work to healthcare. Digital health interventions increasingly deliver home exercise programs via sensor-equipped devices such as smartphones, enabling remote monitoring of adherence and performance. However, current software is usually authored before clinical encounters as libraries of modules for broad impairment categories. At the point of care, clinicians can only choose from these modules and adjust a few parameters (for example, duration or repetitions). As a result, individual limitations, goals, and environmental constraints are often not reflected, limiting personalization and benefit. We propose a paradigm in which large language models (LLMs) act as constrained translators that convert clinicians' exercise prescriptions into intervention software. Clinicians remain the decision makers: they design exercises during the encounter, tailored to each patient's impairments, goals, and environment, and the LLM generates matching software.

Edward Kim*, Yuri Cho*, Jose Eduardo E. Lima, Julie Muccini, Jenelle Jindal, Alison Scheid, Erik Nelson, Seong Hyun Park, Yuchen Zeng, Alton Sturgis, Caesar Li, Jackie Dai, Sun Min Kim, Yash Prakash, Liwen Sun, Isabella Hu, Hongxuan Wu, Daniel He, Wiktor Rajca, Cathra Halabi, Maarten Lansberg, Bjoern Hartmann, Sanjit A. Seshia
https://arxiv.org/abs/2511.18274 (* Indicates Equal Contributions) [Pending Review]
An Extended Abstract of this Journal is accepted at International Stroke Conference 2026 and is given an oral presentation (ISC is the largest conference in stroke community).

Thrust 3: System Training with Formalized Domain Knowledge

Programmatic Modeling and Generation of Real-time Strategic Soccer Environments for Reinforcement Learning

A key benefit of encoding domain knowledge as probabilistic programs is that they can generate distributions of task-specific data, effectively augmenting the training datasets. Leveraging this, I developed a framework in which developers specify tasks as Markov Decision Processes in Scenic for online reinforcement learning (RL). Also, when data are scarce, developers can encode stochastic policies in Scenic to generate offline RL datasets. In multi-agent domains, offline training on this augmented data enabled RL agents to solve tasks that online training alone could not.

Abdus Salam Azad*, Edward Kim*, Mark Wu, Kimin Lee, Ion Stoica, Pieter Abbeel, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, “Programmatic Modeling and Generation of Real-time Strategic Soccer Environments for Reinforcement Learning,” Association for the Advancement of Artificial Intelligence (AAAI), 2022 (* indicates equal contributions)

Thrust 4: System Testing with Formalized Domain Knowledge in Simulation

A Programmatic and Semantic Approach to Explaining and Debugging Neural Networks

Even as deep neural networks have become very effective for tasks in vision and perception, it remains difficult to explain and debug their behavior. In this paper, we present a programmatic and semantic approach to explaining, understanding, and debugging the correct and incorrect behaviors of a neural network-based perception system. Our approach is semantic in that it employs a high-level representation of the distribution of environment scenarios that the detector is intended to work on. It is programmatic in that scenario representation is a program in a domain-specific probabilistic programming language which can be used to generate synthetic data to test a given perception module. Our framework assesses the performance of a perception module to identify correct and incorrect detections, extracts rules from those results that semantically characterizes the correct and incorrect scenarios, and then specializes the probabilistic program with those rules in order to more precisely characterize the scenarios in which the perception module operates correctly or not.

Edward Kim, Divya Gopinath, Corina Pasareanu, and Sanjit A. Seshia, “A Programmatic Approach to Explaining and Debugging Neural Network Based Object Detector,” Computer Vision and Pattern Recognition (CVPR), 2020, pg. 11128-11137, [Oral Presentation]

Parallel and Multi-Objective Falsification with SCENIC and VerifAI

Falsification has emerged as an important tool for simulation-based verification of autonomous systems. In this paper, we present extensions to the SCENIC scenario specification language and VERIFAI toolkit that improve the scalability of sampling-based falsification methods by using parallelism and extend falsification to multi-objective specifications. We first present a parallelized framework that is interfaced with both the simulation and sampling capabilities of SCENIC and the falsification capabilities of VERIFAI, reducing the execution time bottleneck inherently present in simulation-based testing. We then present an extension of VERIFAI’s falsification algorithms to support multi-objective optimization during sampling, using the concept of rulebooks to specify a preference ordering over multiple metrics that can be used to guide the counterexample search process. Lastly, we evaluate the benefits of these extensions with a comprehensive set of benchmarks written in the SCENIC language.

Kesav Viswanadha, Edward Kim, Francis Indaheng, Daniel J. Fremont, Sanjit A. Seshia, “Parallel and Multi-Objective Falsification with Scenic and VerifAI,” International Conference on Runtime Verification (RV), 2021, pg. 265-276. (I advised Kesav Viswanadha)

Thrust 5: Sim-to-Real Validation of System Properties

Testing autonomous systems with synthetic sensor data in simulation can reveal failures or successes that do not reproduce on real data at deployment, exposing a simulation to reality (sim-to-real) validation gap. The status quo is to physically reconstruct counterexample environments and re-test systems in the real world, a painfully labor-intensive process. Leveraging the prevalence of large, labeled \textit{real} sensor datasets, I proposed a novel data-driven formulation of sim-to-real validation as a sensor data query problem: find real labeled traces (e.g., RGB videos) that match environment descriptions of interest encoded in Scenic, then validate ML components (perception, prediction, planning) on those real traces. Given a labeled real sensor dataset and a Scenic program, I developed query algorithms with correctness guarantees that monitor the labels of real traces and return the subset of these traces that match the program’s description. Formally, the program’s initial conditions and multi-agent behaviors are compiled into a satisfiability modulo theories (SMT) formula and a concurrent composition of finite-state machines (FSMs), respectively. In the papers, I prove theorems that the satisfaction of the SMT formula and an FSM simulation relation over the labeled traces implies that the traces match, or instantiate, the reactive multi-agent environment specified in the program.

Edward Kim, Devan Shanker, Varun Bharadwaj, Hongbeen Park, Jinkyu Kim, Hazem Torfah, Daniel Fremont, Sanjit Seshia, “Querying Labeled Time Series Data with Scenario Programs,” NASA Formal Methods (NFM), 2025
Edward Kim, Jay Shenoy, Sebastian Junges, Daniel J. Fremont, Alberto Sangiovanni-Vincentelli, Sanjit A. Seshia, “Querying Labelled Data with Scenario Programs for Sim-to-Real Validation,” International Conference on Cyber-Physical Systems (ICCPS), 2022 [Best Paper Nominee]
Daniel J. Fremont, Edward Kim, Yash Vardhan Pant, Sanjit Seshia, Atul Acharya, Xantha Bruso, Paul Wells, Steve Lemke, Qiang Lu, Shalin Mehta, “Formal Scenario-Based Testing of Autonomous Vehicles: From Simulation to the Real World,” Conference on Intelligent Transportation Systems (ITSC), 2020