This page highlights recent AI-enabled research systems and prediction workflows that connect clinical questions, EHR data engineering, LLM experimentation, and manuscript development. Across these projects, I have been using Codex and Claude not only for coding assistance, but also for planning, prompt iteration, pipeline design, validation, and research writing support.

Clinical Summarization Platform

PreoperativeEvaluation

An end-to-end research platform for generating anesthesiology-style preoperative evaluation summaries from longitudinal EHR data in MIMIC-IV.

MIMIC-IV Structured EHR Retrieval LLM Summarization LLM-as-Judge Evaluation Live Demo

What the system does

The workflow combines BigQuery-based cohort construction, structured clinical data extraction, open-source LLM summarization, and rubric-based evaluation into a coordinated multi-agent research pipeline.

Why it matters

Preoperative assessment often requires clinicians to integrate fragmented prior admissions, medications, labs, microbiology, and comorbidity history. This project explores how AI can make that synthesis more systematic, auditable, and clinically useful, with special emphasis on making each summary traceable back to the underlying patient record.

Key capability

The demo supports trace-back highlighting so users can inspect the raw patient data corresponding to specific portions of the generated summary, making the system more transparent and easier to evaluate clinically.

Screenshot of the preoperative evaluation demo application
Interactive demo interface for preoperative summary generation and trace-back review of source patient data.

Highlights

  • Built a LangGraph-style multi-agent pipeline spanning data retrieval, summarization, evaluation, and analysis.
  • Integrated perioperative context including medications, labs, microbiology, comorbidity burden, baseline measurements, and prior admissions.
  • Benchmarked open-source LLMs across multiple inference backends and evaluated summary quality with rubric-based commercial LLM judges.
  • Developed supporting research artifacts including experiment outputs, publication figures, manuscript drafts, a clinician review app, and a synthetic demo.
  • Published an interactive demo for exploring the preoperative summary workflow in a shareable web interface.
  • Used Codex and Claude for prompt design, rubric iteration, experiment planning, implementation support, and manuscript development.

Technical focus: Python, LangGraph/LangChain-style orchestration, BigQuery/GCP, Hugging Face and vLLM model serving, commercial LLM-based evaluation, and reproducible research-tooling pipelines for figures, tables, and exports.

Prediction and Fine-Tuning Workflow

PostoperationPrediction

A preoperative prediction workflow for postoperative infection using longitudinal EHR context and instruction-style LLM classifiers.

Longitudinal EHR Leakage-Aware Cohorting LoRA / QLoRA Clinical Prediction

What the workflow does

The project builds a leakage-aware surgical cohort, creates modeling-ready representations from preoperative and prior-admission data, and evaluates open-weight LLMs for clinically meaningful infection outcomes.

What we learned early

Reduced-context strategies performed better than naive long-context truncation in pilot evaluation, improving held-out ROC AUC from 0.675 to 0.883 and average precision from 0.159 to 0.478.

Highlights

  • Framed the first manuscript around ICD-based postoperative infection as a clinically coherent primary endpoint.
  • Built patient-level cohorting and split logic to reduce leakage from repeat admissions.
  • Constructed modeling inputs that combine demographics, procedures, comorbidity context, medications, labs, microbiology, and prior-admission note history.
  • Prepared fine-tuning and evaluation pipelines for instruction-style LoRA and QLoRA workflows.
  • Used a planning-oriented multi-agent workflow, supported by Codex and Claude, to coordinate data, modeling, validation, and manuscript tasks.

Technical focus: longitudinal EHR representation, patient-level dataset design, fine-tuning exports for instruction-style models, and evaluation pipelines for clinically grounded binary prediction tasks.

AI Research Workflow

Planning

Codex and Claude as research collaborators

I increasingly use agentic workflows to break large clinical AI projects into planning, coding, evaluation, and writing threads that can move in parallel.

Validation

Reproducibility stays central

The goal is not to replace methodological rigor, but to accelerate careful cohort design, prompt iteration, result checking, and manuscript preparation.

Direction

AI as part of the full research stack

These projects reflect a broader agenda of treating AI as one component in clinically grounded, end-to-end research systems rather than as an isolated modeling layer.