Stephanie Birkelbach

Howdy! I'm an undergraduate student at Texas A&M University pursuing dual Bachelor of Science degrees in Computer Science and Statistics!

I am interested in how computing can help us understand people and society. I enjoy exploring topics such as human-centered NLP, sensemaking, and the broader role technology plays in shaping society. Ultimately, I hope to contribute to ethical and socially beneficial technology.

Social Computing & Cultural AI Systems

Language technologies increasingly mediate how people communicate, create, and interpret information at scale. In this line of work, I study and build AI systems that operate within social, cultural, and platform-level contexts, where meaning is shaped by communities, norms, and narratives. My research examines how these contextual forces influence model behavior and system outcomes, with the goal of designing language technologies that better align with the social dynamics in which they are deployed.

SocialPulse: An Open-Source Subreddit Sensemaking Toolkit
Stephanie Birkelbach, Maria Teleki, Peter Carragher, Xiangjue Dong, Nehul Bhatnagar, James Caverlee
Collaboration w/ Carnegie Mellon University, Revionics
SocialLLM@ICWSM 26; presented at IC2S2 (Oral)


Understanding how online communities discuss and make sense of complex social issues is a central challenge in social media research, yet existing tools for large-scale discourse analysis are often closed-source, difficult to adapt, or limited to single analytical views. We present SocialPulse, an open-source subreddit sensemaking toolkit that unifies multiple complementary analyses -- topic modeling, sentiment analysis, user activity characterization, and bot detection -- within a single interactive system. SocialPulse enables users to fluidly move between aggregate trends and fine-grained content, compare highly active and long-tail contributors, and examine temporal shifts in discourse across subreddits. The demo showcases end-to-end exploratory workflows that allow researchers and practitioners to rapidly surface themes, participation patterns, and emerging dynamics in large Reddit datasets. By offering an extensible and openly available platform, SocialPulse provides a practical and reusable foundation for transparent, reproducible sensemaking of online community discourse.


@inproceedings{birkelbach26_socialpulse,
  title     = {{SocialPulse: An Open-Source Subreddit Sensemaking Toolkit}},
  author    = {Stephanie Birkelbach and Maria Teleki and Peter Carragher and Xiangjue Dong and Nehul Bhatnagar and James Caverlee},
  year      = {2026},
  booktitle = {arXiv}
}

Disfluency-Aware Speech and Language Understanding

Current speech and language understanding systems are built for fluent text, not for how people actually speak. Disfluencies -- pauses, repairs, hedges, and restarts -- are treated as artifacts to be removed, despite being fundamental to spoken communication. My work challenges this assumption by modeling disfluency as meaningful linguistic structure. I develop models, benchmarks, and evaluation frameworks that operate directly on spontaneous speech, yielding more robust language understanding in real-world conversational settings.

Z-Scores: A Metric for Linguistically Assessing Disfluency Removal
Maria Teleki, Sai Janjur, Haoran Liu, Oliver Grabner, Ketan Verma, Thomas Docog, Xiangjue Dong, Lingfeng Shi, Cong Wang, Stephanie Birkelbach, Jason Kim, Yin Zhang, James Caverlee
ICASSP 2026


Evaluating disfluency removal in speech requires more than aggregate token-level scores. Traditional word-based metrics such as precision, recall, and F1 (E-Scores) capture overall performance but cannot reveal why models succeed or fail. We introduce Z-Scores, a span-level, linguistically grounded evaluation metric that categorizes system behavior across distinct disfluency types (EDITED, INTJ, PRN). Our deterministic alignment module enables robust mapping between generated text and disfluent transcripts, allowing Z-Scores to expose systematic weaknesses that word-level metrics obscure. By providing category-specific diagnostics, Z-Scores enable researchers to identify model failure modes and design targeted interventions -- such as tailored prompts or data augmentation -- yielding measurable performance improvements. A case study with LLMs shows that Z-Scores uncover challenges with INTJ and PRN disfluencies hidden in aggregate F1, directly informing model refinement strategies.


@inproceedings{teleki25_zscores,
  title     = {Z-Scores: A Metric for Linguistically Assessing Disfluency Removal},
  author    = {Maria Teleki and Sai Janjur and Haoran Liu and Oliver Grabner and Ketan Verma and Thomas Docog and Xiangjue Dong and Lingfeng Shi and Cong Wang and Stephanie Birkelbach and Jason Kim and Yin Zhang and James Caverlee},
  year      = {2025},
  booktitle = {ICASSP},
}
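
To make the contrast in the abstract concrete, here is a minimal sketch of aggregate word-level scores next to a per-category breakdown. It is not the paper's implementation: the function names, the toy utterance, and the token-level (rather than span-level) per-category recall are all simplifying assumptions made for illustration.

```python
from collections import Counter

def e_scores(gold_deleted, pred_deleted):
    """Word-level precision, recall, and F1 over sets of deleted token indices."""
    tp = len(gold_deleted & pred_deleted)
    precision = tp / len(pred_deleted) if pred_deleted else 0.0
    recall = tp / len(gold_deleted) if gold_deleted else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def per_category_recall(gold_labels, pred_deleted):
    """Fraction of gold disfluent tokens removed, split by category.

    gold_labels maps token index -> category (hypothetical annotation format).
    This is a token-level simplification, not the span-level Z-Score definition.
    """
    total = Counter(gold_labels.values())
    hit = Counter(cat for i, cat in gold_labels.items() if i in pred_deleted)
    return {cat: hit[cat] / total[cat] for cat in total}

# Toy utterance: "uh i want you know i need a flight"
gold_labels = {0: "INTJ", 1: "EDITED", 2: "EDITED", 3: "PRN", 4: "PRN"}
pred_deleted = {1, 2, 7}  # system removed the EDITED span and one fluent word

print(e_scores(set(gold_labels), pred_deleted))
print(per_category_recall(gold_labels, pred_deleted))
```

In this toy case the aggregate F1 is a single number, while the breakdown shows that the EDITED tokens were all removed and the INTJ and PRN tokens were all missed.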

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
Maria Teleki, Sai Janjur, Haoran Liu, Oliver Grabner, Ketan Verma, Thomas Docog, Xiangjue Dong, Lingfeng Shi, Cong Wang, Stephanie Birkelbach, Jason Kim, Yin Zhang, Éva Székely, James Caverlee
Collaboration w/ KTH Royal Institute of Technology
Preprint 2026


LLMs serve as the backbone in SpeechLLMs, yet their behavior on spontaneous conversational input remains poorly understood. Conversational speech contains pervasive disfluencies -- interjections, edits, and parentheticals -- that are rare in the written corpora used for pre-training. Because gold disfluency removal is a deletion-only task, it serves as a controlled probe to determine whether a model performs faithful structural repair or biased reinterpretation. Using the DRES evaluation framework, we evaluate proprietary and open-source LLMs across architectures and scales. We show that model performance clusters into stable precision-recall regimes reflecting distinct "editing policies." Notably, reasoning models systematically over-delete fluent content, revealing a bias toward semantic abstraction over structural fidelity. While fine-tuning achieves SOTA results, it harms generalization. Our findings demonstrate that robustness to speech is shaped by specific training objectives.


@inproceedings{teleki26_dres,
  title     = {Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones},
  author    = {Maria Teleki and Sai Janjur and Haoran Liu and Oliver Grabner and Ketan Verma and Thomas Docog and Xiangjue Dong and Lingfeng Shi and Cong Wang and Stephanie Birkelbach and Jason Kim and Yin Zhang and Éva Székely and James Caverlee},
  year      = {2026},
}
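
The deletion-only property described in the abstract can be checked mechanically: a faithful output must be a subsequence of the disfluent input. The sketch below is an illustrative assumption, not the DRES framework's code. The function name and the greedy left-to-right alignment are hypothetical choices, and greedy matching can pick a different copy of a repeated word than a gold annotation would.

```python
def deleted_indices(source_tokens, output_tokens):
    """Align output to source greedily, left to right.

    Returns the source indices that were dropped, or None if the output
    is not a pure deletion of the source (a word was added or changed).
    """
    deleted = []
    j = 0  # position of the next output token still to be matched
    for i, tok in enumerate(source_tokens):
        if j < len(output_tokens) and tok == output_tokens[j]:
            j += 1
        else:
            deleted.append(i)
    return deleted if j == len(output_tokens) else None

source = "uh i want i need a flight".split()

# Pure deletion: every output word appears in the source, in order.
print(deleted_indices(source, "i need a flight".split()))

# Reinterpretation: "require" never occurs in the source, so this is rejected.
print(deleted_indices(source, "i require a flight".split()))
```

Comparing the returned indices against gold deletions is one way to see whether a model under-deletes or over-deletes.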

Education

(2023 - Present)	B.S. Computer Science at Texas A&M University -- minor in Mathematics, emphasis area in Cybersecurity
(2023 - Present)	B.S. Statistics at Texas A&M University

Awards

(2023-2027)	President's Endowed Scholarship
(2023-2027)	Oscar Nelson Opportunity Award
(2023-2027)	Dennis W. Holder Scholarship
(Fall 2025)	Distinguished Student
(Fall 2024)	Dean's Honor Roll
(2023)	Saint Pius X High School Valedictorian
(2023)	National Merit Finalist

Experience

Rochester Institute of Technology, NSF REU Site: Trustworthy AI (May 2026 - July 2026)	This summer!!
Texas A&M University, Undergraduate Research Assistant (May 2025 - Present)	Developed SocialPulse, an open-source subreddit sensemaking toolkit integrating topic modeling, sentiment analysis, user activity analysis, and bot detection to support large-scale social media research. Conducting research in spoken language processing, disfluency in conversational speech, and robust conversational AI under faculty supervision. Evaluating LLMs on their ability to understand spontaneous, disfluent speech compared to written text.
Pelican Industrial, Computer Engineering Intern (June - August 2025)	Programmed an ESP32 microcontroller with Arduino IDE to interface with external memory, an OLED display, and navigation buttons. Built a user-friendly interface to display pre-stored outputs, automating hardware validation and reducing manual calculations. Improved diagnostic efficiency by enabling employees to quickly identify malfunctioning hardware components.
Pelican Industrial, IT Intern (June - August 2024, May - August 2023)	Implemented a disaster recovery solution by replicating the company’s virtual machine hosting internal management software, ensuring business continuity. Supported secure remote work by setting up VPNs and remote desktop access for employees. Collaborated with IT staff to troubleshoot system issues and strengthen cybersecurity practices. Gained hands-on experience configuring backup routers, domain name registration, and digital certificates to enhance network reliability and security.
The Houstonian, Assistant Summer League Coach (May - July 2023)	Coached swimmers ages 6-16 in all four strokes and organized team events, developing leadership and communication skills.
Forever 21, Brand Ambassador (June - August 2022)	Developed customer service skills and supported daily operations, ensuring efficiency and positive customer experiences in a fast-paced retail setting.
