Yuan Yuan

AI/ML Engineer & Graduate Researcher

Philadelphia, US.

About

Yuan Yuan is a highly accomplished M.S. candidate in Data Science & Computer and Information Science, specializing in Large Language Models, Reinforcement Learning, and computational social science. With a strong foundation in developing and optimizing AI solutions, Yuan has contributed to cutting-edge research in LLM personalization and text generation, leading to multiple publications. Their professional experience includes impactful roles at Microsoft, Meta, and Google Summer of Code, where they have engineered scalable AI systems, enhanced developer productivity, and delivered critical features for millions of users.

Work

Microsoft

AI Engineer

Remote, N/A, US

Sep 2025

→

Present

Summary

Led the development and optimization of advanced AI solutions, leveraging deep learning and reinforcement learning to enhance large language model performance in multi-agent environments and deliver scalable AI systems.

Highlights

Engineered and fine-tuned Large Language Models with reinforcement learning in multi-turn, multi-agent environments on Azure Cloud, significantly improving model adaptability and responsiveness.

Post-trained LLMs on multi-modal datasets using Azure Cloud infrastructure, achieving large-scale optimization and enhanced model capabilities for diverse applications.

Developed and debugged robust inference pipelines for both open- and closed-source LLMs, ensuring high performance and accuracy on critical personalization benchmarks.

Maintained and evolved an internal multi-party LLM chat platform built with FastAPI and HTML, facilitating real-time collaboration and seamless user interaction.

Extended model capabilities by implementing GPT-style function calling, enhancing reasoning and tool use for complex, real-world tasks.

Google Summer of Code (AsyncAPI)

Open Source Maintainer

Remote, N/A, US

May 2024

→

Sep 2025

Summary

Led key development initiatives for the AsyncAPI project, enhancing platform functionality, developer experience, and code reliability for the open-source community.

Highlights

Led the development of a Slack bot using Django, AWS, and MongoDB with vector search capabilities, integrating message history, GitHub repository data, and website content to enhance search and retrieval functionalities.

Drove the deprecation of the AsyncAPI Generator CLI, transitioning to a more robust and integrated AsyncAPI CLI environment, significantly enhancing platform functionality and streamlining development processes (PR #1251).

Initiated and directed the integration of templates into the generator repository, improving developers' experience and code reusability (Issue #1249) within the AsyncAPI ecosystem.

Resolved critical issues in React engine templates, improving functionality and reliability (PR #1234, PR #1213) through diligent Git-based version control and code reviews.

Education

University of Pennsylvania

Philadelphia, PA, United States of America

Sep 2023

→

May 2026

M.S.

Data Science & Computer and Information Science

Grade: 3.88/4.0

Vassar College

Poughkeepsie, NY, United States of America

Sep 2017

→

May 2021

B.A.

Political Science, Correlate in Math

Grade: 3.93/4.0

Awards

PHI BETA KAPPA Membership

May 2021

Awarded By

Phi Beta Kappa Society

Recognized for outstanding academic achievement in the liberal arts and sciences, a distinction held by top academic performers.

The Julia Flitner Lamb Prize

May 2021

Awarded By

Vassar College

Awarded for exceptional scholarship and academic excellence at Vassar College.

Ann Cornelisen Fund for Post-Graduate Fellowships

May 2021

Awarded By

Vassar College

Granted to support promising students pursuing post-graduate studies, recognizing academic potential and research aptitude.

Publications

ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations

Nov 2025

Published by

EMNLP 2025 Findings

Summary

Presented a new text rendering and editing algorithm for diffusion models, improving text generation and enabling users to specify fonts in images. Implemented a text segmentation module based on "Rethinking Text Segmentation" and evaluated the approach against "AnyText" for multi-language text generation.

TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games

Nov 2025

Published by

EMNLP 2025 Main

Summary

Introduced TurnaboutLLM, a novel framework and dataset derived from detective games, challenging LLMs to detect contradictions in lengthy narratives. Evaluated 12 state-of-the-art LLMs, identifying shortcomings in chain-of-thought prompting and context scaling. Collected, parsed, and transformed game data, annotated reasoning chains, and conducted extensive qualitative and quantitative analysis on ChatGPT, Llama, and Qwen models.

Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness

Jul 2025

Published by

ACL 2025 Main

Summary

Introduced BordIRLines, a multilingual dataset of 251 territorial disputes (720 queries across 49 languages) to evaluate RAG on culturally and politically sensitive tasks. Experiments showed multilingual retrieval improves cross-lingual consistency and reduces geopolitical bias. Designed and monitored human annotation procedure with over 100 participants.

Towards Rationality in Language and Multimodal Agents: A Survey

Jun 2025

Published by

NAACL 2025 Main

Summary

Presented a survey framing rationality in intelligent agents via four axioms: information grounding, logical consistency, invariance from irrelevant context, and preference orderability. Reviewed recent advances in multimodal and multi-agent systems, highlighting open challenges in evaluation, architecture design, and bias mitigation. Researched and drafted the Logical Consistency sections.

Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale

Jun 2025

Published by

COLM 2025 & Long abstract at the MASC-SLL 2025 [Oral]

Summary

Proposed PersonaMem, a benchmark of 180+ multi-session user–LLM interactions capturing evolving personas, where models must align responses with users' current profiles, noting that even top LLMs struggle, achieving only ~50. Evaluated GPT, Qwen, Llama, and Claude models on the benchmark.

When Your Friendly Agent Is More Than Just Friendly: Revealing Side Effects of Using Style Features on LLM Personalization

Jan 2024

Published by

Work in Progress

Summary

Studied how defining personas with style features (e.g., “empathetic”) shapes agent behavior and trait interactions. Generated synthetic user-assistant conversations with GPT-40-mini, Qwen3-8b, and Llama3-8b, investigating side effect strength via ELO ratings and win rates, and significance with linear regression and t-test.

ImplicitPersona: Continuous Learning of Implicit User Personas via Reinforcement Fine-tuning for LLM Personalization

Jan 2024

Published by

Work in Progress

Summary

Built a synthetic dataset to evaluate and improve LLMs' ability to detect implicit user preferences in long-term conversations using Reinforcement Fine-Tuning (RFT). Designed reward signals to encourage models to infer subtle preferences and adapt responses, showing RFT-trained models achieve stronger personalization and alignment with evolving user needs. Implemented a multi-modal RL pipeline using VERL and a DPO training pipeline using LLaMA-Factory, evaluating GPT, Qwen, LLaMA, and Claude models.

Skills

Programming Languages

Java, Python, R, C++, Hack/PHP, JavaScript.

Frameworks & Libraries

Spring Boot, Django, FastAPI, HTML, React, LLaMA-Factory, VERL.

Cloud Platforms

AWS EC2, AWS S3, AWS Kinesis, AWS RDS, AWS Aurora, AWS Elasticache, Azure Cloud.

Databases & Caching

Redis, Apache Kafka, MySQL, Postgres, DynamoDB, MongoDB.

DevOps & Tools

Docker, Docker-compose, Kubernetes, Jenkins, ArgoCD, gRPC, HTTP, WebSocket, Git.

Machine Learning & AI

Large Language Models (LLMs), Reinforcement Learning (RFT), Deep Learning, Natural Language Processing (NLP), Diffusion Models, AI Agents, Computational Social Science, LLM Personalization, Synthetic Data, Multi-modal AI.

Interests

Research Interests

Large Language Model, LLM Personalization, Computational Social Science, LLM Post-Training & Reinforcement Learning, Synthetic Data.