About Me
I am a Founding Member of Technical Staff at Inferact, building vLLM and making AI accessible to everyone with cheaper and faster inference.
I recently finished my postdoc at UC Berkeley, where I worked with Ion Stoica and Joseph E. Gonzalez in the Sky Computing Lab. Before that, I completed my Ph.D. in Computer Science at UCLA in 2024, advised by Harry Xu and Miryung Kim.
My research lies at the intersection of systems and machine learning. I build systems to make AI faster and more efficient.
I received the Amazon & UCLA Science Hub Fellowship (2021) and UCLA's Outstanding Graduate Student Research Award (2024), and I was a Jane Street Graduate Research Fellowship finalist (2023).
Updates
- Jan 2026: Joined Inferact as a Founding Member of Technical Staff.
- Dec 2025: Announced GVM at Sky Winter Retreat.
- Jul–Nov 2025: Invited talks on kvcached and ConServe at Moonshot AI, AWS, NVIDIA, Meta, and IBM.
- Oct 2025: Released kvcached with a technical deep-dive blog post. [X] [LinkedIn]
Open Source Projects
I am actively working on the Open Virtual GPU project (ovg-project), building open-source infrastructure for GPU virtualization and efficient GPU sharing in datacenters. Our vision is to create a "GPU OS" that makes GPU resources as manageable and shareable as CPU resources today. Read our first blog post on solving the GPU cost crisis.
Elastic KV cache sharing across multiple co-located LLMs through GPU virtual memory. Integrates with SGLang and vLLM.
An OS-level GPU virtualization layer for sharing a GPU with hardware-like performance isolation and full flexibility.
Publications
-
Lost in Translation: The Search for Meaning in Network-Attached AI Accelerator Disaggregation
Jaewan Hong, Yifan Qiao, Soujanya Ponnapalli, Shu Liu, Marcos K. Aguilera, Vincent Liu, Christopher J. Rossbach, Ion Stoica
HotNets 2025
-
PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications
Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang
SOSP 2025
-
Chenxiao Liu, Zhenting Zhu, Quanxi Li, Yanwen Xia, Yifan Qiao, Xiangyun Deng, Youyou Lu, Tao Xie, Huimin Cui, Zidong Du, Harry Xu, Chenxi Wang
SOSP 2025
-
Towards Efficient and Practical GPU Multitasking in the Era of LLM
Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica
arXiv 2025
-
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng
arXiv 2025 code
-
ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving
Yifan Qiao, Shu Anzai, Shan Yu, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, Harry Xu
arXiv 2025
-
Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Chenxi Wang, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, and Harry Xu.
OSDI 2024 full version code
-
A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications
Lei Chen*, Shi Liu*, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, and Harry Xu.
OSDI 2024 full version code
-
Harvesting Idle Memory for Application-managed Soft State with Midas
Yifan Qiao, Zhenyuan Ruan, Haoran Ma, Adam Belay, Miryung Kim, and Harry Xu.
-
Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony
Yifan Qiao, Chenxi Wang, Zhenyuan Ruan, Adam Belay, Qingda Lu, Yiying Zhang, Miryung Kim, and Guoqing Harry Xu.
-
Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory
Chenxi Wang*, Yifan Qiao*, Haoran Ma, Shi Liu, Yiying Zhang, Wenguang Chen, Ravi Netravali, Miryung Kim, Guoqing Harry Xu. (*contributed equally)
-
Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs
John Thorpe*, Pengzhan Zhao*, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu.
NSDI 2023 full version code
-
MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime
Chenxi Wang*, Haoran Ma*, Shi Liu, Yifan Qiao, Jonathan Eyolfson, Christian Navasca, Shan Lu, Guoqing Harry Xu.
OSDI 2022 (Awarded Jay Lepreau Best Paper) code
-
Mako: A Low-Pause, High-Throughput Evacuating Collector for Memory-Disaggregated Datacenters
Haoran Ma, Shi Liu, Chenxi Wang, Yifan Qiao, Michael D. Bond, Stephen M. Blackburn, Miryung Kim, Guoqing Harry Xu.
PLDI 2022 code
-
Dorylus: Affordable, Scalable, and Accurate GNN Training over Billion-Edge Graphs
John Thorpe*, Yifan Qiao*, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, and Guoqing Harry Xu. (*contributed equally)
OSDI 2021 full version code
-
Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, Jidong Zhai.
CLUSTER 2017
Experience
-
Visiting Student at MIT PDOS Group, hosted by Adam Belay.
Worked on an elastic LLM serving system.
Jun. 2023 - Sept. 2023
-
Visiting Student at MIT PDOS Group, hosted by Adam Belay.
Worked on Midas, a new OS memory abstraction for soft state.
Jun. 2022 - Sept. 2022
-
Research Intern at Alibaba Bellevue, Cloud Storage Team, hosted by Qingda Lu.
Worked on Hermit, a high-performance and transparent remote memory system.
Jun. 2021 - Sept. 2021
Service
- MLSys 2026, Program Committee
- ASPLOS 2026, Program Committee
- ATC 2024, External Review Committee
- SOSP 2023, Artifact Evaluation Committee
- OSDI 2023, Artifact Evaluation Committee
- ATC 2023, Artifact Evaluation Committee
- WORDS 2022, Session Chair
Awards
- 2024 Outstanding Graduate Student Research Award, UCLA
- 2023 Jane Street Graduate Research Fellowship Finalist
- 2021 Amazon Ph.D. Fellow
- 2019 Magna Cum Laude in Beijing (8/140)
- 2019 Magna Cum Laude at Department of Computer Science and Technology, Tsinghua University
- 2019 Cum Laude at Tsinghua University (14/140)
- 2018 CNPC Scholarship for Comprehensive Excellence (8/140)
- 2018 Qualcomm Scholarship (Top 6%)
- 2017 National Scholarship (6/140)
Teaching
UC Berkeley
- Guest Lecture for CS 262A Advanced Topics in Computer Systems (Spring 2025)
- Guest Lecture for CS 162 Operating Systems (Fall 2025)
UCLA
- Teaching Assistant for CS 130 Software Engineering (Winter 2024)
- Teaching Assistant for CS 130 Software Engineering (Spring 2024)