About Me

I am a Founding Member of Technical Staff at Inferact, building vLLM and making AI accessible to everyone with cheaper and faster inference.

I recently completed my postdoc at UC Berkeley, where I worked with Ion Stoica and Joseph E. Gonzalez in the Sky Computing Lab. Before that, I received my Ph.D. in Computer Science from UCLA in 2024, advised by Harry Xu and Miryung Kim.

My research lies at the intersection of systems and machine learning. I build systems to make AI faster and more efficient.

I am a recipient of the Amazon & UCLA Science Hub Fellowship (2021), a Jane Street Graduate Research Fellowship Finalist (2023), and UCLA's Outstanding Graduate Student Research Award (2024).

Updates

  • Jan 2026: Joined Inferact as a Founding Member of Technical Staff.
  • Dec 2025: Announced GVM at the Sky Winter Retreat.
  • Jul–Nov 2025: Invited talks on kvcached and ConServe at Moonshot AI, AWS, NVIDIA, Meta, and IBM.
  • Oct 2025: Released kvcached along with a technical deep-dive blog post. [X] [LinkedIn]

Open Source Projects

I am actively working on the Open Virtual GPU project (ovg-project), building open-source infrastructure for GPU virtualization and efficient GPU sharing in datacenters. Our vision is to create a "GPU OS" that makes GPU resources as manageable and shareable as CPU resources today. Read our first blog post on solving the GPU cost crisis.

kvcached

Elastic KV cache sharing across multiple co-located LLMs through GPU virtual memory. Integrates with SGLang and vLLM.

Co-leading with Jiarong Xing and Shan Yu

GVM

An OS-level GPU virtualization layer for sharing a GPU with hardware-like performance isolation and full flexibility.

Co-leading with Yicheng Liu

Publications

  1. Lost in Translation: The Search for Meaning in Network-Attached AI Accelerator Disaggregation

    Jaewan Hong, Yifan Qiao, Soujanya Ponnapalli, Shu Liu, Marcos K. Aguilera, Vincent Liu, Christopher J. Rossbach, Ion Stoica

    HotNets 2025

  2. PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

    Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang

    SOSP 2025

  3. Orthrus: Efficient and Timely Detection of Silent User Data Corruption in the Cloud with Resource-Adaptive Computation Validation

    Chenxiao Liu, Zhenting Zhu, Quanxi Li, Yanwen Xia, Yifan Qiao, Xiangyun Deng, Youyou Lu, Tao Xie, Huimin Cui, Zidong Du, Harry Xu, Chenxi Wang

    SOSP 2025

  4. Towards Efficient and Practical GPU Multitasking in the Era of LLM

    Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica

    arXiv 2025

  5. Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving

    Shan Yu, Jiarong Xing, Yifan Qiao, Mingyuan Ma, Yangmin Li, Yang Wang, Shuo Yang, Zhiqiang Xie, Shiyi Cao, Ke Bao, Ion Stoica, Harry Xu, Ying Sheng

    arXiv 2025 [code]

  6. ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving

    Yifan Qiao, Shu Anzai, Shan Yu, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica, Harry Xu

    arXiv 2025

  7. DRust: Language-Guided Distributed Shared Memory with Fine Granularity, Full Transparency, and Ultra Efficiency

    Haoran Ma, Yifan Qiao, Shi Liu, Shan Yu, Chenxi Wang, Yuanjiang Ni, Qingda Lu, Jiesheng Wu, Yiying Zhang, Miryung Kim, Harry Xu

    OSDI 2024 [full version] [code]

  8. A Tale of Two Paths: Toward a Hybrid Data Plane for Efficient Far-Memory Applications

    Lei Chen*, Shi Liu*, Chenxi Wang, Haoran Ma, Yifan Qiao, Zhe Wang, Chenggang Wu, Youyou Lu, Xiaobing Feng, Huimin Cui, Shan Lu, Harry Xu (*contributed equally)

    OSDI 2024 [full version] [code]

  9. Harvesting Idle Memory for Application-managed Soft State with Midas

    Yifan Qiao, Zhenyuan Ruan, Haoran Ma, Adam Belay, Miryung Kim, Harry Xu

    NSDI 2024 [code] [slides]

  10. Hermit: Low-Latency, High-Throughput, and Transparent Remote Memory via Feedback-Directed Asynchrony

    Yifan Qiao, Chenxi Wang, Zhenyuan Ruan, Adam Belay, Qingda Lu, Yiying Zhang, Miryung Kim, Guoqing Harry Xu

    NSDI 2023 [code] [slides]

  11. Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory

    Chenxi Wang*, Yifan Qiao*, Haoran Ma, Shi Liu, Yiying Zhang, Wenguang Chen, Ravi Netravali, Miryung Kim, Guoqing Harry Xu (*contributed equally)

    NSDI 2023 [code] [slides]

  12. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs

    John Thorpe*, Pengzhan Zhao*, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, Guoqing Harry Xu (*contributed equally)

    NSDI 2023 [full version] [code]

  13. MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime

    Chenxi Wang*, Haoran Ma*, Shi Liu, Yifan Qiao, Jonathan Eyolfson, Christian Navasca, Shan Lu, Guoqing Harry Xu (*contributed equally)

    OSDI 2022 (Jay Lepreau Best Paper Award) [code]

  14. Mako: A Low-Pause, High-Throughput Evacuating Collector for Memory-Disaggregated Datacenters

    Haoran Ma, Shi Liu, Chenxi Wang, Yifan Qiao, Michael D. Bond, Stephen M. Blackburn, Miryung Kim, Guoqing Harry Xu

    PLDI 2022 [code]

  15. Dorylus: Affordable, Scalable, and Accurate GNN Training over Billion-Edge Graphs

    John Thorpe*, Yifan Qiao*, Jonathan Eyolfson, Shen Teng, Guanzhou Hu, Zhihao Jia, Jinliang Wei, Keval Vora, Ravi Netravali, Miryung Kim, Guoqing Harry Xu (*contributed equally)

    OSDI 2021 [full version] [code]

  16. Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

    Shuo Yang, Kai Wu, Yifan Qiao, Dong Li, Jidong Zhai

    CLUSTER 2017

Experience

  1. Visiting Student in the MIT PDOS Group, hosted by Adam Belay.

    Worked on an elastic LLM serving system.

    Jun. 2023 - Sept. 2023

  2. Visiting Student in the MIT PDOS Group, hosted by Adam Belay.

    Worked on Midas, a new OS memory abstraction for soft state.

    Jun. 2022 - Sept. 2022

  3. Research Intern at Alibaba Bellevue, Cloud Storage Team, hosted by Qingda Lu.

    Worked on Hermit, a high-performance and transparent remote memory system.

    Jun. 2021 - Sept. 2021

Service

  • MLSys 2026, Program Committee
  • ASPLOS 2026, Program Committee
  • ATC 2024, External Review Committee
  • SOSP 2023, Artifact Evaluation Committee
  • OSDI 2023, Artifact Evaluation Committee
  • ATC 2023, Artifact Evaluation Committee
  • WORDS 2022, Session Chair

Awards

  • 2024 Outstanding Graduate Student Research Award, UCLA
  • 2023 Jane Street Graduate Research Fellowship Finalist
  • 2021 Amazon Ph.D. Fellow
  • 2019 Magna Cum Laude in Beijing (8/140)
  • 2019 Magna Cum Laude at Department of Computer Science and Technology, Tsinghua University
  • 2019 Cum Laude at Tsinghua University (14/140)
  • 2018 CNPC Scholarship for Comprehensive Excellence (8/140)
  • 2018 Qualcomm Scholarship (Top 6%)
  • 2017 National Scholarship (6/140)

Teaching

UC Berkeley

  • Guest Lecture for CS 262A Advanced Topics in Computer Systems (Spring 2025)
  • Guest Lecture for CS 162 Operating Systems (Fall 2025)

UCLA

  • Teaching Assistant for CS 130 Software Engineering (Winter 2024)
  • Teaching Assistant for CS 130 Software Engineering (Spring 2024)