About Me
Hi, my name is Yeonjoon Jung, and I am an undergraduate student at POSTECH
majoring in Convergence IT Engineering and Computer Science and Engineering.
I am currently taking a leave of absence to complete my mandatory alternative military service
as an ML Researcher/Engineer at SqueezeBits, where I focus on optimizing and accelerating AI models.
My recent research interests span the field of Efficient AI, including quantization,
inference optimization, and parameter-efficient fine-tuning (PEFT), with applications
to large language models (LLMs) and diffusion models.
I am always open to collaborations and new research opportunities. Please feel free to contact me.
News
[09/2025] Our paper was accepted to NeurIPS 2025 as a Spotlight.
[06/2025] Released a blog post explaining GraLoRA, a novel LoRA fine-tuning method.
[01/2025] Released a blog post exploring Vision Language Model serving.
[12/2024] Released a blog post on the effectiveness of prefix caching.
[12/2024] Released a blog post on understanding speculative decoding.
[10/2024] Released a blog post analyzing batching in LLM serving.
[10/2024] Released a blog post on evaluating LLM serving with key metrics.
[11/2023] Our paper was accepted to the LoG 2023 extended abstract track.
[08/2023] I joined the AI startup SqueezeBits as an ML Researcher/Engineer.
[03/2023] I joined Prof. Ahn’s research group as an undergraduate researcher.
Papers
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park
Neural Information Processing Systems (NeurIPS) 2025 Spotlight
arXiv, GitHub
Triplet Edge Attention for Algorithmic Reasoning
Yeonjoon Jung, Sungsoo Ahn
Learning on Graphs Conference (LoG) 2023, extended abstract
arXiv
Blogs
GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost
Introducing GraLoRA, a novel LoRA fine-tuning method
Link
[vLLM vs TensorRT-LLM] #13. Vision Language Models
Exploring Vision Language Model serving
Link
[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching
Effectiveness of prefix caching in LLM serving
Link
[vLLM vs TensorRT-LLM] #11. Speculative Decoding
Understanding speculative decoding in LLM serving
Link
[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving
Analyzing batching in LLM serving
Link or Link2
[vLLM vs TensorRT-LLM] #1. An Overall Evaluation
Evaluating LLM serving with key metrics
Education
POSTECH
03/2020 - Present
Major: Convergence IT Engineering and Computer Science and Engineering
Korea Science Academy of KAIST
03/2017 - 02/2020