Yeonjoon Jung

Undergraduate Student at POSTECH | ML Researcher/Engineer at SqueezeBits

About Me

Hi, my name is Yeonjoon Jung, and I am an undergraduate student at POSTECH majoring in Convergence IT Engineering and Computer Science and Engineering. I am currently taking a leave of absence to complete my mandatory alternative military service as an ML Researcher/Engineer at SqueezeBits, where I focus on optimizing and accelerating AI models.

My recent research interests span the field of Efficient AI, including quantization, inference optimization, and parameter-efficient fine-tuning (PEFT), with applications to large language models (LLMs) and diffusion models.

I am always open to collaborations and new research opportunities. Please feel free to contact me.

News

[11/2025] GraLoRA is now available in the Hugging Face PEFT library.
[10/2025] Released a blog post on an efficient pipeline for diffusion model inference.
[09/2025] Our paper was accepted to NeurIPS 2025 as a Spotlight.
[06/2025] Released a blog post explaining GraLoRA, a novel LoRA fine-tuning method.
[01/2025] Released a blog post exploring Vision Language Model serving.
[12/2024] Released a blog post on the effectiveness of prefix caching.
[12/2024] Released a blog post on understanding speculative decoding.
[10/2024] Released a blog post analyzing batching in LLM serving.
[10/2024] Released a blog post evaluating LLM serving with key metrics.
[11/2023] Our paper was accepted to the LoG 2023 extended abstract track.
[08/2023] I joined the AI startup SqueezeBits as an ML Researcher/Engineer.
[03/2023] I joined Prof. Ahn’s research group as an undergraduate researcher.

Papers

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park
Neural Information Processing Systems (NeurIPS), 2025, Spotlight
arXiv, GitHub

Triplet Edge Attention for Algorithmic Reasoning

Yeonjoon Jung, Sungsoo Ahn
Learning on Graphs Conference (LoG), 2023, extended abstract
arXiv

Blogs

Winning both speed and quality: How Yetter deals with diffusion models

Introducing an efficient pipeline for diffusion model inference | Link

GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost

Introducing GraLoRA, a novel LoRA fine-tuning method | Link

[vLLM vs TensorRT-LLM] #13. Vision Language Models

Exploring Vision Language Model serving | Link

[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching

Effectiveness of prefix caching in LLM serving | Link

[vLLM vs TensorRT-LLM] #11. Speculative Decoding

Understanding speculative decoding in LLM serving | Link

[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving

Analyzing batching in LLM serving | Link or Link2

[vLLM vs TensorRT-LLM] #1. An Overall Evaluation

Evaluating LLM serving with key metrics | Link or Link2

Education

POSTECH

03/2020 - Present
Major: Convergence IT Engineering and Computer Science and Engineering

Korea Science Academy of KAIST

03/2017 - 02/2020