Yeonjoon Jung

Undergraduate Student at POSTECH | ML Researcher/Engineer at SqueezeBits

About Me

Hi, my name is Yeonjoon Jung, and I am an undergraduate student at POSTECH majoring in Convergence IT Engineering and Computer Science and Engineering. I am currently taking a leave of absence to complete my mandatory alternative military service as an ML Researcher/Engineer at SqueezeBits, where I focus on optimizing and accelerating AI models.

My recent research interests span the field of Efficient AI, including quantization, inference optimization, and parameter-efficient fine-tuning (PEFT), with applications to large language models (LLMs) and diffusion models.

I am always open to collaborations and new research opportunities. Please feel free to contact me.

News

[09/2025] Our paper was accepted to NeurIPS 2025 as a Spotlight.
[06/2025] Released a blog post introducing GraLoRA, a novel LoRA fine-tuning method.
[01/2025] Released a blog post exploring Vision Language Model serving.
[12/2024] Released a blog post on the effectiveness of prefix caching.
[12/2024] Released a blog post on understanding speculative decoding.
[10/2024] Released a blog post analyzing batching in LLM serving.
[10/2024] Released a blog post evaluating LLM serving with key metrics.
[11/2023] Our paper was accepted to the LoG 2023 extended abstract track.
[08/2023] I joined the AI startup SqueezeBits as an ML Researcher/Engineer.
[03/2023] I joined Prof. Ahn’s research group as an undergraduate researcher.

Papers

GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Yeonjoon Jung, Daehyun Ahn, Hyungjun Kim, Taesu Kim, Eunhyeok Park
Neural Information Processing Systems (NeurIPS), 2025, Spotlight
arXiv, GitHub

Triplet Edge Attention for Algorithmic Reasoning

Yeonjoon Jung, Sungsoo Ahn
Learning on Graphs Conference (LoG), 2023, extended abstract track
arXiv

Blogs

GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost

Introducing GraLoRA, a novel LoRA fine-tuning method
Link

[vLLM vs TensorRT-LLM] #13. Vision Language Models

Exploring Vision Language Model serving
Link

[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching

Effectiveness of prefix caching in LLM serving
Link

[vLLM vs TensorRT-LLM] #11. Speculative Decoding

Understanding speculative decoding in LLM serving
Link

[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving

Analyzing batching in LLM serving
Link or Link2

[vLLM vs TensorRT-LLM] #1. An Overall Evaluation

Evaluating LLM serving with key metrics
Link or Link2

Education

POSTECH

03/2020 - Present
Major: Convergence IT Engineering and Computer Science and Engineering

Korea Science Academy of KAIST

03/2017 - 02/2020