Yeonjoon Jung
Undergraduate Student at POSTECH · ML Researcher/Engineer at SqueezeBits
Hi, my name is Yeonjoon Jung, and I am an undergraduate student at POSTECH majoring in Convergence IT Engineering & Computer Science and Engineering.
I am currently taking a leave of absence to complete my mandatory alternative military service as an ML Researcher/Engineer at SqueezeBits, where I focus on optimizing and accelerating AI models.
My recent research interests span Efficient AI, including quantization, inference optimization, and parameter-efficient fine-tuning (PEFT), with applications to large language models (LLMs) and diffusion models.
I am always open to collaborations and new research opportunities. Feel free to contact me.
News
- 03/2026: Released a blog post on scalable synthetic data generation for Physical AI.
- 02/2026: Released a blog post on building reliable synthetic data pipelines for Physical AI.
- 11/2025: GraLoRA is now available in the HuggingFace PEFT library.
- 10/2025: Released a blog post on an efficient pipeline for diffusion model inference.
- 09/2025: Our paper was accepted to NeurIPS 2025 as a Spotlight.
- 06/2025: Released a blog post explaining GraLoRA, a novel LoRA fine-tuning method.
- 01/2025: Released a blog post on Vision Language Model serving.
- 12/2024: Released a post on prefix caching and another on speculative decoding.
- 10/2024: Released posts on optimal batching and an overall evaluation of LLM serving.
- 11/2023: Our paper was accepted to the LoG 2023 extended abstract track.
- 08/2023: Joined SqueezeBits as an ML Researcher/Engineer.
- 03/2023: Joined Prof. Ahn's research group as an undergraduate researcher.
Papers
Triplet Edge Attention for Algorithmic Reasoning (LoG 2023, extended abstract)
Blogs
Reliable & Scalable Synthetic Data for Physical AI (Part 2)
On scaling synthetic data generation for Physical AI.
Reliable & Scalable Synthetic Data for Physical AI (Part 1)
On building reliable synthetic data pipelines for Physical AI.
Winning both speed and quality: How Yetter deals with diffusion models
Introducing an efficient pipeline for diffusion model inference.
GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost
Introducing GraLoRA, a novel LoRA fine-tuning method.
[vLLM vs TensorRT-LLM] #13. Vision Language Models
Exploring Vision Language Model serving.
[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching
Examining the effectiveness of prefix caching in LLM serving.
[vLLM vs TensorRT-LLM] #11. Speculative Decoding
Understanding speculative decoding in LLM serving.
[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving
Analyzing batching in LLM serving.
[vLLM vs TensorRT-LLM] #1. An Overall Evaluation
Evaluating LLM serving with key metrics.
Education
POSTECH
Major: Convergence IT Engineering & Computer Science and Engineering