Yeonjoon Jung
Undergraduate Student at POSTECH · ML Researcher/Engineer at SqueezeBits
Hi, my name is Yeonjoon Jung, and I am an undergraduate student at POSTECH majoring in Convergence IT Engineering & Computer Science and Engineering.
I am currently taking a leave of absence to complete my mandatory alternative military service as an ML Researcher/Engineer at SqueezeBits, where I focus on optimizing and accelerating AI models.
My recent research interests span Efficient AI, including quantization, inference optimization, and parameter-efficient fine-tuning (PEFT), with applications to large language models (LLMs) and diffusion models.
I am always open to collaborations and new research opportunities. Feel free to contact me.
News
- 03/2026: Released a blog post on scalable synthetic data generation for Physical AI.
- 02/2026: Released a blog post on building reliable synthetic data pipelines for Physical AI.
- 11/2025: GraLoRA is now available in the HuggingFace PEFT library.
- 10/2025: Released a blog post on an efficient pipeline for diffusion model inference.
- 09/2025: Our paper was accepted to NeurIPS 2025 as a Spotlight.
- 06/2025: Released a blog post explaining GraLoRA, a novel LoRA fine-tuning method.
- 01/2025: Released a blog post on Vision Language Model serving.
- 12/2024: Released a post on prefix caching and another on speculative decoding.
- 10/2024: Released posts on optimal batching and an overall evaluation of LLM serving.
- 11/2023: Our paper was accepted to the LoG 2023 extended abstract track.
- 08/2023: Joined SqueezeBits as an ML Researcher/Engineer.
- 03/2023: Joined Prof. Ahn's research group as an undergraduate researcher.
Papers
Triplet Edge Attention for Algorithmic Reasoning (LoG 2023, extended abstract)
Blogs
Reliable & Scalable Synthetic Data for Physical AI (Part 2)
On scaling synthetic data generation for Physical AI.
Reliable & Scalable Synthetic Data for Physical AI (Part 1)
On building reliable synthetic data pipelines for Physical AI.
Winning both speed and quality: How Yetter deals with diffusion models
Introducing an efficient pipeline for diffusion model inference.
GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost
Introducing GraLoRA, a novel LoRA fine-tuning method.
[vLLM vs TensorRT-LLM] #13. Vision Language Models
Exploring Vision Language Model serving.
[vLLM vs TensorRT-LLM] #12. Automatic Prefix Caching
Examining the effectiveness of prefix caching in LLM serving.
[vLLM vs TensorRT-LLM] #11. Speculative Decoding
Understanding speculative decoding in LLM serving.
[vLLM vs TensorRT-LLM] #2. Towards Optimal Batching for LLM Serving
Analyzing batching in LLM serving.
[vLLM vs TensorRT-LLM] #1. An Overall Evaluation
Evaluating LLM serving with key metrics.
Education
POSTECH
Major: Convergence IT Engineering & Computer Science and Engineering