| No. | Title | Author | Date | Views |
| --- | --- | --- | --- | --- |
| Notice | 2026 New Student Introduction | 안태현 | 2026.01.08 | 14 |
| 11 | LoRA: Low-Rank Adaptation of Large Language Models | 한상우 | 2026.01.19 | 0 |
| 10 | [NeurIPS 22] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | 고병욱 | 2026.01.13 | 6 |
| 9 | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | 김나영 | 2026.01.13 | 3 |
| 8 | SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | 김민수 | 2026.01.13 | 5 |
| 7 | SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models | 임성훈 | 2026.01.12 | 2 |
| 6 | FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | 이원주 | 2026.01.12 | 5 |
| 5 | Efficient Memory Management for Large Language Model Serving with PagedAttention | 김정률 | 2026.01.12 | 5 |
| 4 | CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion | 한상우 | 2026.01.12 | 8 |
| 3 | Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding | 김나영 | 2026.01.08 | 10 |
| 2 | InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management | 김민수 | 2026.01.08 | 10 |
| 1 | [ICLR 23] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | 고병욱 | 2026.01.08 | 11 |