2025-05-25 Readings
· One min read
Paper #2: (MLSys 2024) AWQ: Activation-Aware Weight Quantization for On-Device LLM Compression and Acceleration
Paper #1: (MLSys 2024) Punica: Multi-Tenant LoRA Serving