Punica: Multi-Tenant LoRA Serving
- The CUDA kernel algorithm (SGMV, Segmented Gather Matrix-Vector Multiplication) is novel (I built a Python POC below since the paper doesn't explain it clearly)