Category: Machine Learning Systems
- Batching in LLM Serving Systems
- Faster Causal Self Attention
- GPU Architecture and Programming
- GPU Kernel Programming with Triton and CUDA
- How to write a fast kernel
- InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory
- Intro to Mixture of Experts (MoE) in LLM Serving Systems
- Memory Management in LLM Serving Systems
- Modeling and Scaling Performance with Roofline
- Optimizing GPU Kernels
- Performance Modeling for LLM Serving Systems
- Practical Lessons from Predicting Clicks on Ads at Facebook
- Quantization in LLM Serving Systems
- Recommender Systems
- Sparsity and Pruning in LLM Serving Systems
- Speculative Decoding in LLM Serving Systems
- Transformer Architecture and Implementation