Bhavya Giri

Now

Projects

2026
cuda-kernels: Custom CUDA kernels for transformer inference (fused attention, quantized GEMM)
2025
mlops-pipeline: End-to-end ML pipeline with automated training, evaluation, and deployment on Kubernetes
2025
gpu-bench: Benchmarking suite for GPU memory bandwidth, compute throughput, and kernel launch latency
2024
vector-db: Lightweight vector database with HNSW indexing and CUDA-accelerated similarity search
2024
model-serving: High-throughput model serving framework with dynamic batching and TensorRT optimization

Blog

2026
CUDA Memory Coalescing: The First Thing to Get Right
Why memory access patterns matter more than compute in GPU kernels.
2026
Hello, World
First post: why I'm writing, what to expect.

Connect