Bhavya Giri

Now

2026

cuda-kernels
Custom CUDA kernels for transformer inference: fused attention, quantized GEMM

2025

mlops-pipeline
End-to-end ML pipeline with automated training, evaluation, and deployment on Kubernetes

2025

gpu-bench
Benchmarking suite for GPU memory bandwidth, compute throughput, and kernel launch latency

2024

vector-db
Lightweight vector database with HNSW indexing and CUDA-accelerated similarity search

2024

model-serving
High-throughput model serving framework with dynamic batching and TensorRT optimization

2026

CUDA Memory Coalescing: The First Thing to Get Right
Why memory access patterns matter more than compute in GPU kernels.

2026

Hello, World
First post: why I'm writing, what to expect.