What Transformers Taught Me About Attention

In 2017, a paper titled "Attention Is All You Need" revolutionized machine learning. The Transformer architecture it introduced now powers everything from GPT to BERT to the AI assistants we talk to daily. But beyond its technical brilliance, the attention mechanism offers a surprisingly profound insight about how intelligence might work.

The Core Idea

Traditional neural networks processed sequences step by step, maintaining a hidden state that theoretically encoded everything that came before. The problem? Information had to survive a long game of telephone. ...
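The step-by-step bottleneck described above is exactly what attention removes: every token can look at every other token directly, instead of relaying information through a chain of hidden states. A minimal NumPy sketch of scaled dot-product attention (the toy shapes and random data here are illustrative, not from the post):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query scores every key,
    # so no information has to survive a long relay of states.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq, seq) pairwise relevance
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))   # toy sequence: 5 tokens, embedding dim 8
out = attention(x, x, x)      # self-attention: Q, K, V from the same tokens
print(out.shape)              # (5, 8): one updated vector per token
```

In a real Transformer, Q, K, and V come from learned linear projections of the input rather than the raw embeddings, but the "everyone attends to everyone" core is the same.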

January 3, 2026 · 4 min · 646 words · Shuvro

Neural Garden

What Is It?

Neural Garden is an interactive visualization tool that lets you watch neural networks learn in real time. Instead of just seeing loss curves, you see the decision boundaries evolving, the weights flowing, the network "thinking."

Why I Built It

I was frustrated with how abstract neural network learning felt. You tweak hyperparameters, wait, look at numbers. Where's the intuition? I wanted to see what the network was doing. To watch it struggle with non-linear boundaries. To see why learning rates matter, viscerally. ...

December 15, 2025 · 2 min · 280 words · Shuvro