Generalizing Transformers for processing images (2h ago)
The Vision Transformer (ViT) model is introduced in the paper “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”…
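To make the patchification idea in the title concrete, here is a minimal PyTorch sketch, assuming a 224×224 RGB input and an embedding width of 192 (my choices for illustration, not values from the post): a convolution whose kernel and stride both equal the patch size splits the image into non-overlapping 16×16 patches and linearly projects each one into a token.

```python
import torch
import torch.nn as nn

# Minimal sketch of ViT-style patchification (assumed shapes, not code from the post).
# A Conv2d with kernel_size == stride == patch_size is the standard trick for
# "split the image into 16x16 patches, then linearly project each patch".
patch_size, embed_dim = 16, 192
to_tokens = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)

image = torch.randn(1, 3, 224, 224)          # one RGB image
tokens = to_tokens(image)                    # (1, 192, 14, 14)
tokens = tokens.flatten(2).transpose(1, 2)   # (1, 196, 192): 196 patch tokens
print(tokens.shape)                          # each token can now feed a Transformer
```

The Conv2d formulation is equivalent to reshaping into flat patches and applying one shared linear layer to each.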
CLIP: Aligning images and text with contrastive learning (3h ago)
The paper “Learning Transferable Visual Models From Natural Language Supervision” introduces the CLIP (Contrastive Language–Image…
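A minimal sketch of the contrastive objective, assuming the image and text encoders have already produced batch-aligned embeddings (the batch size, width, and temperature below are illustrative assumptions): matching image–text pairs sit on the diagonal of a similarity matrix, and cross-entropy is applied in both directions.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of a CLIP-style symmetric contrastive loss (assumed sizes;
# in the real model these embeddings come from an image and a text encoder).
batch, dim = 8, 64
img_emb = F.normalize(torch.randn(batch, dim), dim=-1)  # image encoder output
txt_emb = F.normalize(torch.randn(batch, dim), dim=-1)  # text encoder output

temperature = 0.07
logits = img_emb @ txt_emb.t() / temperature  # (8, 8) similarity matrix
targets = torch.arange(batch)                 # matching pairs lie on the diagonal

# Cross-entropy in both directions: images -> texts and texts -> images.
loss = (F.cross_entropy(logits, targets) +
        F.cross_entropy(logits.t(), targets)) / 2
print(loss.item())
```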
Learning to build GPT: my takeaways (Oct 25)
The lecture on building GPT by Andrej Karpathy is a must-watch for anyone working in NLP. In this post, I’ll be summarizing my notes from…
The gradual information-fusing neural model (Oct 23)
Hello, this post is based on lecture 4 by Andrej Karpathy on building neural nets from scratch.
The essentials of activations, gradients, and batch normalization (Oct 14)
Hello everyone. As you know, I’m watching the lectures by Andrej Karpathy on building neural networks from scratch.
The neural probabilistic language model (Oct 10)
In our previous post, we discussed count-based and simple neural bigram models. The problem with both is that they only consider the…
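As a quick illustration of what the neural probabilistic model adds over a bigram (all sizes below are my assumptions, in the spirit of the Bengio-style setup the title refers to): it conditions on a window of several previous tokens, embedding each one and feeding the concatenation through a small MLP.

```python
import torch
import torch.nn as nn

# Minimal sketch of a Bengio-style neural probabilistic LM (assumed sizes):
# unlike a bigram model, it conditions on a window of the previous 3 tokens.
vocab, block, emb, hidden = 27, 3, 10, 200

C = nn.Embedding(vocab, emb)                  # token embedding table
mlp = nn.Sequential(nn.Linear(block * emb, hidden), nn.Tanh(),
                    nn.Linear(hidden, vocab))

context = torch.randint(0, vocab, (4, block))  # batch of 4 contexts, 3 tokens each
x = C(context).view(4, -1)                     # concatenate the 3 embeddings
logits = mlp(x)                                # (4, 27) next-token scores
print(logits.shape)
```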
From count-based to neural bigram models (Oct 8)
In this post, I will summarize my notes from the 2nd lecture of the Zero to Hero course by Andrej Karpathy, which concerns building…
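For flavor, here is a minimal count-based bigram model; the toy corpus and the 27-character alphabet with “.” as the word-boundary marker are my assumptions for the sketch, not the post’s data.

```python
import torch

# Minimal sketch of a count-based bigram model over a toy corpus (assumed data).
words = ["emma", "olivia", "ava"]
chars = "." + "abcdefghijklmnopqrstuvwxyz"      # "." marks start/end of a word
stoi = {c: i for i, c in enumerate(chars)}

N = torch.zeros(27, 27)                         # N[a, b] counts "a followed by b"
for w in words:
    seq = ["."] + list(w) + ["."]
    for a, b in zip(seq, seq[1:]):
        N[stoi[a], stoi[b]] += 1

P = (N + 1) / (N + 1).sum(dim=1, keepdim=True)  # add-one smoothing; rows sum to 1
print(P[stoi["."], stoi["e"]].item())           # P(first character is "e")
```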
Backpropagation, backed up by Andrej Karpathy (Oct 7)
I’ve been trying out the free Neural Networks: Zero to Hero course by Karpathy on building neural networks, from scratch, in code.
Cross-attention made (really) easy (Oct 2)
So far, queries, keys, and values have originated from the same source. (If that’s unclear, please refer to my previous posts on…
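The one-line difference from self-attention is easy to show in code. In this minimal sketch (all dimensions are assumed for illustration), queries come from one sequence while keys and values come from another:

```python
import torch

# Minimal sketch of cross-attention (assumed dimensions): queries from one
# sequence (e.g. a decoder), keys and values from another (e.g. an encoder).
d = 32
dec = torch.randn(1, 5, d)   # 5 decoder positions asking the questions
enc = torch.randn(1, 9, d)   # 9 encoder positions holding the answers

Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))
q, k, v = dec @ Wq, enc @ Wk, enc @ Wv

scores = q @ k.transpose(-2, -1) / d**0.5   # (1, 5, 9): every query vs every key
weights = scores.softmax(dim=-1)
out = weights @ v                           # (1, 5, 32): encoder info routed to decoder
print(out.shape)
```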
My take on attention, part 2 (Oct 2)
In the last post, we discussed how to calculate the attention score for a pair of words and how this score is used to create a new…
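In case the teaser is too compressed, here is a minimal sketch of that calculation for a single word, with all vectors randomly initialized for illustration: its query is dotted with every key, the scaled scores are softmaxed, and the resulting weights mix the value vectors into a new representation.

```python
import torch

# Minimal sketch of scaled dot-product attention for one word (assumed vectors).
d = 16
keys = torch.randn(4, d)     # key vectors for a 4-word sentence
values = torch.randn(4, d)   # value vectors for the same words
query_i = torch.randn(d)     # query vector of the word being updated

scores = keys @ query_i / d**0.5   # one score per (word i, word j) pair
weights = scores.softmax(dim=0)    # normalized attention weights
new_repr = weights @ values        # weighted sum of values: the new representation
print(weights, new_repr.shape)
```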