Darron's Devlog

Cs224n

cs224n - Lecture 10. Transformers and Pretraining

Word structure and subword models Assumptions we’ve made: A fixed vocabulary of words, built from the training set. All novel words seen at test time are mapped to a single UNK token. Finite vocabulary assumptions make even less sense in many languages. Many...

Cs224n

cs224n - Lecture 9. Self-Attention and Transformers

So far: recurrent models for (most) NLP - Circa 2016, the de facto strategy in NLP is to encode sentences with a bidirectional LSTM. (for example, the source sentence in a translation) - Define your output (parse, sentence, summary) as a sequence, and use an...

Cs224n

cs224n - Lecture 8. Attention (Contd.)

Attention Encoder hidden states $\mathbf{h}_1, \ldots, \mathbf{h}_N \in \mathbb{R}^h$ On timestep $t$, we have Decoder hidden state $\mathbf{s}_t \in \mathbb{R}^h$ Attention score $\mathbf{e}^t$ for this step: \(\mathbf{e}^t = \left[ \mathbf{s}_t^T \mathbf{h}_1, \ldots, \mathbf{s}_t^T \mathbf{h}_N \right] \in \mathbb{R}^N\) Take softmax to get the Attention distribution: \(\alpha^t...
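The excerpt cuts off at the softmax step; as a sketch of how the computation continues with the notation above (this is the standard dot-product attention recipe, not a quote from the post): \(\alpha^t = \text{softmax}(\mathbf{e}^t) \in \mathbb{R}^N\), and the attention output is the weighted sum \(\mathbf{a}_t = \sum_{i=1}^{N} \alpha_i^t \, \mathbf{h}_i \in \mathbb{R}^h\), which is then concatenated with the decoder state \(\mathbf{s}_t\) and used as in the plain seq2seq model.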

Cs224n

cs224n - Lecture 7. Translation, Seq2Seq, Attention

New Task: Machine Translation Pre-Neural Machine Translation Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language). 1990s-2010s: Statistical Machine Translation Core idea: Learn a probabilistic model from...

Cs224n

cs224n - Lecture 6. Simple and LSTM RNNs

The Simple RNN Language Model Training an RNN Language Model Get a big corpus of text which is a sequence of words $x^{(1)}, \ldots, x^{(T)}$ Feed into RNN-LM; compute output distribution $\hat{y}^{(t)}$ for every step $t$. i.e., predict probability distribution of every word, given words...
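To make that training setup concrete, here is a minimal PyTorch-style sketch: embed the word indices, run an RNN over them, and predict a distribution over the next word at every step. The sizes, the toy corpus tensor, and the class name RNNLM are placeholders for illustration, not values from the lecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the lecture does not prescribe these values.
vocab_size, embed_dim, hidden_dim = 10_000, 128, 256

class RNNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):               # x: (batch, T) word indices
        h, _ = self.rnn(self.embed(x))  # hidden state at every step t
        return self.out(h)              # logits -> y_hat^(t) for each step

model = RNNLM()
loss_fn = nn.CrossEntropyLoss()                  # cross-entropy against the next word
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab_size, (1, 65))   # stand-in for x^(1), ..., x^(T)
x, y = tokens[:, :-1], tokens[:, 1:]             # predict word t+1 from words 1..t
logits = model(x)
loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))
loss.backward(); opt.step()
```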

Cs224n

cs224n - Lecture 5. Language Models and RNNs

What do we gain from a neural dependency parser? So far… Transition-based dependency parsers were an efficient linear-time method for giving the syntactic structure of natural language text. Worked pretty well before neural nets came along. $\color{red}{(-)}$ They worked with indicator features, specifying...

Cs224n

cs224n - Lecture 4. Dependency Parsing

Two views of linguistic structure: Phrase structure Constituency = phrase structure grammar = context-free grammars (CFGs) Phrase structure organizes words into nested constituents Starting unit: words (noun, preposition, adjective, determiner, …) the, $\ $ cat, $\ $ cuddly, $\ $ by, $\ $ door Words combine...

Cs224n

cs224n - Lecture 3. Backprop and Neural Networks

Named Entity Recognition (NER) Task: find and classify names in text Possible uses: Tracking mentions of particular entities in documents For question answering, answers are usually named entities Often followed by Named Entity Linking/Canonicalization into a Knowledge Base Simple NER: Window classification using binary logistic classifier Using...
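A minimal sketch of that window classifier, assuming toy word vectors and a hypothetical score_center helper (the sentence, dimensions, and weights below are illustrative only): concatenate the embeddings of the words in a fixed window around the center word and push the result through a binary logistic unit.

```python
import numpy as np

rng = np.random.default_rng(0)
d, window = 4, 2                        # toy embedding dim, 2 words on each side
vocab = {"museums": 0, "in": 1, "Paris": 2, "are": 3, "amazing": 4}
E = rng.normal(size=(len(vocab), d))    # toy word vectors

W = rng.normal(size=(d * (2 * window + 1),))   # classifier weights for the window
b = 0.0

def score_center(words, center):
    """Binary logistic score that the center word of the window is an entity."""
    idx = [vocab[w] for w in words[center - window: center + window + 1]]
    x = np.concatenate([E[i] for i in idx])     # concatenated window embeddings
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))   # sigmoid

print(score_center(["museums", "in", "Paris", "are", "amazing"], center=2))
```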

Cs224n

cs224n - Lecture 2. Neural Classifiers

Review: Main idea of word2vec Start with random word vectors Iterate through each word in the whole corpus Try to predict surrounding words using word vectors: $P(o\mid c) = \frac{\exp(u_o^T v_c)}{\sum_{w \in V}\exp(u_w^T v_c)}$ Learning: Update vectors so they can predict actual surrounding words better...
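To make the prediction step concrete, a small NumPy sketch of that probability; the toy vocabulary size, dimensions, and the helper name p_outside_given_center are made up, and only the formula itself comes from the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4                       # toy vocabulary size and vector dimension
U = rng.normal(size=(V, d))       # "outside" vectors u_w
Vc = rng.normal(size=(V, d))      # "center" vectors v_c (random init, as in the lecture)

def p_outside_given_center(o, c):
    """P(o | c) = exp(u_o . v_c) / sum_w exp(u_w . v_c)"""
    scores = U @ Vc[c]                           # u_w^T v_c for every w in the vocab
    scores -= scores.max()                       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[o]

print(p_outside_given_center(o=3, c=1))
```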

Cs224n

cs224n - Lecture 1. Word Vectors

Objectives The foundations of the effective modern methods for deep learning applied to NLP; from basics to key methods used in NLP: RNNs, Attention, Transformers, etc. A big picture understanding of human languages and the difficulties in understanding and producing them An understanding of and...

Blog

DevEnv Setup

To set up a local development environment on a new SSD, I followed the instructions below; posting for later use. Document Enable NVIDIA CUDA on WSL Install stable version of Windows 11 Enable WSL, install Ubuntu (20.04.3 LTS) In the Windows Settings app, select Check for updates in...

Papers

Mask R-CNN

Mask R-CNN Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, ICCV 2017 https://github.com/facebookresearch/Detectron What’s different? Models so far R-CNN: 2-stage model for Object detection Fast R-CNN: RoI on feature map Faster R-CNN: RPN network Instance Segmentation Combining two tasks: Object detection (Fast/Faster R-CNN): classify individual objects...

Projects

Starfish detection w/ TF Object Detection API

TensorFlow - Help Protect the Great Barrier Reef Worked on this in Feb. 2022 to study object detection models Task: Underwater + Small object detection Score (IoU=0.50:0.95): mAP@100: 0.364686 / AR@100: 0.491768 / Expected F2: 0.459727 Direct link: kaggle notebook

Cs231n

cs231n - Lecture 15. Detection and Segmentation

Computer Vision Tasks Image Classification: No spatial extent Semantic Segmentation: No objects, just pixels Object Detection/ Instance Segmentation: Multiple objects Semantic Segmentation Paired training data: For each training image, each pixel is labeled with a semantic category. At test time, classify each pixel of a...

Cs231n

cs231n - Lecture 14. Visualizing and Understanding

What’s going on inside ConvNets? Visualizing what models have learned First layer: Visualize Filters At the first layer, we can visualize the raw weights and see Gabor-like features. Since the weights of higher layers act on the activations from the layer before, it is...

Papers

Unsupervised Representation Learning by Predicting Image Rotations

Unsupervised Representation Learning by Predicting Image Rotations Gidaris et al. 2018 https://github.com/gidariss/FeatureLearningRotNet ConvNet: (+) Unparalleled capacity to learn high level semantic image features (-) Require massive amounts of manually labeled data, expensive and impractical to scale $\rightarrow$ Unsupervised Learning Unsupervised semantic feature learning: Learn image...
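The pretext task is simple enough to sketch; below is a small NumPy illustration of generating the 4-way rotation labels that the ConvNet is then trained to predict. The function name and array shapes are placeholders, not taken from the paper's repository.

```python
import numpy as np

def rotation_pretext_batch(images):
    """Given a batch of images (N, H, W, C), return every image rotated by
    0/90/180/270 degrees together with the rotation class it should predict."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):                          # k quarter-turns -> class label k
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

# Toy usage: 2 fake RGB images -> 8 rotated copies with 4-way labels.
x, y = rotation_pretext_batch(np.zeros((2, 32, 32, 3)))
print(x.shape, y)   # (8, 32, 32, 3) [0 1 2 3 0 1 2 3]
```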

Cs231n

cs231n - Lecture 13. Self-Supervised Learning

Self-Supervised Learning Generative vs. Self-supervised Learning Both aim to learn from data without manual label annotation Generative learning aims to model data distribution $p_{data}(x)$, e.g., generating realistic images. Self-supervised learning methods solve “pretext” tasks that produce good features for downstream tasks. Learn with supervised learning...

Cs231n

cs231n - Lecture 12. Generative Models

Supervised vs. Unsupervised Supervised Learning: Data: $(x,y)$; $y$ is the label Goal: Learn a function to map $x\rightarrow y$ Unsupervised Learning: Data: $x$; no labels Goal: Learn some underlying hidden structure of the data Generative Modeling Given training data, generate new samples from the same distribution Objectives:...

Cs231n

cs231n - Lecture 11. Attention and Transformers

Attention with RNNs Image Captioning using spatial features Input: Image I Output: Sequence y $= y_1, y_2, \ldots, y_T$ Encoder: $h_0 = f_W(z)$, where $z$ denotes spatial CNN features and $f_W(\cdot)$ is an MLP Decoder: $y_t = g_v(y_{t-1}, h_{t-1}, c)$, where the context vector c is often...

Cs231n

cs231n - Lecture 10. Recurrent Neural Networks

RNN: Process Sequences one to one: vanilla neural networks; one to many: e.g. Image Captioning (image to sequence of words); many to one: e.g. Action Prediction (video sequence to action class); many to many (1): e.g. Video Captioning (video sequence to caption); many to many (2): e.g. Video Classification on...
