This article walks through a use case of CNNs for NLP, with practical implementation details in PyTorch and Python. It is based on the paper “Character-Aware Neural Language Models”.



Language models: A language model assigns a probability to a sentence or phrase. Given the preceding words, we can predict the next word as the one that gives the phrase the highest probability. We often see language models used for text completion in messaging services.

The children play in the ________

A good language model would be able to predict the next word as ‘park’ or something else that makes sense.
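To make the idea of next-word prediction concrete, here is a minimal sketch of a bigram language model built from raw counts. It is not the character-aware CNN model from the paper; the toy corpus and the helper names `next_word_probs` and `predict_next` are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "children play in the park",
    "dogs run in the park",
    "flowers grow in the garden",
]

# Count how often each word follows each preceding word (a bigram model).
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1


def next_word_probs(prev):
    """Estimate P(next word | previous word) from raw bigram counts."""
    counts = follow_counts.get(prev, Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def predict_next(phrase):
    """Return the most probable next word, using only the last word of the phrase."""
    probs = next_word_probs(phrase.lower().split()[-1])
    if not probs:
        return None  # the last word never appeared as a context in the corpus
    return max(probs, key=probs.get)


prediction = predict_next("The children play in the")
print(prediction)
```

In this corpus ‘park’ follows ‘the’ twice and ‘garden’ once, so the sketch fills the blank with ‘park’. A real language model conditions on far more context than the single previous word.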

This article aims to provide intuition for word embeddings using information from some of the most famous papers in the field.


Why do we need word vectors?

The alternative is to represent each word in a document as a one-hot encoded vector. Let’s think about some problems with this.

  1. Vocabulary Size: Each word would be represented by a vector as long as your vocabulary, which could be on the order of millions of entries.
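The vocabulary-size problem is easy to see in a small sketch. The five-word vocabulary and the `one_hot` helper below are assumptions made for illustration; a real vocabulary would be far larger.

```python
# Tiny vocabulary, invented for illustration; real ones can have millions of entries.
vocab = ["children", "in", "park", "play", "the"]
word_to_index = {word: index for index, word in enumerate(vocab)}


def one_hot(word):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


park = one_hot("park")
print(park)       # one entry per vocabulary word, exactly one of them set to 1
print(len(park))  # the vector length always equals the vocabulary size
```

Every word costs one vector of length `len(vocab)`, so a vocabulary of a million words would mean a million-entry vector per word, nearly all of it zeros.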


This article aims to give the reader a flavor of the topic so they can explore it in more depth on their own, using the resources mentioned at the end.

Whether you’re doing least-squares approximation or reading a new paper in deep learning, a fundamental understanding of linear algebra is critical to internalize key concepts.

The gist of Linear Algebra

Firstly, a few definitions that we should all know

Column space: You can think of each column of a matrix as a vector in space. The column space is the space spanned by these vectors, that is, the set of all their linear combinations.
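As a small illustration of this definition, the sketch below forms a linear combination of the columns of a 3×2 matrix in plain Python. The matrix, the weights, and the helper name `linear_combination` are assumptions chosen for the example.

```python
def linear_combination(columns, weights):
    """Return sum_j weights[j] * columns[j], computed component by component."""
    n_rows = len(columns[0])
    return [
        sum(weight * column[i] for column, weight in zip(columns, weights))
        for i in range(n_rows)
    ]


# Columns of a 3x2 matrix, chosen for illustration.
columns = [
    [1, 0, 0],
    [0, 1, 0],
]

# Any choice of weights gives a vector in the column space.
v = linear_combination(columns, [2, 3])
print(v)
```

Whatever weights are chosen, the third component stays zero here, so the column space of this matrix is a plane inside three-dimensional space rather than all of it.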


ML/DS enthusiast
