Cleaning Text Data

The first step with NLP is cleaning up your text.

Natural language processing (NLP) is a branch of machine learning concerned with getting computers to understand human language, including text data. Raw text is rarely clean, however, so you should clean your text before using it in language modeling applications.

Cleaning text means transforming it into a more consistent, digestible form so that machine learning algorithms can perform better. Producing suitable text data usually takes several steps and techniques, most of which we will cover in this tutorial 🤝.
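
To give a rough feel for what a multi-step cleaning process can look like, here is a minimal sketch that uses only Python's standard library. The function name `clean_text` and the particular steps (lowercasing, stripping URLs, HTML tags, and punctuation, then collapsing whitespace) are assumptions chosen for illustration, not necessarily the steps or libraries this tutorial uses.

```python
import re
import string


def clean_text(text: str) -> str:
    """Apply a few common, illustrative cleaning steps to a raw string.

    Hypothetical helper: the steps and their order are assumptions.
    """
    text = text.lower()                         # normalize case
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags
    # Remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\s+", " ", text).strip()    # collapse whitespace
    return text


raw = "  Hello, <b>World</b>! Visit https://example.com now.  "
print(clean_text(raw))  # hello world visit now
```

The order of the steps matters in this sketch: URLs and tags are removed before punctuation, because stripping punctuation first would break the patterns that recognize them.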

Without further ado, let's dive in!

Hex example project