New Book Launch!
Announcing Applied Natural Language Processing in the Enterprise, my latest book
I'm super excited to announce the launch of my latest book, Applied Natural Language Processing in the Enterprise! Huge shoutout to my co-author, Ajay, who has worked tirelessly with me on this for more than a year. Our book is a technical book on natural language processing (NLP), a subfield of machine learning and artificial intelligence that deals with human (aka natural) language.
For decades, people have used computers and calculators to crunch numbers, avoiding mental math as much as possible, but we have not had the luxury of using computers to process text in meaningful ways — until recently, that is.
Now, advances in natural language processing have made it possible for computers to perform tasks such as reading comprehension, question answering, text classification, text summarization, text generation, and translation. Today, we are surrounded by NLP-powered software in the form of Apple's Siri, Amazon's Alexa, Google Home, Google Search, Google Smart Compose, and Google Translate, just to name a few.
Our book helps developers get up to speed on the latest and most promising trends in NLP and, as the book's title implies, has a strong applied / practitioner's bent to it rather than a more theoretical or R&D angle. We hope our book will help the next generation of developers learn how to train and deploy real-world NLP applications in their organizations.
We hope you take this moment to celebrate the official book launch with us and share this with friends or colleagues that may be interested in learning more about the field of natural language processing.
Finally, thanks to our publishers at O'Reilly Media, we are offering complimentary digital copies of the book to the first ten readers who respond, as a thank-you for your support.
Also, for a limited time, we are sharing the first three chapters of the book for free on our GitHub page.
Thank you, and happy reading!
A preview of the book
Below is a short excerpt from the first chapter of our book.
What is NLP?
Let’s begin by defining what natural language processing is. Here is how NLP is commonly defined.
Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data. Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation.
Let’s unpack this definition. When we say "natural language," we mean "human language" as opposed to programming languages. Natural language refers not only to textual data but also to speech and audio data.
Great, but so what if computers can now work with large amounts of text, speech, and audio data? Why is this so important?
Imagine for a second the world without language. How would we communicate via text or speech? How would we read books, listen to music, or comprehend movies and TV shows? Life as we know it would cease to exist; we would be stuck in caveman days, able to process information visually but unable to share our knowledge with each other or communicate in any meaningful way. One of the major leaps in human history is the formation of a human (aka "natural") language, which allowed humans to communicate with one another, form groups, and operate as collective units of people instead of as solo individuals.
Likewise, if machines could work with only numerical and visual data but could not process natural language, they would be limited in the number and variety of applications they could have in the real world. Without the ability to handle natural language, machines will never be able to approach general artificial intelligence or anything that resembles human intelligence today.
Fortunately, machines can now finally process natural language data reasonably well. Let’s explore what commercial applications are possible because of this relatively newfound ability of computers to work with natural language data.
Because of the advances in NLP, machines are able to handle a broad array of natural language tasks, at least in a rudimentary way. Here are some common applications of NLP today.
Machine translation: Machine translation is the process of using machines to translate from one language to another without any human intervention. By far the most popular example of this is Google Translate, which supports over 100 languages and serves over 500 million people daily. When it was first launched in 2006, the performance of Google Translate was notably worse than what it is today. Performance today is fast approaching human expert-level. (For more, read The New York Times article from 2016 on Google's neural machine translation.)
Speech recognition: It may sound shocking, but voice recognition technology has been around for over 50 years. None of the voice recognition software performed well or went mainstream until very recently, when the rise of deep learning drove rapid improvement. Today, Amazon Alexa, Apple Siri, Google Assistant, Microsoft Cortana, digital voice assistants in your car, and other software are able to recognize speech with such a high level of accuracy that the software is able to process the information in real time and answer in a mostly reasonable way. Even as little as fifteen years ago, the ability of such machines to recognize speech and respond in a coherent manner was abysmal.
Question answering: For these digital assistants to deliver a delightful experience to humans asking questions, speech recognition is only the first half of the job. The software needs to (a) recognize the speech and (b) given the recognized speech, retrieve an appropriate response. This second half is known as question answering (QA).
Text summarization: One of the most common tasks humans do every day, especially in white-collar desk jobs, is read long-form documents and summarize the contents. Machines are now able to perform this summarization, creating a shorter summary of a longer text document. Text summarization reduces the reading time for humans. Humans who analyze lots of text daily (e.g., lawyers, paralegals, business analysts, and students) are able to sift through the machine-generated short summaries of long-form documents and then, based on the summaries, choose the relevant documents to read more thoroughly.
Chatbots: If you have spent some time perusing websites recently, you may have realized that more and more sites now have a chatbot that automatically chimes in to engage the human user. The chatbot usually greets the human in a friendly, nonthreatening manner and then asks the user questions to gauge the purpose and intent of the visit to the site. The chatbot then tries to automatically answer the questions without any human intervention. Such chatbots are now automating digital customer engagement.
Text-to-speech and speech-to-text: Software is now able to convert text to high-fidelity audio very easily. For example, Google Cloud Text-to-Speech is able to convert text into human-like speech in more than 180 voices across over 30 languages. Likewise, Google's Cloud Speech-to-Text is able to convert audio to text for over 120 languages, delivering a truly global offering.
Voicebots: Ten years ago, automated voice agents were clunky. Unless humans responded in a fairly constrained manner (e.g., with yes or no type responses), the voice agents on the phone could not process the information. Now, AI voicebots like those provided by Voiq are able to help augment and automate calls for sales, marketing, and customer success teams.
Text and audio generation: Years ago, text generation relied on templates and rules-based systems. This limited the scope of application. Now, software is able to generate text and audio using machine learning, broadening the scope of application considerably. For example, Gmail is now able to suggest entire sentences based on previous sentences you've drafted, and it's able to do this on the fly as you type. While natural language generation is best at short blurbs of text (partial sentences), soon such systems may be able to produce reasonably good long-form content. A popular commercial application of natural language generation is data-to-text software, which generates textual summaries of databases and data sets. Data-to-text software includes data analysis as well as text generation. Firms in this space include Narrative Science and Automated Insights.
Sentiment analysis: With the explosion of social media content, there is an ever-growing need to automate customer sentiment analysis, dissecting tweets, posts, and comments for sentiment such as positive vs. negative vs. neutral or angry vs. sad vs. happy. Such software is also known as emotion AI.
Information extraction: One major challenge in NLP is creating structured data from unstructured and/or semi-structured documents. For example, named entity recognition software is able to extract people, organizations, locations, dates, and currencies from long form texts such as mainstream news. Information extraction also involves relationship extraction, identifying the relations between entities, if any.
The number of NLP applications in the enterprise has exploded over the past decade, ranging from speech recognition and question answering to voicebots and chatbots that are able to generate natural language on their own. This is quite astounding given where the field was a few decades ago.
To put the current progress in NLP into perspective, let’s walk through how NLP has progressed, starting from its origins in 1950.
The field of natural language processing has been around for nearly 70 years. Perhaps most famously, Alan Turing laid the foundation for the field by developing the Turing test in 1950. The Turing test is a test of a machine's ability to demonstrate intelligence that is indistinguishable from that of a human. For the machine to pass the Turing test, the machine must generate human-like responses such that a human evaluator would not be able to tell whether the responses were generated by a human or a machine (i.e., the machine's responses are of human quality). (For more on the Turing test, please refer to the Wikipedia article.)
The Turing test launched significant debate in the then-nascent artificial intelligence field and spurred researchers to develop natural language processing models that would serve as building blocks for a machine that someday may pass the Turing test, a search that continues to this day.
Like the broader field of artificial intelligence, NLP has had many booms and busts, lurching from hype cycles to AI winters. In 1954, Georgetown University and IBM successfully built a system that could automatically translate more than sixty Russian sentences to English. At the time, researchers at Georgetown University thought machine translation would be a solved problem within three to five years. The success in the U.S. also spurred the Soviet Union to launch similar efforts. The Georgetown-IBM success coupled with the Cold War mentality led to increased funding for NLP in these early years.
However, by 1966, progress had stalled, and the Automatic Language Processing Advisory Committee (known as ALPAC), a committee set up by the U.S. government to evaluate the progress in computational linguistics, released a sobering report. The report stated that machine translation was more expensive, less accurate, and slower than human translation and unlikely to reach human-level performance in the near future. The report led to a reduction in funding for machine translation research, and work in the field nearly came to a halt for almost a decade.
Despite these setbacks, the field of NLP re-emerged in the 1970s. By the 1980s, computational power had increased significantly and costs had come down sufficiently, opening up the field to many more researchers around the world.
In the late 1980s, NLP rose in prominence again with the release of the first statistical machine translation systems, led by researchers at IBM's Thomas J. Watson Research Center. Prior to the rise of statistical machine translation, machine translation relied on human hand-crafted rules for language. These systems were called rules-based machine translation. The rules would help correct and control mistakes that the machine translation systems would typically make, but crafting such rules was a laborious and painstaking process. The machine translation systems were also brittle as a result; if the machine translation systems encountered edge case scenarios for which rules had not been developed, the machine translation systems would fail, sometimes egregiously.
Statistical machine translation helped reduce the need for human hand-crafted rules. Statistical machine translation relied much more heavily on learning from data. Using a bilingual corpus with parallel texts as data (i.e., two texts that are identical except for the language they are written in), such systems would carve sentences into small subsets and translate the subsets segment-by-segment from the source language to the target language. The more data (i.e., bilingual text corpora) the system would have, the better the translation.
Statistical machine translation would remain the most widely studied and used machine translation method until the rise of neural machine translation in the mid-2010s.
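To make the idea of learning translations from a bilingual corpus concrete, here is a minimal sketch. It is a simplified, word-level model in the spirit of the early IBM alignment models (often called IBM Model 1), not the segment-based systems described above and not code from the book. The three-sentence German-English corpus and the function names are invented for illustration.

```python
from collections import defaultdict

# Toy parallel corpus (an assumption for illustration): source and target
# sentences that say the same thing in two languages.
PARALLEL_CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

def train_word_translation(corpus, iterations=20):
    """Estimate P(target word | source word) with expectation-maximization."""
    target_vocab = {tw for _, target in corpus for tw in target}
    # Start with no preference: every co-occurring pair is equally likely.
    prob = {
        (tw, sw): 1.0 / len(target_vocab)
        for source, target in corpus
        for sw in source
        for tw in target
    }
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: share each target word among the source words in its sentence,
        # in proportion to the current translation probabilities.
        for source, target in corpus:
            for tw in target:
                norm = sum(prob[(tw, sw)] for sw in source)
                for sw in source:
                    share = prob[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: re-estimate the probabilities from the expected counts.
        prob = {(tw, sw): c / total[sw] for (tw, sw), c in count.items()}
    return prob

def best_translation(prob, source_word):
    """Return the most probable target word for a source word."""
    candidates = {tw: p for (tw, sw), p in prob.items() if sw == source_word}
    return max(candidates, key=candidates.get)

model = train_word_translation(PARALLEL_CORPUS)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No rule in this sketch says that "Haus" means "house". The model infers it because "the" is better explained by "das", which also appears in another sentence pair. With more parallel text, the estimates would have more evidence to draw on, which is the point made above about data.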
By the 1990s, such successes led researchers to expand beyond text into speech recognition. Speech recognition, like machine translation, had been around since the early 1950s, spurred by early successes by the likes of Bell Labs and IBM. But speech recognition systems had severe limitations. In the 1960s, for example, such systems could take voice commands for playing chess but not do much else.
By the mid-1980s, IBM applied a statistical approach to speech recognition and launched a voice activated typewriter called Tangora, which could handle a 20,000-word vocabulary.
DARPA, Bell Labs, and Carnegie Mellon University also had similar successes by the late 1980s. Speech recognition software systems by then had larger vocabularies than the average human and could handle continuous speech recognition, a milestone in the history of speech recognition.
In the 1990s, several researchers in the space left research labs and universities to work in industry, which led to more commercial applications of speech recognition and machine translation.
Today's NLP heavyweights such as Google hired their first speech recognition employees in 2007. The U.S. government also got involved then; the National Security Agency began tagging large volumes of recorded conversations for specific keywords, facilitating the search process for NSA analysts.
By the early 2010s, NLP researchers, both in academia and industry, began experimenting with deep neural networks for NLP tasks. Early deep learning-led successes came from a deep learning method called long short-term memory (LSTM). In 2015, Google used such a method to revamp Google Voice.
Deep learning methods led to dramatic performance improvements in NLP tasks, spurring more dollars into the space. These successes have led to a much deeper integration of NLP software in our everyday lives.
For example, cars in the early 2010s had voice recognition software that could handle a limited set of voice commands; now cars have tech that can handle a much broader set of natural language commands, inferring context and intent much more clearly.
Looking back today, progress in NLP was slow but steady, moving from rules-based systems in the early days to statistical machine translation by the 1980s and to neural network-based systems by the 2010s. While academic research in the space has been fierce for quite some time, NLP has become a mainstream topic only recently. Let's examine the main inflection points over the past several years that have helped NLP become one of the hottest topics in AI today.
NLP and computer vision are both sub-fields of artificial intelligence, but computer vision has had more commercial success to date. Computer vision had its inflection point in 2012 (the so-called "ImageNet" moment) when a deep learning-based solution called AlexNet dramatically reduced the error rate of previous computer vision models.
In the years since 2012, computer vision has powered applications such as auto-tagging of photos and videos, self-driving cars, cashier-less stores, facial recognition-powered authentication of devices, radiology diagnoses, and more.
NLP has been a relatively late bloomer by comparison. NLP made waves from 2014 onwards with the release of Amazon Alexa, a revamped Apple Siri, Google Assistant, and Microsoft Cortana. Google also launched a much improved version of Google Translate in 2016, and now chatbots and voicebots are much more commonplace.
That being said, it wasn't until 2018 that NLP had its very own ImageNet moment with the release of large pretrained language models built on the Transformer architecture; the most notable of these was Google's BERT, which was launched in November 2018.
In 2019, generative models such as OpenAI's GPT-2 made splashes, generating new content based on previous content on the fly, a previously insurmountable feat. In 2020, OpenAI released an even larger and more impressive version called GPT-3, building on its previous successes.
Heading into 2021 and beyond, NLP is no longer an experimental sub-field of AI. Along with computer vision, NLP is now poised to have many broad-based applications in the enterprise. With this book, we hope to share some of the concepts and tools that will help you build some of these applications at your company.
A Final Word
There is not one single approach to solving NLP tasks. The three dominant approaches today are rule-based, traditional machine learning (statistical-based), and neural network-based.
Let's explore each approach.
Rule-based NLP: Traditional NLP software relies heavily on human-crafted rules of languages; domain experts, typically linguists, curate these rules. You can think of these rules as regular expressions or pattern matching. Rule-based NLP performs well in narrowly scoped use cases but typically does not generalize well. More and more rules are necessary to generalize such a system, and this makes rule-based NLP a labor-intensive and brittle solution compared to the other NLP approaches. Here are examples of rules in a rule-based system: words ending in -ing are verbs, words ending in -er or -est are adjectives, words ending in 's are possessives, etc. Think of how many rules we would need to create by hand to make a system that could analyze and process a large volume of natural language data. Not only would the creation of rules be a mind-bogglingly difficult and tedious process, but we would also have to deal with the many errors that would occur from using such rules. We would have to create rules for rules to address all the corner cases for each and every rule.
Traditional (or Classical) Machine Learning: Traditional machine learning relies less on rules and more on data. It uses a statistical approach, drawing probability distributions of words based on a large annotated corpus. Humans still play a meaningful role; domain experts need to perform feature engineering to improve the machine learning model's performance. Features include capitalization, singular vs. plural, surrounding words, etc. After creating these features, you would have to train a traditional ML model to perform NLP tasks, e.g., text classification. Since traditional ML uses a statistical approach to determine when to apply certain features or rules to process language, a traditional ML-based NLP system is easier to build and maintain than a rule-based system. It also generalizes better than rule-based NLP.
Neural Networks: Neural networks address the shortcomings of traditional machine learning. Instead of requiring humans to perform feature engineering, neural networks will "learn" the important features via representation learning. To perform well, these neural networks just need copious amounts of data. The amount of data required for these neural nets to perform well is substantial, but, in today's internet age, data is not too hard to acquire. You can think of neural networks as very powerful function approximators or "rule" creators; these rules and features are several degrees more nuanced and complex than the rules created by humans, allowing for more automated learning and more generalization of the system in processing natural language data.
Of these three, the neural network-based branch of NLP, fueled by the rise of very deep neural networks (i.e., deep learning), is the most powerful and the one that has led to many of the mainstream commercial applications of NLP in recent years.
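To make the contrast between the first two approaches concrete, here is a minimal sketch; it is not code from the book. The first half encodes the suffix rules listed above as regular expressions. The second half hand-engineers a few features (word suffix, previous word, capitalization) and lets a small Naive Bayes model, chosen here only for brevity, learn from a tiny annotated corpus how much weight each feature deserves. The corpus, feature names, and class names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Approach 1: hand-written suffix rules, taken from the examples in the text.
SUFFIX_RULES = [
    (re.compile(r".+'s$"), "POSSESSIVE"),
    (re.compile(r".+ing$"), "VERB"),
    (re.compile(r".+(er|est)$"), "ADJECTIVE"),
]

def rule_based_tag(word):
    """Return the tag of the first rule that matches, or UNKNOWN."""
    for pattern, tag in SUFFIX_RULES:
        if pattern.match(word.lower()):
            return tag
    return "UNKNOWN"

# Approach 2: hand-engineered features plus a statistical model.
# Hypothetical annotated corpus of (previous word, word, tag) triples.
TRAINING_DATA = [
    ("is", "running", "VERB"), ("was", "jumping", "VERB"),
    ("is", "singing", "VERB"), ("was", "walking", "VERB"),
    ("the", "king", "NOUN"), ("the", "building", "NOUN"),
    ("a", "teacher", "NOUN"), ("the", "water", "NOUN"),
    ("is", "faster", "ADJECTIVE"), ("was", "smaller", "ADJECTIVE"),
    ("the", "tallest", "ADJECTIVE"), ("is", "brightest", "ADJECTIVE"),
]

def extract_features(previous_word, word):
    """Feature engineering: a human decides which cues might matter."""
    return [
        "suffix3=" + word[-3:].lower(),
        "previous=" + previous_word.lower(),
        "capitalized=" + str(word[:1].isupper()),
    ]

class NaiveBayesTagger:
    """Naive Bayes with add-one smoothing over the features above."""

    def __init__(self):
        self.tag_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, examples):
        for previous_word, word, tag in examples:
            self.tag_counts[tag] += 1
            for feature in extract_features(previous_word, word):
                self.feature_counts[tag][feature] += 1
                self.vocabulary.add(feature)
        return self

    def predict(self, previous_word, word):
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[tag].values()) + len(self.vocabulary)
            for feature in extract_features(previous_word, word):
                score += math.log((self.feature_counts[tag][feature] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

tagger = NaiveBayesTagger().fit(TRAINING_DATA)
print(rule_based_tag("king"))            # VERB: the -ing rule misfires
print(tagger.predict("the", "king"))     # NOUN
print(tagger.predict("the", "ceiling"))  # NOUN, although "ceiling" was never seen
print(tagger.predict("was", "talking"))  # VERB, although "talking" was never seen
```

Fixing "king" in the rule-based tagger would take a hand-written exception, and so would every similar word. The statistical tagger needed no new rule: it learned from the annotated examples that the previous word "the" outweighs the "-ing" suffix. A neural network would go one step further and learn the features themselves rather than relying on extract_features.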
In this book, we will focus mostly on neural network-based approaches to NLP, but we will also explore traditional machine learning approaches. The former has state-of-the-art performance in many NLP tasks, but traditional machine learning is still actively used in commercial applications.
We won't focus much on rule-based NLP, but, since it has been around for decades, you will not have difficulty finding other resources on rule-based NLP. Rule-based NLP does have a place alongside the other two approaches but usually only to deal with edge cases.
Hope you enjoyed this preview. Thanks for reading.