A Simple Introduction to Natural Language Processing

By Elena Nisioti

The story of Natural Language Processing (NLP) starts way before Siri uttered its first words. Its roots can be traced back to the early realization that giving computers the ability to understand natural language, a task that a two-year-old can solve, is actually tougher than it sounds.

Whereas Artificial Intelligence (AI) represents our attempt to make computers think like us, NLP expresses our attempt to make them communicate with us. Until now, humans have taken the reverse approach, learning specialized programming languages in order to interact with computers. Recent successes of NLP prove that human-machine communication has entered a new era, where intelligent algorithms can take advantage of the richness of our languages.

“The limits of my language mean the limits of my world.”

― Ludwig Wittgenstein

What is NLP used for?

Asking oneself how Natural Language Processing can be used by a computer is similar to asking: how can language be used by a human? There is no limit to NLP applications, and it falls to the NLP practitioner to analyze a problem’s needs and determine how NLP can address them.

Arguably the greatest potential of NLP lies in its ability to equip AI with semantics. For instance, Sentiment Analysis is the process of reading between the lines of an opinionated text in order to understand the writer’s emotional state. It is particularly useful for identifying public opinion on social media in order to recognize consumers’ needs and improve marketing strategies.
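To make this concrete, here is a minimal sketch of sentiment scoring using NLTK’s off-the-shelf VADER analyzer (one tool among many; the example posts are invented):

```python
# A minimal sketch of sentiment analysis with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetch the sentiment lexicon on first run

analyzer = SentimentIntensityAnalyzer()

# The 'compound' score ranges from -1 (most negative) to +1 (most positive).
for post in ["I absolutely love this phone!", "Worst customer service ever."]:
    scores = analyzer.polarity_scores(post)
    print(post, "->", scores["compound"])
```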

Topic segmentation is another example, where a text is segmented and topics within it are identified. This technique is often combined with Big Data analysis and can be very useful in problems where social trends must be recognized, such as market analysis or interpreting political currents.
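As an illustration, here is a toy sketch of the related task of topic identification, using scikit-learn’s LatentDirichletAllocation (this models topics rather than segmenting a single text, and the corpus is invented):

```python
# A toy sketch of topic identification with scikit-learn's LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the election results shifted the political debate",
    "voters discussed policy and the election campaign",
    "the stock market rallied and investors bought shares",
    "share prices fell as the market reacted to earnings",
]

vectorizer = CountVectorizer(stop_words="english").fit(docs)
X = vectorizer.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the most indicative words for each discovered topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:]]
    print(f"topic {i}:", ", ".join(top))
```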

But how are these applications leveraged in today’s market?

Natural Language Processing with Virtual Assistants

Virtual Assistants (VAs) are probably the most representative example of how deeply NLP has entered our lives. Popular commercial products, such as Siri, Alexa, and Google Assistant, as well as products of academic origin, such as LOLITA, are a few examples of VAs, and chances are you have already “chatted” with one of them.

Although there are various reasons why people prefer one assistant over another, these products exhibit a striking similarity: they all belong to large companies. An NLP system requires a lot of data to improve its learning algorithms and, not surprisingly, behemoths like Amazon and Google have data in abundance. What makes this market so difficult for newcomers to penetrate is the success cycle: when a VA is good at its job, users are more motivated to converse with it, thus providing it with more data and feeding an ever-improving loop of quality and satisfaction.

Natural Language Processing in the financial world

Time is money. Especially if you work in finance. In today’s financial market, companies strive to battle the inherent uncertainty of a fast-moving economy by calculating performance indexes. Examples are the consensus estimate, which aims to characterize how well a company is performing compared to predictions based on its status, and the quick ratio, which quantifies a company’s ability to meet its short-term obligations with its most liquid assets. The accuracy of these metrics ultimately depends on the quality of the data used to calculate them, which is why companies like Synechron and IBM RegTech have already leveraged NLP to enrich their databases. As this information is usually found in corporate documents, such as financial statements and management commentary, Optical Character Recognition (OCR) is usually employed: a technique that converts images of written text into machine-readable data.
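For reference, the quick ratio itself is simple arithmetic once the figures have been extracted from the documents; a minimal sketch using the standard formula (all figures invented):

```python
# The standard quick-ratio formula: most liquid assets over
# short-term obligations. A real pipeline would extract these
# figures from financial statements via OCR.
def quick_ratio(cash: float, securities: float,
                receivables: float, current_liabilities: float) -> float:
    return (cash + securities + receivables) / current_liabilities

# Example: $50k cash, $20k securities, $30k receivables, $80k liabilities.
print(quick_ratio(50_000, 20_000, 30_000, 80_000))  # 1.25, obligations covered
```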

The evolution of NLP techniques

By this point it must be obvious that NLP and AI are closely intertwined. Nevertheless, the techniques employed by NLP have not been AI-related from the start and their evolution is instructive for understanding how improving our technologies can help us solve problems of ever-increasing complexity.

Rule-based NLP

The first NLP systems were built on the premise that language follows particular grammar and syntax norms, and were simply sets of hand-coded rules, called phrase structure rules. This approach was largely abandoned by the late 1980s, as these rules were handcrafted by experts and, regardless of their complexity, could not handle many characteristics of natural language or perform essential tasks such as speech recognition.
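Here is a hand-coded toy grammar in the spirit of phrase structure rules, written with NLTK’s CFG utilities (the grammar and sentence are invented):

```python
# A tiny phrase-structure grammar and parser, built with NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the' | 'a'
N -> 'dog' | 'ball'
V -> 'chases'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chases a ball".split()):
    print(tree)  # (S (NP (Det the) (N dog)) (VP (V chases) (NP (Det a) (N ball))))
```

A sentence as ordinary as “dogs chase balls” already falls outside this grammar, which hints at why handcrafted rules scale so poorly to real language.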

Statistical NLP

Conversely, if we equip NLP with machine learning algorithms, it becomes possible to learn the rules of a language simply by letting an algorithm observe large amounts of natural language data. The reason statistical algorithms are so advantageous in NLP problems is that they can generalize to unseen inputs, which, for example, accounts for the ability of VAs to respond to different accents.
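A minimal sketch of this idea, assuming a tiny invented dataset: train a statistical classifier on a few labeled reviews, then ask it about a sentence it has never seen.

```python
# Learn from labeled examples, then generalize to an unseen input.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["great product, works perfectly", "terrible, broke after a day",
               "I love it", "awful experience, do not buy"]
train_labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# A sentence never seen during training still gets a sensible prediction.
print(model.predict(["I love this great product"]))  # expected: ['pos']
```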

When working with statistical models, it is important to understand the difference between supervised and unsupervised learning. To gather data for NLP applications, such as personalized product advertisements, companies used to base their analysis on user reviews. But when was the last time you wrote a traditional review? Lately consumers tend to tweet about their complaints, check in to their favorite places, or Instagram their delicious food. In machine learning terms, this information is unlabeled, as there is no clear rating provided by the users. However, it is quite easy for a human to understand whether a user’s post is negative or positive, an ability that unsupervised learning and NLP attempt to imitate.
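To illustrate the unsupervised setting, here is a sketch that groups unlabeled posts by similarity with k-means clustering (the posts are invented, and note that clustering alone does not tell you which group is the “negative” one; a human, or a further model, still has to name the clusters):

```python
# Group unlabeled posts by similarity: no ratings, just raw text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = ["this cafe is amazing, best coffee ever",
         "wonderful food, amazing coffee",
         "my order arrived broken, very disappointed",
         "disappointed again, broken on arrival"]

X = TfidfVectorizer().fit_transform(posts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # similar posts share a cluster id (ids themselves are arbitrary)
```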

Natural Language Understanding

Upon opening the official developer page for Alexa, one encounters a term that has recently entered the market’s vocabulary: Natural Language Understanding (NLU). If NLP is the analysis of large amounts of natural language data, then NLU is the science of ensuring that a machine actually comprehends that language. Naturally entangled with AI, the term can be used to describe VA systems so convincing that interacting with them gives you the illusion of conversing with a real human.

Combining AI with NLP

In 2017 Facebook ran an ambitious experiment: its scientists wanted to examine how two software bots that did not know English could learn to negotiate the splitting of a set of objects (balls and hats) while conversing in natural language.

The outcome of the experiment was a bit perplexing. The bots managed to negotiate successfully. And the language they used was English, or at least it looked like it:

Bob: i can i i everything else . . . . . . . . . . . . . .

Alice: balls have zero to me to me to me to me to me to me to me to me to

The experiment sparked controversy among the public, with journalists raising alarms about AI that “ignores” humans and develops its own incomprehensible intelligence, and AI experts countering that the experiment had simply failed.

Nevertheless, one thing is for sure: when combining advanced technologies such as AI and NLP, researchers, marketing specialists, and users must be cautious, as the result may, quite counterintuitively, be neither intelligent nor natural.