Translating Noise: Sentiment Analysis

  • Writer: Faline Rezvani
  • Sep 17, 2024
  • 3 min read

Humans like to talk.  As vocal learners enjoying apex predator status, we fancy a good chinwag for social, practical, and emotional purposes.  Given our love for loquacity, deciphering the code of human communication hidden within an abundance of chatter can be challenging.

Businesses can use our language to their advantage by clearly communicating values and visions, speaking to what’s important for an individual.  Communicating values has become increasingly urgent, with contributions to social causes being one of the most effective ways to reach consumers born after 1965 (WP Engine, n.d.).  If outgoing messages can be put to such powerful use, can companies use language to their advantage for incoming messages as well?

Measuring content effectiveness has been found to be among the top ways successful businesses differentiate themselves (Stahl, 2023).  A useful thermometer to test the temperature of a company’s online presence is the process of sentiment analysis, or evaluating the emotional tone behind a body of text.

To understand the body of text, we take separate word meanings, or semantics, mix them with some world knowledge, or pragmatics, shake them up with context, and account for ambiguity, irony, and sarcasm: a puzzling, tedious task.  Luckily, machines excel in that arena.

The field of natural language processing (NLP) explores a machine’s capacity to learn and understand human language.  Each word in human language is linked to meaning; meaning can be linked to distance, which can be linked to a number, which can be fed to a machine.  Machine learning (ML) can help us to distill the sentiment behind human language efficiently and accurately.


NLP In Action

 

Among the many open datasets available for ML exploration is the IMDb review dataset, published on Kaggle.com as “IMDB Dataset of 50K Movie Reviews”.  The following illustrates how a sample of reviews can be used as input for an ML model, which learns from the data and can eventually classify new reviews as positive or negative.
 
After examining the movie review samples, the hardest task will be not spending hours reading them. 


Turning the sentiment into a numerical value will be much less time-consuming.  Positive reviews will be encoded as the value ‘1’ and negative reviews as ‘0’.
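The encoding step can be sketched in a few lines.  This is a minimal illustration using only Python's standard library, not the article's own code; the sample rows, the label strings "positive" and "negative", and the helper name encode_labels are assumptions made for the example.

```python
# Hypothetical sample rows: (review text, sentiment label).
reviews = [
    ("A wonderful little production.", "positive"),
    ("Basically a dull family drama.", "negative"),
    ("I loved every minute of it.", "positive"),
]

# Map each sentiment string to its numeric class: positive -> 1, negative -> 0.
LABEL_TO_INT = {"positive": 1, "negative": 0}


def encode_labels(rows):
    """Return (text, int_label) pairs, raising on any unexpected label."""
    encoded = []
    for text, sentiment in rows:
        key = sentiment.strip().lower()
        if key not in LABEL_TO_INT:
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        encoded.append((text, LABEL_TO_INT[key]))
    return encoded


encoded_reviews = encode_labels(reviews)
labels = [label for _, label in encoded_reviews]
print(labels)
```

Raising an error on an unknown label, rather than silently skipping it, makes a mislabeled row visible before it reaches the model.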


The text must then be tidied.  Human language contains what’s referred to in NLP as noise, or information not contributing to sentiment.  One element of noise that can be isolated is a list of stop words: commonly used words, such as ‘the’, ‘is’, and many contractions, that add little to meaning.  Words can also be whittled down to their root by removing suffixes, a step known as stemming.
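The tidying step might look something like the sketch below.  It uses only the standard library so that it runs anywhere; the short stop word set and the crude suffix stripper are illustrative stand-ins, not the NLTK stop word list or stemmer a real project would likely use, and the assumption that reviews may contain HTML tags such as <br /> should be checked against the data.

```python
import re

# A tiny illustrative stop word list; a real list is far longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "of", "to",
              "i", "don't", "isn't"}

# Suffixes to strip, checked in this order. A deliberately crude stand-in
# for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tidy(text):
    """Lowercase, drop HTML tags and punctuation, remove stop words, then stem."""
    text = re.sub(r"<[^>]+>", " ", text.lower())  # replace any tags with a space
    tokens = re.findall(r"[a-z']+", text)         # keep letters and apostrophes
    return [crude_stem(token) for token in tokens if token not in STOP_WORDS]


print(tidy("The acting was amazing!<br />I loved it."))
# prints ['act', 'amaz', 'lov']
```

The stems are not always dictionary words ("amaz", "lov"); what matters for the model is that related forms of a word collapse to the same token.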


The tidy text is turned into numerical input through the process of word embedding, or vectorization.
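One simple form of vectorization is a bag of words, where each review becomes a row of word counts.  The sketch below is a standard-library illustration under that assumption; the article does not say which vectorizer its code uses, and the helper names build_vocabulary and vectorize are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign each distinct token a column index, in sorted order."""
    vocab = sorted({token for doc in tokenized_docs for token in doc})
    return {token: index for index, token in enumerate(vocab)}


def vectorize(tokens, vocabulary):
    """Return a list of token counts, one entry per vocabulary column."""
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocabulary]


# Hypothetical tidied reviews, already split into tokens.
docs = [["great", "film", "great", "cast"], ["dull", "film"]]

vocabulary = build_vocabulary(docs)
vectors = [vectorize(doc, vocabulary) for doc in docs]

print(vocabulary)  # prints {'cast': 0, 'dull': 1, 'film': 2, 'great': 3}
print(vectors)     # prints [[1, 0, 1, 2], [0, 1, 1, 0]]
```

Words that never appeared when the vocabulary was built are simply ignored at prediction time, which is one reason a larger and more varied training sample helps.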


The numerical samples with their accompanying sentiment classes are split into two datasets, one to train the ML model and one by which to test its predictions.
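A split of this kind can be sketched as follows.  The 80/20 ratio, the fixed seed, and the helper name split_dataset are assumptions for illustration; the article does not state the proportions it used, and a library routine would normally do this job.

```python
import random


def split_dataset(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle sample/label pairs together and split them into train and test sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Hypothetical vectorized reviews with their sentiment classes.
vectors = [[i, 10 - i] for i in range(10)]
classes = [1 if i >= 5 else 0 for i in range(10)]

X_train, X_test, y_train, y_test = split_dataset(vectors, classes)
print(len(X_train), len(X_test))  # prints 8 2
```

Shuffling indices rather than the samples themselves keeps every vector paired with its own label, and holding the test set back from training is what makes the test accuracy an honest estimate.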


We choose the algorithm, or collection of code and computational tools designed for a specific purpose, to assist with our particular task, classification.  We choose adaptive boosting (AdaBoost), a hard-working algorithm that trains a sequence of simple models, increasing the weights of misclassified samples after each round so that later models focus on the reviews earlier ones got wrong.
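To make the reweighting idea concrete, here is a compact, from-scratch sketch of AdaBoost with one-rule classifiers ("decision stumps") as the weak learners.  It is a teaching illustration on toy data, not the article's code; in practice a library implementation such as Scikit-learn's AdaBoostClassifier would be the usual choice.  The toy feature columns (counts of "good" and "bad") and all function names are assumptions.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """A one-rule classifier: vote +1 or -1 based on a single feature."""
    vote = 1 if row[feature] > threshold else -1
    return polarity * vote


def fit_adaboost(X, y, n_rounds=5):
    """Train on labels in {0, 1}; return (alpha, feature, threshold, polarity) tuples."""
    signs = [1 if label == 1 else -1 for label in y]
    weights = [1.0 / len(X)] * len(X)  # every sample starts equally important
    ensemble = []
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted error on the current weights.
        best = None
        for feature in range(len(X[0])):
            for threshold in sorted({row[feature] for row in X}):
                for polarity in (1, -1):
                    error = sum(
                        w for w, row, s in zip(weights, X, signs)
                        if stump_predict(row, feature, threshold, polarity) != s
                    )
                    if best is None or error < best[0]:
                        best = (error, feature, threshold, polarity)
        error, feature, threshold, polarity = best
        error = min(max(error, 1e-10), 1 - 1e-10)    # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # this stump's say in the final vote
        ensemble.append((alpha, feature, threshold, polarity))
        # Raise the weights of misclassified samples, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * s * stump_predict(row, feature, threshold, polarity))
            for w, row, s in zip(weights, X, signs)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Weighted vote of all stumps, mapped back to the classes 1 and 0."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score > 0 else 0


# Toy count vectors with columns [count of "good", count of "bad"].
X = [[2, 0], [1, 0], [1, 1], [0, 1], [0, 2], [0, 1]]
y = [1, 1, 1, 0, 0, 0]

ensemble = fit_adaboost(X, y)
print([predict(ensemble, row) for row in X])  # prints [1, 1, 1, 0, 0, 0]
```

Note that the loop runs for a fixed number of rounds rather than until some lowest possible error is found; the number of rounds is a setting to tune.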



The AdaBoost model performs well, with an average training accuracy of 81.29% and an average test accuracy of 80.20%.  Further understanding of task-dependent noise may enhance the performance of the chosen algorithm.  Emojis can be helpful in sentiment analysis, but cannot be used as input for this particular model.
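Accuracy is simply the share of predictions that match the true class.  The sketch below shows that calculation and an average over several evaluation runs; the article does not say what its averages were taken over, so the idea of multiple runs (for example, cross-validation folds) and the numbers used here are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


def mean_accuracy(scores):
    """Average a list of accuracy scores, e.g. one per evaluation run."""
    return sum(scores) / len(scores)


# Hypothetical true classes and model predictions for four reviews.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # prints 0.75

# Hypothetical scores from three evaluation runs.
print(mean_accuracy([0.75, 1.0, 0.5]))       # prints 0.75
```

A training accuracy close to the test accuracy, as reported above, suggests the model is not simply memorizing the training reviews.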


Data Sources

 

Armed with a text classification model, an organization can begin their data collection journey.  The ideal path establishes scalable solutions to ingest a variety of file types.

 

  • Social Media Posts

  • Surveys

  • Google Reviews

  • Employee Interactions

  • Podcasts

  • Business Journals

 

Companies can use the results of their sentiment analysis efforts to establish content effectiveness KPIs.  Through the eyes of consumers, a company can deepen its understanding of where it stands on environmental, social, and governance (ESG) solutions.  This helps a company to build sustainability KPIs, a crucial step forward for all industries.
 
For the full code utilizing Python’s Natural Language Toolkit (NLTK) library and Scikit-learn’s feature extraction module, see the GitHub repository.

 

 



 

 

“We have two ears and one mouth so that we can listen twice as much as we speak.”

 

- Epictetus

 

 

 



 

WP Engine. (n.d.). Generation Influence: Reaching Gen Z in the new digital paradigm. Page 19. https://wpengine.com/wp-content/uploads/2020/08/Generation-Influence-U.S.-Report.pdf
 
Stahl, S. (2023, Oct. 18). 14th Annual B2B Content Marketing Benchmarks, Budgets, and Trends Outlook for 2024. https://contentmarketinginstitute.com/articles/b2b-content-marketing-trends-research/
