From Viz to ML

Faline Rezvani
May 14, 2024
Updated: May 28

Colson, G. (2011). Leading British Phobias [Painting].

Data visualization can be used to clearly and quickly shed light on the distribution of habits, preferences, or even fears within a population. Even simple representations of open data can lay the foundation for a machine learning (ML) project.

The Young People Survey is a publicly available dataset collected in 2013 from Slovakian participants aged 15 to 30 (Sabo, M., n.d.). It contains 1,010 samples and 157 features, several of which are devoted to phobias.

Understanding the Data

With this information, we can learn more about ophidiophobia, or the fear of snakes. In addition to rating various hobbies and interests, the survey participants were asked to rate their fear of snakes on a 5-point scale, with 1 representing the lowest level of fear. Two quick visualizations show how many participants chose each rating, as well as the minimum, maximum, median, and interquartile range (IQR, the distance between the lower and upper quartiles) of the ratings.
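
The numbers behind those two visualizations can be sketched in a few lines of pandas. This is only an illustration: the ratings below are made-up values, not the survey responses, and the series name `Snakes` is an assumed column name.

```python
import pandas as pd

# Synthetic 1-5 ratings standing in for the snake-fear column
# (made-up values, NOT the real survey responses).
ratings = pd.Series([1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5], name="Snakes")

# Number of responses per rating: the values a bar chart would plot.
counts = ratings.value_counts().sort_index()

# Quartiles, extremes, and IQR: the values a box plot would draw.
q1, median, q3 = ratings.quantile([0.25, 0.5, 0.75])
summary = {
    "min": int(ratings.min()),
    "q1": float(q1),
    "median": float(median),
    "q3": float(q3),
    "max": int(ratings.max()),
    "iqr": float(q3 - q1),
}

print(counts.to_dict())
print(summary)
```

Sorting the counts by rating keeps the bars in scale order (1 through 5) rather than ordered by frequency, which is usually what a reader expects from an ordinal scale.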

To further understand the meaning associated with each rating, we can reference the Fear Cognition Scale (FCS) developed by Murad Salman Mirza (Mirza, M. 2018):

Mirza, M. (2018). Fear Cognition Scale [Digital image].

In Tableau, we can enhance our initial bar chart:

Now viewers can quickly and clearly see there are almost as many people with no fear of snakes as there are people with a critical fear.

Insight into Action

Having learned that a large share of this population is uninhibited by the fear of snakes, an online snake enthusiast publication may want to optimize its advertising efforts. It can explore the statistics and relationships of this dataset and build a predictive model to help target readers who are “not fearful” of snakes.
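
One way to frame that goal as a prediction target is to turn the 1-to-5 rating into a yes/no label. The sketch below assumes that only a rating of 1 counts as “not fearful”; that cutoff is our assumption for illustration, and the ratings are made-up values.

```python
import pandas as pd

# Synthetic 1-5 fear ratings (made-up values, NOT the survey responses).
fear = pd.Series([1, 5, 3, 1, 2, 4, 1, 5], name="Snakes")

# Assumption: only the lowest rating (1) is labeled "not fearful".
# 1 = not fearful, 0 = some level of fear.
not_fearful = (fear == 1).astype(int)

print(not_fearful.tolist())   # binary label per participant
print(not_fearful.mean())     # share of participants labeled "not fearful"
```

A different cutoff (for example, treating ratings 1 and 2 as “not fearful”) would change the class balance, so the choice is worth agreeing on before any modeling starts.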

Using Colab, a secure Google Cloud-based programming environment, we can load the dataset and begin exploring. Rather than use all 157 features of the survey for this exercise, we load only 12 columns unrelated to demographics, along with our one target column.
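
Loading only a subset of columns might look like the sketch below. To keep it self-contained, it reads a tiny in-memory CSV with made-up values instead of the real file; the column names are illustrative assumptions and only three feature columns are shown rather than twelve.

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the survey file (synthetic values).
# Column names are illustrative assumptions, not checked against the dataset.
csv_text = """Music,Dancing,Cars,Age,Gender,Snakes
5,3,2,20,female,1
4,4,5,19,male,5
3,1,4,22,female,3
"""

feature_cols = ["Music", "Dancing", "Cars"]  # interest ratings to keep
target_col = "Snakes"                        # fear rating to predict

# usecols tells pandas to parse only the listed columns,
# so the demographic columns (Age, Gender) are never loaded.
df = pd.read_csv(io.StringIO(csv_text), usecols=feature_cols + [target_col])

X = df[feature_cols]  # features
y = df[target_col]    # target
print(list(df.columns))
print(X.shape, y.shape)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the downloaded CSV.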

Before being used to make predictions, the features must be inspected for relationships. For example, as the rating for ‘Music’ increases, will the rating for ‘Dancing’ also increase?

Luckily, no. Features that are strongly correlated with one another can result in misleading ML models. With this heatmap we can see the most closely related features are ‘Cars’ and ‘Science and Technology’, but the correlation is weak enough that it won’t pose a problem.
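
The check behind a correlation heatmap can be sketched with pandas alone. The ratings below are made up and deliberately include one strongly related pair so the check has something to flag; they do not reflect the survey, where the relationships are weak. The 0.8 cutoff is an arbitrary assumption, not a rule from the original analysis.

```python
import pandas as pd

# Synthetic 1-5 ratings (made-up values, NOT the survey responses).
df = pd.DataFrame({
    "Music":   [5, 4, 5, 3, 4, 5],
    "Dancing": [1, 5, 3, 2, 4, 3],
    "Cars":    [1, 2, 3, 4, 5, 5],
    "Science and technology": [2, 2, 3, 4, 4, 5],
})

# Pairwise Pearson correlation (the pandas default): the matrix a heatmap colors.
corr = df.corr()

# Assumed cutoff for calling two features "too closely related".
THRESHOLD = 0.8

# Walk the upper triangle so each pair is checked once.
cols = list(corr.columns)
strong_pairs = [
    (a, b, round(float(corr.loc[a, b]), 2))
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) >= THRESHOLD
]

print(corr.round(2))
print(strong_pairs)
```

An empty `strong_pairs` list would correspond to the situation described above, where no pair of features is related closely enough to cause concern.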

Only after thoroughly dissecting the dataset with exhaustive exploratory analysis can the ML model life cycle at last begin.

Increased awareness of the process underlying predictive modeling can help any individual build a framework of questions prior to collaborating on an ML project.

“The secret of getting ahead is getting started.”

- Mark Twain

Barnhill, J. (2023, Aug.). Specific Phobias - Mental Health Disorders - Merck Manuals Consumer Version.

Mirza, M. (2018, Oct.). An Introduction to the Fear Cognition Scale (FCS) for the Digital Workplace (trainingmag.com).

Sabo, M. (n.d.). Young People Survey (kaggle.com).

Yau, J. (2023, Jan.). Seeing Ourselves in Greg Colson’s Quirky Pie Chart Paintings (hyperallergic.com).
