Courses
Institutions
Share
Recently I received a dump of data from a friend of mine who has an online store. As a data science enthusiast, I wanted to check for simple and common use cases. I wanted to understand
I decided to begin with simple predictive and clustering techniques. This blog is a walk-through of my learning and observations.
Be it Predictive Analysis or Clustered Analysis, the first step is to wrangle the data or clean it. The data I received consists of products, customers, sales, inventory, and wastage. There were a good number of customers, unfortunately, recorded with different names many a time, or a salesperson typing her own name (instead of the customer), the date column missing, capital / small letters being used interchangeably, and so on. The first few days went into reconciling the data. Then we wrote a script that can create a pipeline to handle multiple scenarios and then deposit cleaned data into a new repository.
Another important step for Predictive Analysis or Clustered Analysis is exploration. How much data is available? What is the distribution across categories? How many customers? etc. We identify some statistical information like – What does the average spend per customer? What does the average spend per item? What does the average spend per category of products? Is there a deviation in the sales across months, customers, etc.? What are the statistical values like mean, deviation, quantiles, etc? Calculate and append them to the existing data. Now, the data is ready for the next step.
There are several predictive algorithms starting from simple logistic and linear regression algorithms all the way to the transformer-based algorithms to predict the next word or sentence. It could also be to predict the presence of cancerous tissue in an MRI image.
Given the type of dataset, our use-case was to predict if a particular customer will buy or not buy a particular product. These types of Predictive Analytics are referred to as ‘Recommendations’.
Typically, we use techniques like the Apriori algorithm and basic collaborative filtering. The count of the sales in some categories was disproportionately higher than that of other categories. So, these techniques performed poorly.
Singular Value Decomposition (SVD) is another method to predict the probability of purchase. This is a popular technique used in movie or product recommendations. However, in this case, there were no ratings. So, to compensate, we calculated the number of times the product was purchased by the person out of all his visits and converted it to a rating of 1 to 5.
The algorithm was able to predict a value of 1 to 5 for each product. 5 implies a definite purchase and 1 implies that the chances are low. We calculate the algorithm on the n-1 purchases of data and then predicted the probability for the nth purchase. If we set 3 as the threshold, the algorithm was able to predict with an accuracy of 60-70%. This was just a simple prediction and not tuned or optimized.
Learn more about the role of predictive analysis in business analytics.
I then decided to apply a few clustering analysis algorithms to the dataset. The simple and the most popular one is the K-Means method. In this, each point becomes part of a single cluster. The cluster of the point is decided such that the sum of the square distance from the point to the cluster’s centroid is at the minimum. The centroid is adjusted after each iteration and the process is repeated until the stopping condition is met.
In our dataset, a mere count of the purchases or quantity of each product did not yield good results. However, once we started to add the statistical data, for example, the mean, median, deviation, and quantiles for each product and customer – there were some interesting clusters that began to evolve.
Once we clustered the data, we calculated the cluster value for each row. We visualized them using simple plots. Figure 1.1 is a mapping of the Average Cost spent by the customer to the average days between two purchases. We used the cluster value as the ‘hue’.
From the above image, we can infer that – there are many customers who spent an average of fewer than INR 1000 for every purchase. However, they were consistent and came frequently – (once every two weeks). There were a small set of people who are high-value and frequent (very important customers). There were a few customers who are not very frequent and of course, there was a set of non-returning customers.
Figure 1.2 is another plot of the ‘Category of Purchase’ to ‘Average Spend’ by the customer. Again, we use the cluster value as the ‘hue’.
Interestingly, the high valued customers are more interested in vegetables than other categories of products.
Clustering helped us to ask ourselves some very interesting questions
Technology giants like Amazon use Transformer Bases Agorithms, as well as Bandit Algorithms (reinforcement learning), causal reference, etc to make better predictions. They extensively use clustering techniques to find interesting insights. Very often to improve the effectiveness of the predictive algorithms.
(The purpose of this blog is to give readers a real-world application of predictive and clustering analysis. It tries to demonstrate how even simple and basic predictive and clustering techniques can generate some interesting insights.)
References:
https://scikit-learn.org/stable/modules/clustering.html
http://surpriselib.com/
https://plotly.com/
Information related to companies and external organizations is based on secondary research or the opinion of individual authors and must not be interpreted as the official information shared by the concerned organization.
Additionally, information like fee, eligibility, scholarships, finance options etc. on offerings and programs listed on Online Manipal may change as per the discretion of respective universities so please refer to the respective program page for latest information. Any information provided in blogs is not binding and cannot be taken as final.
Explore our online programs to become future-ready
Master of Business Administration Bachelor of Business AdministrationBachelor of Computer ApplicationsBachelor of CommerceMaster of Computer ApplicationsMaster of CommerceMaster of Arts in Journalism & Mass CommunicationMA in EconomicsMSc Data ScienceMSc Business AnalyticsPGCP Business AnalyticsPGCP Logistics and Supply ChainPGCP in Entrepreneurship and InnovationBachelor of ArtsMA in EnglishMA in SociologyMA in Political Science
Manipal University JaipurManipal Academy of Higher EducationManipal Institute of TechnologySikkim Manipal University
I authorize Online Manipal and its associates to contact me with updates & notifications via email, SMS, WhatsApp, and voice call. This consent will override any registration for DNC / NDNC.
Enter the code sent to your phone number to proceed with the application form
Edit
Resend OTP
COURSE SELECTED Edit
Bachelor of Business Administration (BBA) Manipal University Jaipur
Please leave this field empty. Submit
Enroll yourself to attend the upcoming webinar
Explore related degree courses & certification