2017 saw the role of big data analytics grow and diversify its applications across business domains, from retail to governance. Analytics has turned unstructured data into usable insights. As businesses continue to take leaps of growth with data analytics, this growing and expanding vertical is expected to leave an indelible mark in 2018 too.
Nowadays, the volume of data generated in organizations is increasing drastically, and this abundance of data has given rise to several areas of artificial intelligence and data science.
Analysts predict that businesses which fail to use their data effectively will be losing $1.2 trillion a year to their competitors by 2020, by which time annual data production is expected to grow by 4,300%.
In order to stay competitive, companies need to find a way to leverage data into actionable strategies. Data science and artificial intelligence are the key to maximizing data utilization.
Predictive Analytics is among the most useful applications of data science.
Using it allows executives to predict upcoming challenges, identify opportunities for growth, and optimize their internal operations.
To understand thoroughly the important role analytics plays in technology, we will start with the basics.
What is Predictive Analytics?
According to Wikipedia, “Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”
Predictive analytics is the branch of advanced analytics used to make predictions about unknown future events. It draws on techniques from data mining, statistics, modelling, machine learning, and artificial intelligence to analyse current data and make predictions about the future, bringing together management, information technology, and business-process modelling. Patterns found in historical and transactional data can be used to identify risks and opportunities. Predictive models capture relationships among many factors to assess the risk associated with a particular set of conditions and assign a score, or weighting. By applying predictive analytics successfully, businesses can effectively turn big data into opportunities.
Predictive analytics can also be defined as the set of techniques, tools, and technologies that use data to find models: models that can anticipate outcomes with a significant degree of accuracy.
Data scientists explore data, formulate hypotheses, and use algorithms to find predictive models.
The six steps of predictive analytics are:
1. Understand and Collect Data
2. Prepare and Clean Data
3. Create a model using statistical and machine learning algorithms
4. Evaluate the model to make sure it works
5. Deploy the model: use it in applications
6. Monitor the model: measure its effectiveness in the real world
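The six steps above can be sketched end to end in code. The following is a minimal illustration using scikit-learn on a synthetic dataset (the data, model choice, and split sizes are placeholders, not a prescription):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Understand and collect data (a synthetic stand-in here)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# 2. Prepare and clean data: hold out a test set and standardise features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. Create a model using a machine learning algorithm
model = LogisticRegression()
model.fit(X_train, y_train)

# 4. Evaluate the model on data it has not seen
acc = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy: {acc:.2f}")

# 5. Deploy: in practice, serialise the model and serve it from an application
# 6. Monitor: track accuracy on fresh data and retrain when it degrades
```

Steps 5 and 6 are operational rather than algorithmic, so they appear here only as comments.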
Predictive analytics can be further categorized as:
- Predictive Modelling: what will happen next, if...?
- Root Cause Analysis: why did this actually happen?
- Data Mining: identifying correlated data.
- Forecasting: what if the existing trends continue?
- Monte Carlo Simulation: what could happen?
- Pattern Identification and Alerts: when should an action be invoked to correct a process?
Organizations are turning to predictive analytics to help solve difficult problems and uncover new opportunities. Common uses include:
Detecting fraud. Combining multiple analytics methods can improve pattern detection and prevent criminal behavior. As cybersecurity becomes a growing concern, high-performance behavioral analytics examines all actions on a network in real time to spot abnormalities that may indicate fraud, zero-day vulnerabilities and advanced persistent threats.
Optimizing marketing campaigns. Predictive analytics are used to determine customer responses or purchases, as well as promote cross-sell opportunities. Predictive models help businesses attract, retain and grow their most profitable customers.
Improving operations. Many companies use predictive models to forecast inventory and manage resources. Airlines use predictive analytics to set ticket prices. Hotels try to predict the number of guests for any given night to maximize occupancy and increase revenue. Predictive analytics enables organizations to function more efficiently.
Reducing risk. Credit scores are used to assess a buyer’s likelihood of default for purchases and are a well-known example of predictive analytics. A credit score is a number generated by a predictive model that incorporates all data relevant to a person’s creditworthiness. Other risk-related uses include insurance claims and collections.
The types of Modeling techniques used in Predictive Analytics
1. Classifiers : An algorithm that maps input data to a specific category. Classifiers are used in the predictive data classification process, which has two stages: the learning stage and the prediction stage.
This supervised algorithm categorises structured or unstructured data into different sections by tagging each record with a unique classification attribute. Classification happens after the learning process, whose goal is to teach the model to extract and discover hidden relationships and rules (the classification rules) from training data. The model does so by employing a classification algorithm.
In the prediction stage that follows, the model predicts class tags or numerical values for data it has not seen before (that is, test data). The main goal of a classification problem is to identify the category or class under which new data falls.
The following are the steps involved in building a classification model:
i. Initialise the classifier to be used.
ii. Train the classifier: all classifiers in scikit-learn use a fit(X, y) method to fit the model to the given training data X and training labels y.
iii. Predict the target: given an unlabelled observation X, predict(X) returns the predicted label y.
iv. Evaluate the classifier model.
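The four steps above map directly onto scikit-learn's fit/predict interface. A minimal sketch on the bundled Iris dataset, using a k-nearest-neighbours classifier as one possible choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# i.  Initialise the classifier to be used
clf = KNeighborsClassifier(n_neighbors=3)

# ii. Train: fit(X, y) on the training data and labels
clf.fit(X_train, y_train)

# iii. Predict: predict(X) returns predicted labels for unseen observations
y_pred = clf.predict(X_test)

# iv. Evaluate the classifier model
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
```

Any of the classifiers listed below could be swapped in for KNeighborsClassifier with no other change, which is the point of the shared fit/predict interface.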
Some examples of classification algorithms:
1. Logistic Regression
2. Naive Bayes
3. Stochastic Gradient Descent
4. K-Nearest Neighbours
5. Decision Tree
6. Random Forest
7. Support Vector Machine
2. Recommendation Systems : A recommendation system, or recommender, works in well-defined, logical phases: data collection, ratings, and filtering. These phases are described below.
• A recommender system helps match users with items
• It uses implicit or explicit user feedback to make item suggestions
• Different Recommender System designs are based on the availability of data or content/context of the data.
Recommendation allows you to make recommendations (similar to association rules or market basket analysis) by generating rules, for example, purchasing product A leads to purchasing product B. Recommendation uses the link analysis technique, which is optimized to work on large volumes of transactions. It triggers all existing rules in a projected graph whose antecedent is a neighbour of the given user in the bipartite graph, and provides a specialized workflow that makes it easy to obtain a set of recommendations for a given customer.
How does a recommendation system capture these details? If the user has logged in, the details are extracted either from an HTTP session or from system cookies. If the system relies on cookies, the data is available only while the user is using the same terminal. Events are fired in almost every case: a user liking a product, adding it to a cart, or purchasing it. That is how user details are stored, but it is just one part of what recommenders do.
Ratings are important in the sense that they tell you what a user feels about a product. A user's feelings about a product are reflected, to an extent, in the actions he or she takes, such as liking, adding to a shopping cart, purchasing, or just clicking. Recommendation systems can assign implicit ratings based on these actions.
Filtering means narrowing down products based on ratings and other user data. Recommendation systems use three types of filtering: collaborative, content-based, and a hybrid of the two. In collaborative filtering, users' choices are compared and recommendations made on that basis. For example, if user X likes products A, B, C, and D and user Y likes products A, B, C, D, and E, it is likely that user X will be recommended product E, because the two users' choices of products are very similar.
Several well-known brands, such as social media ecosystems, use this model to provide effective and relevant recommendations to the users consuming those services. In content-based filtering, the user's browsing history, likes, and ratings are taken into account before recommendations are made. Many companies also use a hybrid approach; Netflix is known to be one of them.
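The collaborative filtering example above (users X and Y) can be sketched in a few lines. The users, products, and similarity measure (Jaccard similarity on liked-item sets) are illustrative choices, not how any production recommender is implemented:

```python
# Each user's set of liked products (made-up data matching the example)
likes = {
    "X": {"A", "B", "C", "D"},
    "Y": {"A", "B", "C", "D", "E"},
    "Z": {"F", "G"},
}

def jaccard(a, b):
    """Similarity of two users' liked-item sets: overlap / union."""
    return len(a & b) / len(a | b)

target = "X"
# Find the other user whose likes are most similar to the target's
best = max((u for u in likes if u != target),
           key=lambda u: jaccard(likes[target], likes[u]))
# Recommend items the neighbour likes that the target has not seen
recommendations = likes[best] - likes[target]
print(best, recommendations)  # Y is the nearest neighbour; E is recommended
```

Real systems replace the exhaustive neighbour search with approximate methods and the binary "likes" with weighted ratings, but the core idea is the same.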
While big data and recommendation engines have already proved an extremely useful combination for big corporations, a question remains whether companies with smaller budgets can afford such investments. Powerful recommendation engines can be built for anything from movies and videos to music, books, and products; think of Netflix, Pandora, or Amazon.
3. Clustering : Using unsupervised techniques such as clustering, we can seek to understand relationships between variables or between observations by determining whether the observations fall into relatively distinct groups. In clustering, data is categorised using unsupervised learning, so there are no response variables telling us, for example, whether a customer is a frequent shopper. Instead, we attempt to cluster customers on the basis of their variables in order to identify distinct customer groups. Common methods of unsupervised statistical learning include k-means clustering, hierarchical clustering, and principal component analysis.
Clustering is an unsupervised data mining technique where the records in a data set are organised into different logical groupings. The groupings are in such a way that records inside the same group are more similar than records outside the group. Clustering has a wide variety of applications ranging from market segmentation to customer segmentation, electoral grouping, web analytics, and outlier detection.
Clustering is also used as a data compression technique and data preprocessing technique for supervised data mining tasks. Many different data mining approaches are available to cluster the data and are developed based on proximity between the records, density in the data set, or novel application of neural networks.
Clustering can help us explore a dataset and separate cases into groups with similar traits or characteristics; each group is a potential candidate for a category or class. Clustering is used for exploratory data analytics, i.e., as unsupervised learning, rather than for confirmatory analytics or for predicting specific outcomes. Several interesting case studies, including divorce and its consequences on young adults, paediatric trauma, and youth development, demonstrate hierarchical clustering.
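The customer-segmentation idea above can be sketched with k-means in scikit-learn. The two behavioural variables and the two synthetic customer groups are invented purely for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two made-up segments: infrequent low-spend vs frequent high-spend customers.
# Columns: visits per month, spend per visit.
low = rng.normal(loc=[2, 20], scale=1.5, size=(50, 2))
high = rng.normal(loc=[10, 80], scale=1.5, size=(50, 2))
X = np.vstack([low, high])

# Ask k-means for two clusters; it recovers the two segments without labels
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # two centres, near (2, 20) and (10, 80)
```

No response variable was used: the groups emerge from the structure of the data alone, which is what makes clustering an unsupervised technique.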
4. Numerical, time series forecasting : A time series is a sequence of measurements over time, usually obtained at equally spaced intervals.
Any metric measured over regular time intervals forms a time series. Time series analysis is commercially important because of its industrial relevance, especially with respect to forecasting (demand, sales, supply, etc.).
A time series is an ordered sequence of observations of a variable or object captured at equally spaced time intervals: anything observed sequentially at a regular interval, such as hourly, daily, weekly, monthly, or quarterly. Time series data is important when you are predicting something that changes over time using past data. In time series analysis, the goal is to estimate future values from the behaviour present in past data.
Many statistical techniques are available for time series forecasting; a few effective ones are listed below.
Techniques of Forecasting:
1. Simple Moving Average (SMA)
2. Simple Exponential Smoothing (SES)
3. Autoregressive Integrated Moving Average (ARIMA)
4. Neural Networks (NN)
5. Croston's Method
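The two simplest techniques in the list, SMA and SES, fit in a few lines each. The monthly sales figures and the window/alpha choices below are arbitrary examples:

```python
# A toy monthly-sales series (invented numbers)
sales = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]

def sma_forecast(series, window=3):
    """Simple moving average: forecast the next point as the
    mean of the last `window` observations."""
    return sum(series[-window:]) / window

def ses_forecast(series, alpha=0.3):
    """Simple exponential smoothing: the level is updated as
    level = alpha * observation + (1 - alpha) * previous level,
    and the final level serves as the one-step-ahead forecast."""
    level = series[0]
    for obs in series[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

print(f"SMA(3) forecast: {sma_forecast(sales):.1f}")
print(f"SES(0.3) forecast: {ses_forecast(sales):.1f}")
```

SMA weights the last few observations equally, while SES weights all past observations with exponentially decaying influence; ARIMA and neural networks extend these ideas to capture trend and more complex dynamics.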
Components of a Time Series
• Secular Trend
• Cyclical Variation
– Rises and falls over periods longer than one year
• Seasonal Variation
– Patterns of change within a year, typically repeating from year to year
• Residual Variation
Time series data mining combines traditional data mining and forecasting techniques. Data mining techniques such as sampling, clustering and decision trees are applied to data collected over time with the goal of improving predictions.
To understand AI and machine learning in more detail, we will now explore the algorithms behind predictive analytics.
Some of the popular Predictive Algorithms are given below:
1. K-means Clustering
2. Association Rules
3. Boosting Trees
4. Cluster Analysis
5. Feature Selection
6. Independent Component Analysis
7. Kohonen Networks (SOFM)
8. Neural Networks
9. Social Network Analysis (SNA)
10. Random Forest (Decision Trees)
11. MARS (Multivariate Adaptive Regression Splines)
12. Linear and Logistic Regression
13. Naive Bayes Classifiers
14. Partial Least Squares
15. Response Optimisation
16. Root Cause Analysis
In my next blog I will explain each of the predictive analytics algorithms mentioned above, starting with k-means clustering. I hope you liked the article.
Please let me know in the comments what more you would like to see in upcoming articles on predictive analytics. Happy reading!