Category: AI

ModelTaxonomy AI

Machine Learning Essentials – Machine Learning Taxonomy

This Artificial Intelligence post covers all essentials Machine Learning concepts required for a data engineer to know before he embarks upon journey towards Development of Deep Learning products.

I have listed below the most popular and often used Machine Learning concepts, read them and provide feedback to us:

actor-critic – Refers to a class of agent architectures, where the actor plays out a particular policy, while the critic learns to evaluate actor’s policy. Both the actor and critic are simultaneously improving by bootstrapping on each other.

agent – A system that is embedded in an environment, and takes actions to change the state of the environment. Examples include mobile robots, software agents, or industrial controllers.

average-reward methods – A framework where the agent’s goal is to maximize the expected payoff per step. Average-reward methods are appropriate in problems where the goal is maximize the long-term performance. They are usually much more difficult to analyze than discounted algorithms.

discount factor – A scalar value between 0 and 1 which determines the present value of future rewards. If the discount factor is 0, the agent is concerned with maximizing immediate rewards. As the discount factor approaches 1, the agent takes more future rewards into account. Algorithms which discount future rewards include Q-learning and TD(lambda).

dynamic programming (DP) is a class of solution methods for solving sequential decision problems with a compositional cost structure. Richard Bellman was one of the principal founders of this approach.

environment – The external system that an agent is “embedded” in, and can perceive and act on.

function approximator refers to the problem of inducing a function from training examples.
Standard approximators include decision trees, neural networks, and nearest-neighbor methods.

Markov decision process (MDP) – A probabilistic model of a sequential decision problem, where states can be perceived exactly, and the current state and action selected determine a probability distribution on future states. Essentially, the outcome of applying an action to a state depends only on the current action and state (and not on preceding actions or states).

model-based algorithms
– These compute value functions using a model of the system dynamics. Adaptive Real-time DP (ARTDP) is a well-known example of a model-based algorithm.

model-free algorithms – these directly learn a value function without requiring knowledge of the consequences of doing actions. Q-learning is the best known example of a model-free algorithm. For full details on this concept watch this video :

model – The agent’s view of the environment, which maps state-action pairs to probability distributions over states. Note that not every reinforcement learning agent uses a model of its environment. For full details on this concept watch this video :

Monte Carlo methods – A class of methods for learning of value functions, which estimates the value of a state by running many trials starting at that state, then averages the total rewards received on those trials.

policy – The decision-making function (control strategy) of the agent, which represents a mapping from situations to actions.

reward – A scalar value which represents the degree to which a state or action is desirable. Reward functions can be used to specify a wide range of planning goals (e.g. by penalizing every non-goal state, an agent can be guided towards learning the fastest route to the final state).

sensor – Agents perceive the state of their environment using sensors, which can refer to physical transducers, such as ultrasound, or simulated feature-detectors.

state – this can be viewed as a summary of the past history of the system, that determines its future evolution.

stochastic approximation – Most RL algorithms can be viewed as stochastic approximations of exact DP algorithms, where instead of complete sweeps over the state space, only selected states are backed up (which are sampled according to the underlying probabilistic model). A rich theory of stochastic approximation (e.g Robbins Munro) can be brought to bear to understand the theoretical convergence of RL methods.

TD (temporal difference) algorithms – A class of learning methods, based on the idea of comparing temporally successive predictions. Possibly the single most fundamental idea in all of reinforcement learning.

unsupervised learning – The area of machine learning in which an agent learns from interaction with its environment, rather than from a knowledgeable teacher that specifies the action the agent should take in any given state.

value function is a mapping from states to real numbers, where the value of a state represents the long-term reward achieved starting from that state, and executing a particular policy. The key distinguishing feature of RL methods is that they learn policies indirectly, by instead learning value functions. RL methods can be constrasted with direct optimization methods, such as genetic algorithms (GA), which attempt to search the policy space directly.

Parametric Models can be defined as models that first make an assumption about a function form, or shape of
function f ( ie linear). Then fits the model. This reduces estimating mapping function f to just estimating set of parameters, but if our assumption was wrong, will lead to bad results.

Algorithms that do not make strong assumptions about the form of the mapping function are called nonparametric machine learning algorithms. By not making assumptions, they are free to learn any functional form from the training data.

Supervised Models can be defined as models that fit input variables xi = (x 1 , x 2 , …x n ) to a known output variables y i = (y 1 , y 2 , …y n )

Unsupervised Models can be defined as models that take in input variables xi = (x1 , x2 , …xn ), but they do not have an associated output to supervise the training. The goal is to understand relationships between the variables or observations.

Blackbox Machine Learning models are models that make decisions, but we do not know what happens ”under the hood” e.g. deep learning, neural networks

Descriptive Machine Learning models are models that provide insight into why they make their decisions e.g. linear regression, decision trees

First-Principle models can be defined as models based on a prior belief of how the system under investigation works, incorporates domain knowledge on ad-hoc basis.

Data-Driven models can be defined as models based on observed correlations between input and output variables

Deterministic models are defined as models that produce a single ”prediction” e.g. yes or no, true or false

Stochastic models are defined as models that produce probability distributions over possible events

Flat models are models that solve problems on a single level,no notion of subproblems.
Flat clustering creates a flat set of clusters without any explicit structure that would relate clusters to each other.

Hierarchical models are models that solve several different nested subproblems. Hierarchical clustering creates a hierarchy of clusters.

There is very self explanatory video about these concepts, explained briefly about different modelling techniques and Models. Please have a look at it and provide the feedback.

Predictive Analytics AI

Predictive Analytics

2017 has seen role of big data analytics grow and diversify it’s application into various business domains, ranging from retail business to governance sphere. Analytics has facilitated the use of unstructured data into usable insights. As business house continue to take leaps of growth with the revolutionary data analytics, it is expected that this growing and expanding vertical shall leave an indelible mark in 2018 too.

Nowadays, the size of the data that is being generated and created in different organizations is increasing drastically. Due to this large amount of data, several areas in artificial intelligence and data science have been raised.

Businesses who don’t effectively use their data will be losing $1.2 trillion to their competitors every year by 2020. By that time, experts predict that there will be a 4300% increase in annual data production.

In order to stay competitive, companies need to find a way to leverage data into actionable strategies. Data science and artificial intelligence are the key to maximizing data utilization.

Predictive Analytics is among the most useful applications of data science.

Using it allows executives to predict upcoming challenges, identify opportunities for growth, and optimize their internal operations.

So to understand thoroughly about important role played by analytics in technology  we will start with basics,

What is Predictive Analytics?

According to Wikipedia, “Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”


Predictive analytics is the branch of the advanced analytics which is used to make predictions about unknown future events. Predictive analytics uses many techniques from data mining, statistics, modelling, machine learning, and artificial intelligence to analyse current data to make predictions about future. It uses a number of data mining, and analytical techniques to bring together the management, information technology, and modelling business process to make predictions about future. The patterns found in historical and transactional data can be used to identify risks and opportunities for future. Predictive analytics models capture relationships among many factors to assess risk with a particular set of conditions to assign a score, or weight-age. By successfully applying predictive analytics the businesses can effectively interpret big data for their opportunities.

Predictive Analytics can also be defined using techniques, tools and technologies that use data to find models – models that can anticipate outcomes with a significant probability of accuracy.

Data Scientist explore data, formulate hypothesis, and use algorithms to find predictive models

The six steps of Predictive Analytics are
1. Understand and Collect Data
2. Prepare and Clean Data
3. Create Model using Statistical & Machine Learning Algorithm
4. Evaluate the model to make sure it will work
5. Deploy the model, use the model in applications
6. Monitor the model, Measure the effectiveness of the model in the real world.

Predictive analytics can be further categorized as –

  1. Predictive Modelling –What will happen next, if ?
  2. Root Cause Analysis-Why this actually happened?
  3. Data Mining- Identifying correlated data.
  4. Forecasting- What if the existing trends continue?
  5. Monte-Carlo Simulation – What could happen?
  6. Pattern Identification and Alerts –When should an action be invoked to correct a process.

Below are some of the important reasons for using Predictive Analytics by​ organisation. Organizations are turning to predictive analytics to help solve difficult problems and uncover new opportunities. Common uses include:

Detecting fraud. Combining multiple analytics methods can improve pattern detection and prevent criminal behavior. As cybersecurity becomes a growing concern, high-performance behavioral analytics examines all actions on a network in real time to spot abnormalities that may indicate fraud, zero-day vulnerabilities and advanced persistent threats.

Optimizing marketing campaigns. Predictive analytics are used to determine customer responses or purchases, as well as promote cross-sell opportunities. Predictive models help businesses attract, retain and grow their most profitable customers.

Improving operations. Many companies use predictive models to forecast inventory and manage resources. Airlines use predictive analytics to set ticket prices. Hotels try to predict the number of guests for any given night to maximize occupancy and increase revenue. Predictive analytics enables organizations to function more efficiently.

Reducing risk. Credit scores are used to assess a buyer’s likelihood of default for purchases and are a well-known example of predictive analytics. A credit score is a number generated by a predictive model that incorporates all data relevant to a person’s creditworthiness. Other risk-related uses include insurance claims and collections.


The types of Modeling techniques used in Predictive Analytics
1. Classifiers : An algorithm that maps the input data to a specific category. This algorithm is used in Predictive Data Classification process. This process has two stages: The Learning stage and Prediction stage.

This supervisor algorithm used for categorise structured or unstructured data into different sections by tagging them with unique classification tag attribute. This process of classification is done after learning process. The goal is to teach your model to extract and discover hidden relationships and rules — the classification rules from training data. The model does so by employing a classification algorithm.

The prediction stage that follows the learning stage consists of having the model predict new class tag attributes or numerical values that classify data it has not seen before (that is, test data). The main goal of a classification problem is to identify the category/class to which a new data will fall under.

The following are the steps involved in building a classification model:
i. Initialise the classifier to be used.
ii. Train the classifier: All classifiers in scikit-learn uses a fit(X, y) method to fit the model(training) for the given train data X and train label y.
iii. Predict the target: Given an unlabelled observation X, the predict(X) returns the predicted label y.
iv. Evaluate the classifier model.

Some of the examples of Classification Algorithms :
1. Logistic Regression
2. Naive Bayes
3. Stochastic Gradient Descent
4. K-Nearest Neighbours
5. Decision Tree
6. Random Forest
7. Support Vector Machine

2. Recommenders:
A Recommendation system or Recommenders works in well-defined, logical phases viz., data collection, ratings, and filtering. These phases are described below.
• Recommender System helps match users with item
• Implicit or explicit user feedback or item suggestion
• Different Recommender System designs are based on the availability of data or content/context of the data.

Recommendation allows you to make recommendations (similarly to Association Rules or Market Basket Analysis) by generating rules (for example, purchasing Product A leads to purchasing Product B). Recommendation uses the link analysis technique. This technique is optimized to work on large volumes of transactions. Recommendation triggers all the existing rules in a projected graph whose antecedent is a neighbor of the given user in the bipartite graph. Recommendation provides a specialized workflow to make it easy to obtain a set of recommendations for a given customer.

Data Collection:
How does the Recommendation system capture the details? If the user has logged in, then the details are extracted either from an HTTP session or from the system cookies. In case the Recommendation system depends on system cookies, then the data is available only till the time the user is using the same terminal. Events are fired almost in every case — a user liking a Product or adding it to a cart and purchasing it. So that is how user details are stored. But that is just one part of what Recommenders do.

Ratings are important in the sense that they tell you what a user feels about a product. User’s feelings about a product can be reflected to an extent in the actions he or she takes such as likes, adding to shopping cart, purchasing or just clicking. Recommendation systems can assign implicit ratings based on user actions.

Filtering means filtering products based on ratings and other user data. Recommendation systems use three types of filtering: collaborative, user-based and a hybrid approach. In collaborative filtering, a comparison of users’ choices is done and recommendations given. For example, if user X likes products A, B, C, and D and user Y likes products A, B, C, D and E, then it is likely that user X will be recommended product E because there are a lot of similarities between users X and Y as far as choice of products is concerned.

Several reputed brands such as Social Media Ecosystems use this model to provide effective and relevant recommendations to the users consuming those services. In user-based filtering, the user’s browsing history, likes and ratings are taken into account before providing recommendations. Many companies also use a hybrid approach. Netflix is known to use a hybrid approach.

While big data and Recommendation engines have already proved an extremely useful combination for big corporations, it raises a question of whether companies with smaller budgets can afford such investments. Powerful media recommendation engines can be built for anything from movies and videos to music, books, and products – think Netflix, Pandora, or Amazon.

3. Clusters
Using unsupervised techniques like clustering, we can seek to understand the relationships between the variables or between the observations by determining whether observations fall into relatively distinct groups. During Clustering case, most of the data is categorised using unsupervised learning — so we don’t have response variables telling us whether a customer is a frequent shopper or not. Hence, we can attempt to cluster the customers on the basis of the variables in order to identify distinct customer groups. There are other types of unsupervised statistical learning including k-means clustering, hierarchical clustering, principal component analysis, etc.

Clustering is an unsupervised data mining technique where the records in a data set are organised into different logical groupings. The groupings are in such a way that records inside the same group are more similar than records outside the group. Clustering has a wide variety of applications ranging from market segmentation to customer segmentation, electoral grouping, web analytics, and outlier detection.

Clustering is also used as a data compression technique and data preprocessing technique for supervised data mining tasks. Many different data mining approaches are available to cluster the data and are developed based on proximity between the records, density in the data set, or novel application of neural networks.

Clustering can help us explore the dataset and separate cases into groups representing similar traits or characteristics. Each group could be a potential candidate for a Category/class. Clustering is used for exploratory data analytics, i.e., as unsupervised learning, rather than for confirmatory analytics or for predicting specific outcomes. Examples of several interesting case-studies, including Divorce and Consequences on Young Adults, Paediatric Trauma, and Youth Development, demonstrate hierarchical clustering,

4. Numerical, time series forecasting :A time series is a sequence of measurements over time, usually obtained at equally spaced intervals

Any metric that is measured over regular time intervals forms a time series. Analysis of time series is commercially importance because of industrial need and relevance especially w.r.t forecasting (demand, sales, supply etc).

An Ordered sequence of observations of a variable or captured object at equally distributed time interval. Time series is anything which is observed sequentially over the time at regular interval like hourly, daily, weekly, monthly, quarterly etc. Time series data is important when you are predicting something which is changing over the time using past data. In time series analysis the goal is to estimate the future value using the behaviours in the past data.

There are many statistical techniques available for time series forecast, however we have found few effective ones which are listed below:
Techniques of Forecasting:
1. Simple Moving Average (SMA)
2. Exponential Smoothing (SES)
3. Autoregressive Integration Moving Average (ARIMA)
4. Neural Network (NN)Croston

Components of a Time Series
• Secular Trend
– Linear
– Nonlinear

• Cyclical Variation
– Rises and Falls over periods longer than
one year

• Seasonal Variation
– Patterns of change within a year, typically
repeating themselves

• Residual Variation

Components of a Time Series

Yt = Ct +St +Rt

Time series data mining combines traditional data mining and forecasting techniques. Data mining techniques such as sampling, clustering and decision trees are applied to data collected over time with the goal of improving predictions.

To know more in detail about the AI and Machine Learning and we will explore Predictive Analytics:

Some of the popular Predictive Algorithms are given below:

Predictive Algorithms:
1. K-means Clustering
2. Association rules
3. Boosting trees

5. Cluster Analysis
6. Feature Selection
7. Independent Component Analysis
8. Kohonen Networks (SOFM)
9. Neural Networks

10. Social network analysis (SNA)
11. Random Forest (Decision Trees)
12. Mars regression splines
13. Linear and logistic regression
14. Naive Bayesian classifiers

15.Optimal binning
16. Partial Least Squares
17. Response Optimisation
18. Root cause analysis.

In my next blog I will be explaining clearly each one of the predictive analytics algorithms mentioned above, starting with K-means Clustering. I hope you liked the article

please let you know with your comments what more would like to see in the upcoming articles on Predictive Analytics. Happy Reading !!






“This is the beginning of the quantum age of computing and the latest advancement is towards building a universal quantum computer.

“A universal quantum computer, once built, will represent one of the greatest milestones in the history of information technology and has the potential to solve certain problems we couldn’t solve, and will never be able to solve, with today’s classical computers.”

quantum computing is likely to have the biggest impact on industries that are data-rich and time-sensitive. Machine Learning is the main application for quantum computing.

a quantum computer has quantum bits or qubits, which work in a particularly intriguing way. Where a bit can store either a zero or a 1, a qubit can store a zero, a one, both zero and one, or an infinite number of values in between—and be in multiple states (store multiple values) at the same time! If that sounds confusing, think back to light being a particle and a wave at the same time, Schrödinger’s cat being alive and dead, or a car being a bicycle and a bus.

A quantum computer exponentially expands the vocabulary of binary code words by using two spooky principles of quantum physics, namely ‘entanglement’ and ‘superposition’. Qubits can store a 0, a 1, or an arbitrary combination of 0 and 1 at the same time. Multiple qubits can be made in superposition states that have strictly no classical analogue, but constitute legitimate digital code in a quantum computer.

In a quantum computer, each quantum bit of information–or qubit–can therefore be unfixed, a mere probability; this in turn means that in some mysterious way, a qubit can have the value of one or zero simultaneously, a phenomenon called superposition.

And just as a quantum computer can store multiple values at once, so it can process them simultaneously. Instead of working in series, it can work in parallel, doing multiple operations at once.

In theory, a quantum computer could solve in less than a minute problems that it would take a classical computer millennia to solve.

To date, most quantum computers have been more or less successful science experiments. None have harnessed more than 12 qubits, and the problems the machines have solved have been trivial. Quantum computers have been complicated, finicky machines, employing delicate lasers, vacuum pumps, and other exotic machinery to shepherd their qubits.

The world’s fastest supercomputer, China’s Sunway TaihuLight, runs at 93 petaflops (93 quadrillion flops, or around 1017) – but it relies on 10 million processing cores and uses massive amounts of energy.

In comparison, even a small 30-qubit universal quantum computer could, theoretically, run at the equivalent of a classical computer operating at 10 teraflops (10 trillion flops, or 1012


IBM says “With a quantum computer built of just 50 qubits, none of today’s TOP500 supercomputers could successfully emulate it, reflecting the tremendous potential of this technology,” .

A screenshot of IBM’s the author attempting to quantum compute. (courtesy : Silicon)






The significant advance, by a team at the University of New South Wales (UNSW) in Sydney appears today in the international journal Nature.

“What we have is a game changer,” said team leader Andrew Dzurak, Scientia Professor and Director of the Australian National Fabrication Facility at UNSW.

“We’ve demonstrated a two-qubit logic gate – the central building block of a quantum computer – and, significantly, done it in silicon. Because we use essentially the same device technology as existing computer chips, we believe it will be much easier to manufacture a full-scale processor chip than for any of the leading designs, which rely on more exotic technologies.

“This makes the building of a quantum computer much more feasible, since it is based on the same manufacturing technology as today’s computer industry,” he added.


The benefits of quantum computing

quantum computers could be used for discovering new drugs, securing the internet, modeling the economy, or potentially even building far more powerful artificial intelligence systems—all sorts of exceedingly complicated tasks.

A functional quantum computer will provide much faster computation in a number of key areas, including: searching large databases, solving complicated sets of equations, and modelling atomic systems such as biological molecules and drugs. This means they’ll be enormously useful for finance and healthcare industries, and for government, security and defence organisations.

For example, they could be used to identify and develop new medicines by greatly accelerating the computer-aided design of pharmaceutical compounds (and minimizing lengthy trial and error testing); develop new, lighter and stronger materials spanning consumer electronics to aircraft; and achieve much faster information searching through large databases.

Functional quantum computers will also open the door for new types of computational applications and solutions that are probably too premature to even conceive.

The fusion of quantum computing and machine learning has become a booming research area. Can it possibly live up to its high expectations?

“These advantages that you end up seeing, they’re modest; they’re not exponential, but they are quadratic,” said Nathan Wiebe, a quantum-computing researcher at Microsoft Research. “Given a big enough and fast enough quantum computer, we could revolutionize many areas of machine learning.” And in the course of using the systems, computer scientists might solve the theoretical puzzle of whether they are inherently faster, and for what.

A recent Fortune essay states, “Companies like Microsoft, Google and IBM are making rapid breakthroughs that could make quantum computing viable in less than 10 years.”

Very soon you will find hardware and software with fundamentally new integrated circuits that store and process quantum information. Many important computational problems will only be solved by building quantum computers. Quantum computing has been picking up the momentum, and there are many startups and scholars discussing quantum machine learning. A basic knowledge of quantum two-level computation ought to be acquired.

I am sure that Quantum Information and Quantum Computation will play a crucial role in defining where the course of our technology is headed.

“For quantum computing to take traction and blossom, we must enable the world to use and to learn it,” said Gambetta. “This period is for the world of scientists and industry to focus on getting quantum-ready.”

The Era of Quantum Computing Is Here.


Rate the Articles.

Please rate the web site. What do you feel about the articles in this website?