## Machine Learning Essentials – Machine Learning Taxonomy

This Artificial Intelligence post covers the essential Machine Learning concepts a data engineer should know before embarking on the development of Deep Learning products.

Listed below are the most popular and frequently used Machine Learning concepts. Read through them and send us your feedback:

**actor-critic** – Refers to a class of agent architectures in which the actor plays out a particular policy while the critic learns to evaluate the actor’s policy. The actor and the critic improve simultaneously by bootstrapping on each other.

**agent** – A system that is embedded in an environment, and takes actions to change the state of the environment. Examples include mobile robots, software agents, or industrial controllers.

**average-reward methods** – A framework where the agent’s goal is to maximize the expected payoff per step. Average-reward methods are appropriate in problems where the goal is to maximize long-term performance. They are usually much more difficult to analyze than discounted algorithms.

**discount factor** – A scalar value between 0 and 1 which determines the present value of future rewards. If the discount factor is 0, the agent is concerned with maximizing immediate rewards. As the discount factor approaches 1, the agent takes more future rewards into account. Algorithms which discount future rewards include Q-learning and TD(lambda).
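To make the effect of the discount factor concrete, here is a minimal sketch that computes a discounted sum of rewards. The function name `discounted_return` and the reward sequence are illustrative assumptions, not part of any particular library.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its time step."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical reward sequence

myopic = discounted_return(rewards, 0.0)      # only the immediate reward counts
farsighted = discounted_return(rewards, 0.9)  # later rewards still contribute
print(myopic, farsighted)
```

With a discount factor of 0 only the first reward survives (1.0), while with 0.9 the later rewards add up to roughly 3.439.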

**dynamic programming (DP)** – A class of solution methods for sequential decision problems with a compositional cost structure. Richard Bellman was one of the principal founders of this approach.

**environment** – The external system that an agent is “embedded” in, and can perceive and act on.

**function approximator** – A method that induces a function from training examples. Standard approximators include decision trees, neural networks, and nearest-neighbor methods.

**Markov decision process (MDP)** – A probabilistic model of a sequential decision problem, where states can be perceived exactly, and the current state and action selected determine a probability distribution on future states. Essentially, the outcome of applying an action to a state depends only on the current action and state (and not on preceding actions or states).
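One way to picture an MDP is as a lookup table from state-action pairs to distributions over next states. The sketch below uses a made-up two-state problem. The state names, action names, probabilities, and rewards are all hypothetical.

```python
import random

# Hypothetical MDP: transitions[state][action] -> list of (next_state, probability, reward)
transitions = {
    "cool": {
        "slow": [("cool", 1.0, 1.0)],
        "fast": [("cool", 0.5, 2.0), ("hot", 0.5, 2.0)],
    },
    "hot": {
        "slow": [("cool", 0.5, 1.0), ("hot", 0.5, 1.0)],
        "fast": [("hot", 1.0, -10.0)],
    },
}


def step(state, action, rng):
    """Sample (next_state, reward) from the current state and action only."""
    outcomes = transitions[state][action]
    weights = [probability for _, probability, _ in outcomes]
    next_state, _, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


rng = random.Random(0)
next_state, reward = step("cool", "fast", rng)
print(next_state, reward)
```

Note that `step` never looks at earlier states or actions, which is the Markov property described above.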

**model-based algorithms** – These compute value functions using a model of the system dynamics. Adaptive Real-time DP (ARTDP) is a well-known example of a model-based algorithm.
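As an illustration of computing value functions from a model, here is a minimal sketch of plain value iteration (a simpler technique than ARTDP, used here only because it is short). The two-state `model` dictionary and its rewards are assumptions made up for the example.

```python
# Hypothetical model: model[state][action] -> list of (probability, next_state, reward)
model = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def value_iteration(model, gamma=0.9, tol=1e-8):
    """Repeatedly back up every state until the values stop changing."""
    values = {state: 0.0 for state in model}
    while True:
        delta = 0.0
        for state, actions in model.items():
            # Expected one-step reward plus discounted value, maximized over actions.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


values = value_iteration(model)
print(values)
```

Because the model is known, no interaction with the environment is needed. For this toy problem the values converge to about 19 for state `A` and 20 for state `B`.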

**model-free algorithms** – These directly learn a value function without requiring knowledge of the consequences of actions. Q-learning is the best-known example of a model-free algorithm. For full details on this concept, watch this video: https://youtu.be/12ehWwB45DQ
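A minimal sketch of the tabular Q-learning update follows. It only needs an observed transition (state, action, reward, next state), not a model. The function name, the state and action labels, and the step size are illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)          # unseen state-action pairs start at 0.0
actions = ["left", "right"]     # hypothetical action set
q[("s1", "right")] = 10.0       # pretend this was learned earlier

new_value = q_learning_update(q, "s0", "right", 1.0, "s1", actions,
                              alpha=0.5, gamma=0.9)
print(new_value)
```

Here the target is 1 + 0.9 × 10 = 10, so with a step size of 0.5 the estimate for (`s0`, `right`) moves halfway from 0 to 10, giving 5.0.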

**model** – The agent’s view of the environment, which maps state-action pairs to probability distributions over states. Note that not every reinforcement learning agent uses a model of its environment. For full details on this concept, watch this video: https://youtu.be/12ehWwB45DQ

**Monte Carlo methods** – A class of methods for learning value functions, which estimate the value of a state by running many trials starting at that state and then averaging the total rewards received on those trials.
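The averaging idea can be sketched in a few lines. The episode below is a made-up toy: the agent earns 1 per step and the episode ends after each step with probability 0.5, so the true value of the start state is 2.

```python
import random


def run_trial(rng):
    """Hypothetical episode: earn 1 per step; stop after each step with probability 0.5."""
    total = 0.0
    while True:
        total += 1.0
        if rng.random() < 0.5:
            return total


def monte_carlo_value(num_trials, rng):
    """Estimate the start state's value as the mean total reward over many trials."""
    returns = [run_trial(rng) for _ in range(num_trials)]
    return sum(returns) / len(returns)


estimate = monte_carlo_value(20000, random.Random(42))
print(estimate)
```

With more trials the estimate gets closer to the true value, at the cost of more simulated episodes.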

**policy** – The decision-making function (control strategy) of the agent, which represents a mapping from situations to actions.

**reward** – A scalar value which represents the degree to which a state or action is desirable. Reward functions can be used to specify a wide range of planning goals (e.g. by penalizing every non-goal state, an agent can be guided towards learning the fastest route to the final state).

**sensor** – Agents perceive the state of their environment using sensors, which can refer to physical transducers, such as ultrasound, or simulated feature-detectors.

**state** – This can be viewed as a summary of the past history of the system that determines its future evolution.

**stochastic approximation** – Most RL algorithms can be viewed as stochastic approximations of exact DP algorithms, where instead of complete sweeps over the state space, only selected states are backed up (these are sampled according to the underlying probabilistic model). A rich theory of stochastic approximation (e.g. Robbins-Monro) can be brought to bear to understand the theoretical convergence of RL methods.

**TD (temporal difference) algorithms** – A class of learning methods based on the idea of comparing temporally successive predictions. This is possibly the single most fundamental idea in all of reinforcement learning.
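Comparing successive predictions can be sketched with the simplest member of the family, TD(0). The value table, state names, and parameter values below are assumptions chosen for illustration.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Nudge V(state) toward reward + gamma * V(next_state); return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


values = {"A": 0.0, "B": 1.0}  # hypothetical current predictions

error = td0_update(values, "A", reward=0.5, next_state="B", alpha=0.5, gamma=1.0)
print(error, values["A"])
```

The TD error is the gap between the old prediction for `A` (0.0) and the newer one built from the reward and the prediction for `B` (0.5 + 1.0), so it equals 1.5 and the value of `A` moves to 0.75.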

**reinforcement learning (RL)** – The area of machine learning in which an agent learns from interaction with its environment, rather than from a knowledgeable teacher that specifies the action the agent should take in any given state.

**value function** – A mapping from states to real numbers, where the value of a state represents the long-term reward achieved by starting from that state and executing a particular policy. The key distinguishing feature of RL methods is that they learn policies indirectly, by instead learning value functions. RL methods can be contrasted with direct optimization methods, such as genetic algorithms (GA), which attempt to search the policy space directly.

**Parametric Models** can be defined as models that first make an assumption about the functional form, or shape, of the function f (e.g. linear) and then fit the model. This reduces estimating the mapping function f to estimating a set of parameters, but if the assumption is wrong, it will lead to bad results.
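As a small sketch of the parametric idea, the code below assumes f is a straight line and estimates just two parameters (slope and intercept) by ordinary least squares. The data and the helper name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept under the assumption y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]

# Case 1: the data really is linear (y = 2x + 1), so the assumption holds.
slope, intercept = fit_line(xs, [1.0, 3.0, 5.0, 7.0])

# Case 2: the data is quadratic (y = x^2), so the linear assumption is wrong.
bad_slope, bad_intercept = fit_line(xs, [x * x for x in xs])
print(slope, intercept, bad_slope, bad_intercept)
```

In the first case the fit recovers slope 2 and intercept 1 exactly. In the second the best line is y = 3x - 1, which misses every training point, showing how a wrong assumption about the form of f hurts the result.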

**Nonparametric Models** are algorithms that do not make strong assumptions about the form of the mapping function. By not making assumptions, they are free to learn any functional form from the training data.
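A nearest-neighbor predictor is one simple example of an algorithm that assumes no functional form. The sketch below is a bare-bones k-nearest-neighbors regressor for one-dimensional inputs; the training data and the function name `knn_predict` are illustrative assumptions.

```python
def knn_predict(train_x, train_y, query, k=2):
    """Average the targets of the k training points closest to the query."""
    by_distance = sorted(zip(train_x, train_y),
                         key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [0.0, 1.0, 4.0, 9.0]  # hypothetical samples of a curved function

prediction = knn_predict(train_x, train_y, 2.6, k=2)
print(prediction)
```

The prediction (6.5, the average of the targets at x = 2 and x = 3) comes straight from the stored data rather than from a fitted equation, so the same code works whatever shape the underlying function has.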

**Supervised Models** can be defined as models that fit input variables $x_i = (x_1, x_2, \dots, x_n)$ to known output variables $y_i = (y_1, y_2, \dots, y_n)$.

**Unsupervised Models** can be defined as models that take in input variables $x_i = (x_1, x_2, \dots, x_n)$ but do not have an associated output to supervise the training. The goal is to understand relationships between the variables or observations.

**Blackbox Machine Learning models** are models that make decisions, but we do not know what happens “under the hood”, e.g. deep learning, neural networks.

**Descriptive Machine Learning models** are models that provide insight into why they make their decisions, e.g. linear regression, decision trees.

**First-Principle models** can be defined as models based on a prior belief of how the system under investigation works, incorporating domain knowledge on an ad hoc basis.

**Data-Driven models** can be defined as models based on observed correlations between input and output variables.

**Deterministic models** are defined as models that produce a single “prediction”, e.g. yes or no, true or false.

**Stochastic models** are defined as models that produce probability distributions over possible events.
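The difference between the two kinds of output can be sketched with a toy classifier. Both functions below start from the same hypothetical scores: one returns a single label, the other a probability distribution (computed here with a softmax, which is just one possible choice).

```python
import math


def deterministic_predict(scores):
    """Return the single highest-scoring label."""
    return max(scores, key=scores.get)


def stochastic_predict(scores):
    """Return a probability for every label via a softmax over the scores."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


scores = {"yes": 2.0, "no": 0.0}  # hypothetical model scores

label = deterministic_predict(scores)
distribution = stochastic_predict(scores)
print(label, distribution)
```

The deterministic output is just `yes`, while the stochastic output also says how confident the model is (roughly 0.88 for `yes` against 0.12 for `no`).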

**Flat models** are models that solve problems on a single level, with no notion of subproblems. Flat clustering, for example, creates a flat set of clusters without any explicit structure that would relate clusters to each other.

**Hierarchical models** are models that solve several different nested subproblems. Hierarchical clustering, for example, creates a hierarchy of clusters.
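To show what a hierarchy of clusters looks like, here is a minimal sketch of agglomerative (bottom-up) clustering on one-dimensional points using single linkage. The points and the function name `agglomerate` are made up, and the adjacent-gap shortcut only works because the data is one-dimensional and sorted.

```python
def agglomerate(points):
    """Repeatedly merge the two closest clusters and record every merge."""
    clusters = [[p] for p in sorted(points)]
    merges = []
    while len(clusters) > 1:
        # For sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        merged = clusters[i] + clusters[i + 1]
        merges.append(merged)
        clusters[i:i + 2] = [merged]
    return merges


points = [1.0, 2.0, 10.0, 11.0, 30.0]  # hypothetical data
merges = agglomerate(points)
for merged in merges:
    print(merged)
```

Each recorded merge is one level of the hierarchy. Stopping after the first two merges would give a flat clustering of {1, 2}, {10, 11}, and {30}, with no information about how those clusters relate to each other.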

There is a self-explanatory video that briefly explains these concepts, along with the different modelling techniques and models. Please have a look at it and share your feedback.