Machine Learning Training Course (3+ days)
Note: this outline is our proposal; the training can be tailored to your specific requirements if requested ahead of the proposed course date.
Why Learn Machine Learning?
Machine Learning brings together computer science and statistics to harness the predictive power of data. It is a must-have skill for aspiring data analysts and data scientists, and for anyone else who wants to turn raw data into trends and predictions.
This class teaches the end-to-end process of investigating data through a machine learning lens: how to extract and identify the features that best represent your data, the most important machine learning algorithms, and how to evaluate their performance.
Course details
This outline can cover both fundamental and advanced topics.
This is a classical ML course, but depending on your industry, profile, and needs there are a few possible course variations:
 Machine Learning with Python/R
 Machine Learning with Scala and Apache Spark
 Machine Learning and Deep Learning
 Machine Learning for Banking with Python/R
 Machine Learning for Finance with Python/R
The final training outline will be tailored to your particular requirements.
Practical exercises make up a large part of the course time, alongside demonstrations and theoretical presentations. Questions and discussion are welcome throughout the course.
Course Outline
Naive Bayes
Basic concepts of Bayesian methods
Probability
Joint probability
Conditional probability with Bayes' theorem
The naive Bayes algorithm
The naive Bayes classification
The Laplace estimator
Using numeric features with naive Bayes
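As a small illustration of the naive Bayes topics above, here is a minimal sketch assuming the Python/scikit-learn variant of the course (dataset and settings are illustrative only):

```python
# Naive Bayes sketch: Gaussian likelihoods for numeric features.
# MultinomialNB(alpha=1.0) would apply the Laplace estimator to count data instead.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)                    # four numeric features per flower
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB()
nb.fit(X_train, y_train)                             # estimates per-class means and variances
print("test accuracy:", nb.score(X_test, y_test))
```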
Decision trees
Divide and conquer
The C5.0 decision tree algorithm
Choosing the best split
Pruning the decision tree
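The C5.0 algorithm itself is usually run from R (the C50 package); purely as an assumption, the sketch below uses scikit-learn's CART-based tree to show split selection and simple pre-pruning:

```python
# Decision tree sketch: entropy-based splits and max_depth as a simple pre-pruning control.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```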
Neural networks
From biological to artificial neurons
Activation functions
Network topology
The number of layers
The direction of information travel
The number of nodes in each layer
Training neural networks with backpropagation
Deep Learning
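A minimal feed-forward network sketch, assuming scikit-learn's MLPClassifier (dedicated deep learning frameworks are covered in the Deep Learning course variant):

```python
# Neural network sketch: one hidden layer, ReLU activations, trained by backpropagation.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)                # scaling helps gradient-based training
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64,), activation="relu", max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```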
Support Vector Machines
Classification with hyperplanes
Finding the maximum margin
The case of linearly separable data
The case of nonlinearly separable data
Using kernels for nonlinear spaces
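A small SVM sketch assuming scikit-learn, contrasting a linear hyperplane with an RBF kernel on toy data that is not linearly separable:

```python
# SVM sketch: linear vs. RBF kernel on a non-linearly separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    svm = SVC(kernel=kernel, C=1.0)                  # C trades margin width against training errors
    svm.fit(X_train, y_train)
    print(kernel, "kernel accuracy:", svm.score(X_test, y_test))
```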
Clustering
Clustering as a machine learning task
The k-means algorithm for clustering
Using distance to assign and update clusters
Choosing the appropriate number of clusters
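A k-means sketch assuming scikit-learn; the silhouette score is one simple way to compare candidate numbers of clusters:

```python
# k-means sketch: fit several values of k and compare cluster quality.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=600, centers=4, random_state=0)

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # higher silhouette = tighter, better-separated clusters
    print(k, "clusters, silhouette:", round(silhouette_score(X, km.labels_), 3))
```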
Measuring performance for classification
Working with classification prediction data
A closer look at confusion matrices
Using confusion matrices to measure performance
Beyond accuracy – other measures of performance
The kappa statistic
Sensitivity and specificity
Precision and recall
The F-measure
Visualizing performance trade-offs
ROC curves
Estimating future performance
The holdout method
Cross-validation
Bootstrap sampling
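Most of these measures are a single call away in scikit-learn; a hedged sketch of the evaluation topics above, using an illustrative logistic regression as the model being assessed:

```python
# Evaluation sketch: confusion matrix, kappa, precision/recall/F-measure, ROC AUC, cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, cohen_kappa_score,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)   # holdout split

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
pred = clf.predict(X_test)

print(confusion_matrix(y_test, pred))
print("kappa:", cohen_kappa_score(y_test, pred))
print(classification_report(y_test, pred))           # precision, recall, F-measure per class
print("ROC AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
print("10-fold CV accuracy:", cross_val_score(clf, X, y, cv=10).mean())
```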
Tuning stock models for better performance
Using caret for automated parameter tuning
Creating a simple tuned model
Customizing the tuning process
Improving model performance with meta-learning
Understanding ensembles
Bagging
Boosting
Random forests
Training random forests
Evaluating random forest performance
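caret is the R tool named above; assuming the Python variant of the course, an analogous automated tuning workflow uses scikit-learn's GridSearchCV, here applied to a random forest (a bagged ensemble of decision trees):

```python
# Parameter tuning sketch: grid search with cross-validation over a random forest ensemble.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_features": ["sqrt", 0.5]},
    cv=5,                                            # 5-fold cross-validation per candidate
)
search.fit(X_train, y_train)
print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))
```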
Classification using nearest neighbors
The kNN algorithm
Calculating distance
Choosing an appropriate k
Preparing data for use with kNN
Why is the kNN algorithm lazy?
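A kNN sketch assuming scikit-learn; the pipeline standardises the features first because distance calculations are scale-sensitive:

```python
# kNN sketch: scaling + distance-based classification for several values of k.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 5, 15):                                 # k trades sensitivity to noise against bias
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    knn.fit(X_train, y_train)                        # "lazy": fitting mostly stores the data
    print("k =", k, "accuracy:", knn.score(X_test, y_test))
```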
Classification rules
Separate and conquer
The One Rule algorithm
The RIPPER algorithm
Rules from decision trees
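Rule learners such as RIPPER are usually run from R or Weka rather than scikit-learn; purely as an illustration of the idea, here is a tiny One Rule (1R) learner in plain Python with toy data (all names and data are illustrative):

```python
# One Rule (1R) sketch: pick the single feature whose value-to-majority-class rule
# makes the fewest training errors. Toy weather data, illustrative only.
from collections import Counter, defaultdict

def one_rule(rows, labels):
    best = None
    for f in range(len(rows[0])):
        by_value = defaultdict(list)
        for row, label in zip(rows, labels):
            by_value[row[f]].append(label)           # group labels by this feature's value
        rule = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
        errors = sum(rule[row[f]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (f, rule, errors)
    return best                                      # (feature index, value -> class, errors)

rows = [("sunny", "calm"), ("sunny", "windy"), ("overcast", "calm"),
        ("rain", "calm"), ("rain", "windy"), ("overcast", "windy")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
print(one_rule(rows, labels))
```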
Regression
Simple linear regression
Ordinary least squares estimation
Correlations
Multiple linear regression
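A linear regression sketch assuming scikit-learn and NumPy; the dataset is illustrative only:

```python
# Linear regression sketch: ordinary least squares fit plus a simple correlation check.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)       # multiple linear regression via OLS
print("intercept:", ols.intercept_)
print("coefficients:", ols.coef_)
print("R^2 on held-out data:", ols.score(X_test, y_test))

print("correlation of feature 0 with target:", np.corrcoef(X[:, 0], y)[0, 1])
```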
Regression trees and model trees
Adding regression to trees
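Model trees (for example M5) are typically run from R or Weka; the sketch below assumes scikit-learn's CART-style regression tree, whose leaves predict the mean target value of the training examples that reach them:

```python
# Regression tree sketch: piecewise-constant predictions from a shallow CART tree.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
print("R^2 on held-out data:", reg.score(X_test, y_test))
```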
Association rules
The Apriori algorithm for association rule learning
Measuring rule interest – support and confidence
Building a set of rules with the Apriori principle
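Full Apriori implementations live in libraries such as arules (R) or mlxtend (Python); to make the rule-interest measures concrete, here is a tiny hand-rolled computation of support and confidence on toy transactions:

```python
# Association rule sketch: support and confidence computed directly on toy market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]

def support(itemset):
    # fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # of the transactions containing lhs, the fraction that also contain rhs
    return support(lhs | rhs) / support(lhs)

print("support({bread, milk}):", support({"bread", "milk"}))          # 0.6
print("confidence(bread -> milk):", confidence({"bread"}, {"milk"}))  # 0.75
```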