Course Code



3 Days

Participants should have a general knowledge of statistics and programming.
This intensive training course covers the theoretical and technical aspects of Data Science and Business Analytics, from fundamental to advanced concepts and methods for deriving business insights from Big Data. Hands-on labs help attendees reinforce the material covered.
Business Analysts, IT Architects and Managers
Upon completion of this course, participants will have learned about:
  • NoSQL and Big Data Systems Overview
  • Big Data Business Intelligence and Analytics
  • Applied Data Science and Business Analytics
  • Algorithms, Techniques and Common Analytical Methods
  • Machine Learning
  • Visualizing and Reporting Processed Results
  • Data Analysis with R
Chapter 1. Defining Big Data
Transforming Data into Business Information
Quality of Data
Gartner's Definition of Big Data
More Definitions of Big Data
Processing Big Data
Challenges Posed by Big Data
The Cloud and Big Data
The Business Value of Big Data
Big Data: Hype or Reality?
Big Data Quiz
Big Data Quiz Answers

Chapter 2. What is NoSQL?
Limitations of Relational Databases
Limitations of Relational Databases (Cont'd)
Defining NoSQL
What are NoSQL (Not Only SQL) Databases?
The Past and Present of the NoSQL World
NoSQL Database Properties
NoSQL Benefits
NoSQL Database Storage Types
The CAP Theorem
Mechanisms to Guarantee a Single CAP Property
Limitations of NoSQL Databases
Big Data Sharding
Sharding Example
Quiz Answers

Chapter 3. NoSQL Systems Overview
MongoDB Features (Cont'd)
MongoDB Operational Intelligence
MongoDB Use Cases
Amazon S3
Amazon Storage SLAs
Amazon Glacier
Amazon S3 Security
Data Lifecycle Management with Amazon S3
Amazon S3 Cost Monitoring
Object Store (Swift)
Components of Swift
Google BigTable
BigTable-based Applications
BigTable Design
Google Cloud Storage
Hadoop Clusters
Hadoop's Core Components
Hadoop Distributed File System
Accessing HDFS
Communication inside HDFS
HBase Design
Using MemcacheDB instead of memcached
Apache Cassandra
Apache Cassandra Design
Cassandra's Main Features and Qualities of Service

Chapter 4. Big Data Business Intelligence and Analytics
Traditional Business Intelligence and Analytics
OLAP Tasks
Data Mining Tasks
Big Data / NoSQL Solutions
NoSQL Data Querying and Processing
MapReduce Defined
MapReduce Explained
Example of Map & Reduce Operations using JavaScript
Hadoop-based Systems for Data Analysis
Hadoop's MapReduce
Hadoop's Streaming MapReduce
Streaming Use Cases
Setting up Java Classpath for Streaming Support
Making Things Simpler with Hadoop Pig Latin
Pig Latin Script Example
SQL Equivalent
Amazon Elastic MapReduce
Big Data with Google App Engine (GAE)
GAE Dashboard
Example of Google App Engine Java Datastore API
MongoDB Data Model
MongoDB Query Language (QL)
The find and findOne Methods
A MongoDB QL Example
What is Hive?
Hive Architecture
Interfacing with Hive
Hive Data Definition Language
Business Analytics with Hive
The UnQL Specification
Quiz Answers

Chapter 5. Applied Data Science
What is Data Science?
Data Science Ecosystem
Data Mining vs. Data Science
Business Analytics vs. Data Science
Who is a Data Scientist?
Data Science Skill Sets Venn Diagram
Data Scientists at Work
Examples of Data Science Projects
An Example of a Data Product
Applied Data Science at Google
Data Science Gotchas

Chapter 6. Data Analytics Life-cycle Phases
Big Data Analytics Pipeline
Data Discovery Phase
Data Harvesting Phase
Data Priming Phase
Model Planning Phase
Model Building Phase
Communicating the Results
Production Roll-out

Chapter 7. Getting Started with R
Positioning of R in the Data Science Arena
R Integrated Development Environments
Running R
Ending the Current R Session
Getting Help
Getting System Information
General Notes on R Commands and Statements
R Data Structures
R Objects and Workspace
Assignment Operators
Assignment Example
Arithmetic Operators
Logical Operators
System Date and Time
User-defined Functions
User-defined Function Example
R Code Example
Type Conversion (Coercion)
Control Statements
Conditional Execution
Repetitive Execution
Built-in Functions
Reading Data from Files into Vectors
Example of Reading Data from a File
Writing Data to a File
Example of Writing Data to a File
Logical Vectors
Character Vectors
Matrix Data Structure
Creating Matrices
Working with Data Frames
Matrices vs Data Frames
A Data Frame Sample
Accessing Data Cells
Getting Info About a Data Frame
Selecting Columns in Data Frames
Selecting Rows in Data Frames
Getting a Subset of a Data Frame
Sorting (ordering) Data in Data Frames by Attribute(s)
Applying Functions to Matrices and Data Frames
Using the apply() Function
Example of Using apply()
Executing External R Commands
Listing Objects in Workspace
Removing Objects in Workspace
Saving Your Workspace
Loading Your Workspace
Getting and Setting the Working Directory
Getting the List of Files in a Directory
Diverting Output to a File
Batch (Unattended) Processing
Importing Data into R
Exporting Data from R
Standard R Packages
Extending R

Chapter 8. R Statistical Computing Features
Statistical Computing Features
Descriptive Statistics
Basic Statistical Functions
Examples of Using Basic Statistical Functions
Non-uniformity of a Probability Distribution
Writing Your Own skew and kurtosis Functions
Generating Normally Distributed Random Numbers
Generating Uniformly Distributed Random Numbers
Using the summary() Function
Math Functions Used in Data Analysis
Examples of Using Math Functions
Correlation Example
Testing Correlation Coefficient for Significance
The cor.test() Function
The cor.test() Example
Regression Analysis
Types of Regression
Simple Linear Regression Model
Least-Squares Method (LSM)
LSM Assumptions
Fitting Linear Regression Models in R
Example of Using lm()
Confidence Intervals for Model Parameters
Example of Using lm() with a Data Frame
Regression Models in Excel
Multiple Regression Analysis
Finding the Best-Fitting Regression Model
Comparing Regression Models

Chapter 9. Data Science Algorithms and Analytical Methods
Supervised vs Unsupervised Machine Learning
Supervised Machine Learning Algorithms
Unsupervised Machine Learning Algorithms
Choose the Right Algorithm
Life-cycles of Machine Learning Development
Classifying with k-Nearest Neighbors (SL)
k-Nearest Neighbors Algorithm
The Error Rate
Decision Trees (SL)
Decision Tree Terminology
Decision Trees in Pictures
Decision Tree Classification in Context of Information Theory
Information Entropy Defined
The Shannon Entropy Formula
The Simplified Decision Tree Algorithm
Using Decision Trees
Naive Bayes Classifier (SL)
Naive Bayesian Probabilistic Model in a Nutshell
Bayes Formula
Classification of Documents with Naive Bayes
Unsupervised Learning Type: Clustering
K-Means Clustering (UL)
K-Means Clustering in a Nutshell
Regression Analysis
Simple Linear Regression Model
Linear vs Non-Linear Regression
Linear Regression Illustration
Major Underlying Assumptions for Regression Analysis
Least-Squares Method (LSM)
Locally Weighted Linear Regression
Regression Models in Excel
Multiple Regression Analysis
Regression vs Classification
Time-Series Analysis
Decomposing Time-Series
Monte-Carlo Simulation (Method)
Who Uses Monte-Carlo Simulation?
Monte-Carlo Simulation in a Nutshell
Monte-Carlo Simulation Example

Chapter 10. Visualizing and Reporting Processed Results
Data Visualization
Data Visualization in R
The ggplot2 Data Visualization Package
Creating Bar Plots in R
Creating Horizontal Bar Plots
Using barplot() with Matrices
Using barplot() with Matrices Example
Customizing Plots
Histograms in R
Building Histograms with hist()
Example of Using hist()
Pie Charts in R
Examples of Using pie()
Generic X-Y Plotting
Examples of the plot() Function
Dot Plots in R
Saving Your Work
Supported Export Options
Plots in RStudio
Saving a Plot as an Image
The BIRT Project
Visualization with D3 JavaScript Library
Examples of D3 Visualization
Data Visualization with JavaFX

Chapter 11. Apache Mahout
What is Apache Mahout?
Main Use Cases
Supported Algorithms in Classification
Supported Algorithms in Clustering
The Stable Set of Algorithms
Running Mahout on Amazon

Chapter 12. Machine Learning with BigML
What is BigML?
How BigML Service Works
Data Files
Data Sets
Data Sets Example
The Prediction UI Form
Text Analysis in BigML

Chapter 13. The Semantic Web
Defining the Term "Semantic"
Metadata in HTML Pages
Defining the Semantic Web
The Original Web Proposal
W3C and the Semantic Web
The Semantic Web as Web 3.0
The Semantic Web Stack
Ontology and OWL
The Smart Data Continuum
Resource Description Framework
RDF Model
An RDF Example
SPARQL Example
Example of the hCard Microformat
