4 Days

General knowledge of data stores, advanced math, and analytics, plus a working knowledge of the Python language
This course will help you master the Apache Spark platform, with added emphasis on analytics with Machine Learning. Spark enables participants to build complete, unified big data applications that combine batch, streaming, and interactive analytics on all their data. With Spark, developers can write sophisticated parallel applications that support faster and better decisions and real-time actions across a wide variety of use cases, architectures, and industries. The course covers the core Spark APIs and the fundamental mechanisms and basic internals of the platform. Participants will use the Spark Shell, Python, and SQL to access and transform data. The course also covers Jupyter Notebooks, Anaconda, and Apache Parquet.
This course is designed for data engineers, analysts, architects, software engineers, IT operations and technical managers interested in a thorough, hands-on course covering the Apache Spark platform and Machine Learning.
Upon completion of this course, participants will be able to:
  • Understand the motivation for non-relational data stores
  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with SparkSQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Understand and use Machine Learning
Day 1 Fundamentals
1. Overview (Big Data and Spark)
2. Installation
3. Spark Shell
4. Resilient Distributed Datasets
5. Cluster Architecture
6. Spark SQL & DataFrames
7. Jupyter Notebooks

Day 2 Programming Techniques
8. RDD Operations
9. Key-Value Pairs
10. Transformations & Actions
11. Shuffle Operations
12. RDD Persistence
13. Shared Variables
14. Spark Streaming

Day 3 Basic Machine Learning
15. ML Data Types
16. Basic Statistics
17. Classification & Regression
18. Collaborative Filtering
19. Clustering
20. Dimensionality Reduction
21. Frequent Pattern Mining

Day 4 Advanced Machine Learning
22. DataFrames & Parquet
23. ML Pipelines
24. Transformers & Estimators
25. Extractors, Transformers & Selectors
26. Classification & Regression
27. Clustering (LDA)
28. GraphX PageRank