Data Science with Hadoop and Spark

Course Code

BD44

Duration

3 Days

Basic Linux command line skills are valuable but not required. Each participant must be able to run a 64-bit virtual machine (provided with the course).
This course will teach participants how to use Apache Hadoop and Apache Spark to solve sophisticated data science problems, producing valuable insights in a wide range of scenarios.

Day one focuses on data science basics, including data acquisition, scrubbing and manipulation. It also gives a general overview of data science applications and of the analytics and machine learning processes typically employed. A number of practical use cases are examined during class and lab sessions.

Day two focuses on Apache Hadoop and its ecosystem, along with the types of data science applications typically handled by the Hadoop platform. The course outlines the statistical methods used to produce actionable business insights with MapReduce, Python, Pig, Mahout and other tools.
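
As a rough illustration of the MapReduce model covered on day two, the sketch below runs the map, shuffle and reduce phases in plain Python on a single machine. It is not taken from the course material and does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, two "lines" of text.
    sample = ["big data big insights", "data science with spark"]
    print(word_count(sample))
```

On a real cluster the mapper and reducer would run in parallel across many nodes and the shuffle would be handled by the framework; here all three steps run in one process so the data flow is easy to follow.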

Day three begins with an overview of the Apache Spark platform and its machine learning library, MLlib.

Participants will learn how to perform entity ranking, implement recommendation engines and perform other common data science tasks using Spark batch, streaming, graph and machine learning capabilities.
This course is designed for application developers, analysts and data scientists.

Upon completion of this course, participants will:

  • Have a clear understanding of data science, its typical use cases and how data science is performed using a range of tools in the Apache open source ecosystem
Day 1
Data Science
Data Science Overview
Structured and Unstructured Data
Data Acquisition and Transformation
Data Analysis and Machine Learning

Day 2
Apache Hadoop
MapReduce Fundamentals
Common Hadoop use cases
Machine Learning with Mahout
NLTK and Natural Language Processing

Day 3
Apache Spark
Apache Spark Overview
Working with MLlib
Spark Streaming
Moving applications to production