Course Code

BD32

Duration

4 Days

Familiarity with Java, SQL, and general operating system concepts is beneficial.
This course gives Java programmers a deep dive into Hadoop application development. Participants learn how to design and develop MapReduce applications for Hadoop, and how to manipulate, analyze, and perform computations on their Big Data.
This course is designed for developers and engineers with programming experience.
Upon completion of this course, participants will be able to:
  • Describe the features of Hadoop-based software, HDFS, and MapReduce programs.
  • Use Cloudera to access Hadoop-based software, support, and services.
  • Build a MapReduce program.
  • Manage data using Hive.
  • Perform queries using Impala.
  • Manipulate data using Pig.
  • Extract data into Hadoop using Sqoop.
  • Create a workflow using Oozie.
  • Work with data using Flume.
Module 1: Getting Started with Hadoop-based Software
Motivation for Hadoop
History of Hadoop
Big Data Use Cases
What is Hadoop?
Components of Hadoop
The Hadoop Distributed File System (HDFS)
Knowledge Check
MapReduce
Hadoop Architecture
Hadoop Daemons and Ports
The HDFS Model
Class Discussion: Nodes and Associated Config Files in HDFS
HDFS Java API
Exercise: Developing Java Code and Using the HDFS Java API

Module 2: Using Distributors to Access Hadoop-based Software, Support, and Services
Open Source Hadoop
Benefits of Using a Distributor
Distributors of Hadoop
Hortonworks Hadoop
The Apache Ambari System
Cloudera Hadoop
Components of Cloudera Hadoop
Architecture of Cloudera Manager
Installing Cloudera CDH4 and Cloudera Manager
Cloudera Manager: Stopping and Restarting

Module 3: Building a MapReduce Program
MapReduce and YARN
MapReduce Phases
MapReduce Programming in Java
Mapper Function
Reducer Function
Driver Function
Compile and Run the MapReduce Program
Job Submission in MRv1
NextGen MapReduce Using YARN
ResourceManager
ApplicationMaster
NodeManager
Job Submission in YARN
Exercise: Implement MapReduce Jobs

Module 4: Managing Data Using Hive
Hive
HiveQL (HQL)
The Components of Hive
Query Flows
Interfacing with Hive
Hive Commands
Starting Beeline CLI and Connecting
Hive Data
Data Types
Operators and Functions
Creating and Dropping Databases
Hive Tables
Hive Partitions
Browsing, Altering, and Dropping Tables and Partitions
Loading Data
Exercise: Basic Commands for Working with Hive Tables
Exercise: Partition a Table

Module 5: Performing Queries Using Impala
Impala Architecture
The Impala Daemon
The Impala Statestore
The Impala Catalog Service
How Impala Works with Hive
Impala SQL Dialect
Starting Impala
Data Types
SQL Operators
Class Discussion: Impala vs. Hive

Module 6: Manipulating Data Using Pig
Pig Relations
Simple and Complex Data Types
Nulls, Operators, and Functions
Expressions
Schemas
How to Start Pig Shell (grunt)
Pig Functions
Exercise: Executing Simple Pig Statements and Viewing the Output
FOREACH Function
GROUP
Exercise: Group by Display from Table Records
Exercise: Tokenize Text of a File Using Pig and Grunt
JOIN (Inner and Outer)
COGROUP
Exercise: Performing an Inner Join on Two Data Sets Using Pig
Exercise: Join Two Sets of Data
FILTER
Exercise: Finding the 5 Most Frequent Tokens Using Pig

Module 7: Extracting Data into Hadoop Using Sqoop
Sqoop
Syntax to Use Sqoop Commands
Sqoop Import
Controlling the Import
Exercise: Importing to HDFS
Exercise: Importing to HDFS Directory
Exercise: Importing a Subset of Rows
Exercise: Encoding Database NULL Values While Importing
Exercise: Importing Tables Using One Command
Exercise: Using Sqoop’s Incremental Import Feature
Sqoop Export
Sqoop’s Export Methodology
Export Control Arguments
Exercise: Export Table Back to MySQL
Exercise: Modifying and Exporting Rows
Exercise: Adding Rows and Making Changes before Exporting
Exercise: Overriding the NULL Substitution Characters

Module 8: Create a Workflow Using Oozie
Introduction to Oozie
Features of Oozie
Oozie Workflow
Creating a MapReduce Workflow
Start, End, and Error Nodes
Parallel Fork and Join Nodes
Workflow Jobs Lifecycle
Workflow Notifications
Workflow Manager
Creating and Running a Workflow
Exercise: Create an Oozie Workflow from Terminal
Exercise: Create an Oozie Workflow Using Java API
Oozie Coordinator
Oozie Coordinator Components, Variables, and Parameters
Exercise: Create an Oozie Workflow from HUE

Module 9: Working with Data Using Flume
Flume
Anatomy of a Flume Agent
Flume Sources
Flume Channels
Flume Sinks
Running Flume
Exercise: Flume with Avro as Source and Terminal as Sink
Exercise: Flume with Avro as Source and HDFS as Sink
Exercise: Flume with Netcat as Source and HDFS as Sink