Hadoop Developer Essentials - MapR

Course Code



Duration: 3 Days

Familiarity with Java, SQL, and the Linux operating system is beneficial.
This course gives Java programmers a deep dive into Hadoop application development. Participants learn how to design and develop MapReduce applications for Hadoop and how to manipulate, analyze, and perform computations on Big Data.
This course is designed for developers and engineers with programming experience.

In this course, participants will:

  • Describe the features of Hadoop-based software, HDFS, and MapReduce programs
  • Understand MapR-FS and the MapR architecture
  • Build a MapReduce program
  • Manage data using Hive
  • Manipulate data using Pig
  • Extract data into Hadoop using Sqoop
  • Discuss HBase and its architecture

Module 1: Getting Started with Hadoop-based Software
Motivation for Hadoop
History of Hadoop
What is Hadoop?
Components of Hadoop
The Hadoop Distributed File System (HDFS)
Knowledge Check
MapReduce 2
Hadoop Architecture
Hadoop Daemons and Ports
The HDFS Model
Exercise: Developing Java Code and Using the HDFS Java API

Module 2: Distributions Overview, MapR-FS Overview and Architecture
Hadoop Distributions Overview
MapR Distribution
HDFS Architecture Limitations
Overview of MapR-FS Architectural Components
Storage Pools
Volumes, Mirrors, and Snapshots
Exercise: MapR-FS Commands

Module 3: Building a MapReduce Program
MapReduce and YARN
MapReduce Phases
MapReduce Programming in Java
Mapper Function
Reducer Function
Driver Function
Compile and Run the MapReduce Program
Job Submission in MRv1
NextGen MapReduce Using YARN
Job Submission in YARN
Exercise: Implement MapReduce Jobs

Module 4: Manipulating Data Using Pig
Pig Relations
Simple and Complex Data Types
Nulls, Operators, and Functions
How to Start Pig Shell (grunt)
Pig Functions
Exercise: Executing Simple Pig Statements and Viewing the Output
FOREACH Function
Exercise: Group by Display from Table Records
Exercise: Tokenize Text of a File Using Pig and Grunt
JOIN (Inner) and (Outer)
Exercise: Performing an Inner Join on Two Data Sets Using Pig
Exercise: Join Two Sets of Data
Exercise: Finding the 5 Most Frequent Tokens Using Pig
Sampling in Pig
Lab Exercise
User Defined Functions in Pig
Lab Exercise

Module 5: Managing Data Using Hive
HiveQL (HQL)
The Components of Hive
Query Flows
Interfacing with Hive
Hive Commands
Hive Data
Data Types
Operators and Functions
Creating and Dropping Databases
Hive Tables
Hive Views
Order By, Group By, Cluster By, Distribute By
Hive Partitions
Browsing, Altering, and Dropping Tables and Partitions
Loading Data
Exercise: Basic Commands for Working with Hive Tables
Exercise: Partition a Table
Bucketing in Hive
Lab Exercise
Indexes in Hive
Lab Exercise
User Defined Functions in Hive
Lab Exercise
Hive DML (ACID) Operations (Alter and Delete)
Lab Exercise
Sampling in Hive
Lab Exercise
Running Hive as a Script
Lab Exercise

Module 6: Extracting Data into Hadoop Using Sqoop
Syntax to Use Sqoop Commands
Sqoop Import
Controlling the Import
Exercise: Importing to HDFS
Exercise: Importing to HDFS Directory
Exercise: Importing a Subset of Rows
Exercise: Encoding Database NULL Values While Importing
Exercise: Importing Tables Using One Command
Exercise: Using Sqoop’s Incremental Import Feature
Sqoop Export
Sqoop’s Export Methodology
Export Control Arguments
Exercise: Export Table Back to MySQL
Exercise: Modifying and Exporting Rows
Exercise: Adding Rows and Making Changes before Exporting
Exercise: Overriding the NULL Substitution Characters

Module 7: HBase
What is HBase?
HBase Architecture
Understanding HBase Schema
HBase vs. RDBMS
HBase Shell
HBase Commands
Importing Data to HBase
Lab Exercises

Module 8: Mini Project
Discuss the Problem Statement
Demonstrate End-to-End Implementation Methodology