Big Data QA Essentials - Hortonworks


2 Days

Familiarity with Java, SQL, and the Linux operating system is beneficial.
This course gives QA professionals a deep dive into Hadoop application development and testing strategy. Participants learn how to design and develop test methods for Hadoop applications that validate both the ingested data and the performance of the scripts.
This course is designed for QA engineers and testers with programming experience.

In this course, participants will:

  • Describe the features of Hadoop-based software and its architecture
  • Build, test, and monitor a MapReduce program
  • Understand the types of testing
  • Manage and manipulate data in Hive and Pig, and understand the testing methodology
  • Describe Oozie, create workflows, and understand testing strategies
  • Extract data into Hadoop using Sqoop and apply validation methods
  • Understand the HBase data model and data validation strategies
Getting Started with Hadoop
Getting Started with Hadoop-based Software
Module Objectives
Motivation for Hadoop
History of Hadoop
What is Hadoop?
Components of Hadoop
The Hadoop Distributed File System (HDFS)
Commonly Used HDFS Commands
Knowledge Check Part I
Knowledge Check Part II
MapReduce Processing
Hadoop Architecture

Understanding and Testing MapReduce Program
Module Objectives
MapReduce and YARN
MapReduce Phases
NextGen MapReduce Using YARN
Job Submission in YARN
Example: MapReduce Programming in Java
Mapper Function: Prepare Input
Mapper Function: Map
Reducer Function
Reducer Function: Reduce
Driver Function
Compile and Run the MapReduce Program
Managing, Monitoring and Testing MapReduce Jobs
Lab Exercises to demonstrate testing methodologies and monitoring

Big Data Testing Strategy
Module Objectives
Challenges of Testing
Types of Testing
Performance Testing
Failover Testing
Security Testing
End to End Testing (2 types)
Example: ETL Load Testing
Testing Best Practices
Discussion of Testing Tools

Create a Workflow Using Oozie
Module Objectives
Features of Oozie
Workflow Lifecycle
Oozie Workflow
Oozie Coordinator
Oozie Unit Testing

Managing Data Using Hive
Module Objectives
HiveQL (HQL)
Hive vs. RDBMS
The Components of Hive
Query Flows
Interfacing with Hive
Command Line Interfaces (CLI)
Hive Data
Data Types
Built-In Operators
Relational Operators
Arithmetic Operators
Logical Operators
Complex Operators
Built-In Functions
Creating and Dropping Databases
Hive Tables
Loading Data
Example: Loading Data
Dropping Tables
Discussion of HiveQL Scripts
HiveQLUnit and other testing methodologies
Lab Exercises to demonstrate testing methodologies and monitoring
Best Practices for Testing Hive

Extracting Data into Hadoop Using Sqoop
Module Objectives
Syntax to Use Sqoop Commands
Sqoop Import
Sqoop Import Example
Launching the MySQL Command Shell: mysql
Importing to HDFS
Sqoop Export
Sqoop Export Example
Sqoop’s Export Methodology
Lab Exercises
Testing Incremental Load Process
Discussion of Testing Standards and Best Practices

Storing and Accessing Data Using HBase
Module Objectives
HBase Overview
The HBase Data Model
HBase Architecture
HBase Master Server
HBase Region Server
HBase Region Server Components
HBase Client
HBase Write
Minor Compaction
Major Compaction
HBase versus RDBMS
HBase Shell Commands
HBase Hands On Exercises
Manual Testing on HBase
Best Practices and Standards for HBase Data Validation

Big Data Testing Methods - End-to-End (Integration Testing)
Testing Methods, Tools and Reporting for Validation of Pre-Hadoop Processing
Different methods for ETL testing
Areas of ETL testing
Methodology for report testing
Monitoring tools