Hadoop for Business Analysts

Duration: 3 days

Participants should have a programming background with databases / SQL, as well as basic knowledge of Linux (be able to navigate the Linux command line and edit files with vi or nano).
Apache Hadoop is one of the most widely used frameworks for processing Big Data. It provides rich analytics capabilities and is making inroads into the traditional BI analytics world. This course introduces analysts to the core components of the Hadoop ecosystem and its analytics tools.
This course is designed for Business Analysts.

In this course, participants will:

  • Gain an understanding of the Hadoop ecosystem
  • Learn data storage using HDFS
  • Learn ETL using Pig
  • Learn data warehousing and querying using Hive

Section 1: Quick primer on Hadoop / HDFS / MapReduce
Hadoop ecosystem

  • Distributions
  • High level architecture
  • Hardware / software
  • Labs: First look at Hadoop

HDFS Overview

  • Concepts (horizontal scaling, replication, data locality)
  • Architecture (NameNode, DataNode)
  • Demo: Interacting with HDFS
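
To make the HDFS concepts concrete, here is a minimal Python sketch of how a file is split into fixed-size blocks and how each block is replicated across DataNodes. It assumes the common defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). The function names are hypothetical, and the round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

# Assumed cluster defaults (dfs.blocksize = 128 MB, dfs.replication = 3).
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def plan_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block a file of `file_size` bytes is split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only occupies as much space as the data it holds.
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes.

    Simplified round-robin placement for illustration only; real HDFS
    uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + i) % len(datanodes)]
                               for i in range(replication)]
    return placement


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = plan_blocks(300 * MB)
    print([size // MB for size in blocks])  # [128, 128, 44]
    for block_id, nodes in place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block_id, nodes)
    # Raw storage consumed across the cluster is file size x replication factor.
    print(sum(blocks) * REPLICATION // MB, "MB stored")  # 900 MB stored
```

Losing any single DataNode in this sketch still leaves two copies of every block, which is the point of replication. On a real cluster, `hdfs fsck <path> -files -blocks -locations` reports the actual blocks of a file and the DataNodes holding each replica.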

MapReduce Overview

  • MapReduce concepts
  • YARN resource management framework
  • Demo: Running a MapReduce program
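
The map, shuffle and reduce phases can be illustrated with the classic word-count example. The following is a single-process Python sketch, not Hadoop code: on a cluster, YARN schedules many map tasks in parallel, ideally on the nodes that already hold the input blocks, and the framework performs the shuffle between mappers and reducers. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases over a collection of input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "The lazy dog"]))
    # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Because mappers only ever see one line and reducers only ever see one key, each phase can be split across many machines without coordination; only the shuffle moves data between them.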

Section 2: Hive

  • Hive concepts and architecture
  • SQL support in Hive
  • Data warehousing in Hive
  • Data types
  • Table creation and queries
  • Text analytics
  • Labs (multiple): creating Hive tables, running queries and joins, using partitions and using text analytics functions
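
Partitioning is easiest to understand by looking at what Hive stores on disk: each partition of a table is a separate directory named `column=value`, so a query that filters on the partition column only needs to read the matching directories. The Python sketch below simulates that partition pruning with an in-memory table. The table name `sales`, its columns, the dates and the warehouse path are hypothetical examples, and the HiveQL in the comments shows roughly what the equivalent statements would look like.

```python
from typing import Callable, Dict, List, Tuple

# HiveQL this sketch imitates (hypothetical table):
#   CREATE TABLE sales (item STRING, amount DOUBLE) PARTITIONED BY (dt STRING);
#   SELECT SUM(amount) FROM sales WHERE dt >= '2024-01-02';

Row = Tuple[str, float]  # (item, amount)

# In-memory stand-in for the table: partition value -> rows stored in that partition.
SALES: Dict[str, List[Row]] = {
    "2024-01-01": [("apple", 3.0), ("pear", 2.5)],
    "2024-01-02": [("apple", 4.0)],
    "2024-01-03": [("plum", 1.5), ("pear", 2.0)],
}


def partition_dir(table_root: str, dt: str) -> str:
    """Directory that would hold one partition's data files (column=value naming)."""
    return f"{table_root.rstrip('/')}/dt={dt}"


def sum_amount(table: Dict[str, List[Row]],
               dt_filter: Callable[[str], bool]) -> Tuple[float, List[str]]:
    """Sum `amount`, reading only partitions whose dt matches the filter (partition pruning)."""
    scanned = [dt for dt in sorted(table) if dt_filter(dt)]
    total = sum(amount for dt in scanned for _, amount in table[dt])
    return total, scanned


if __name__ == "__main__":
    total, scanned = sum_amount(SALES, lambda dt: dt >= "2024-01-02")
    print(total)  # 7.5
    for dt in scanned:
        # Assumed warehouse location; clusters are often configured differently.
        print(partition_dir("/user/hive/warehouse/sales", dt))
```

The 2024-01-01 partition is never read in this example. Partitioning on a column that queries commonly filter on, such as a date, is what makes this saving possible; partitioning on a column that is rarely filtered mostly adds small directories without speeding anything up.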

Section 3: Pig

  • Pig concepts and architecture
  • The Pig Latin language
  • Understanding the Pig job flow
  • Basic data analysis with Pig
  • Data cleanup
  • ETL workloads with Pig
  • Joins and multiple datasets with Pig
  • User-defined functions (UDFs)
  • Debugging Pig scripts
  • Lab: writing Pig scripts to analyze and transform data
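
A typical Pig ETL script loads raw records, filters out bad rows, groups them and aggregates each group. The sketch below shows such a Pig Latin script in comments, followed by a plain-Python equivalent of each step so the data flow can be run and inspected locally. The input file, field names and function names are hypothetical, and the Python code only mimics Pig's behaviour of turning values that fail to cast into nulls.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Pig Latin script this sketch mirrors (hypothetical file and fields):
#   raw     = LOAD 'sales.csv' USING PigStorage(',') AS (item:chararray, amount:double);
#   clean   = FILTER raw BY amount IS NOT NULL AND amount > 0;
#   grouped = GROUP clean BY item;
#   totals  = FOREACH grouped GENERATE group AS item, SUM(clean.amount) AS total;
#   STORE totals INTO 'totals' USING PigStorage(',');

Record = Tuple[Optional[str], Optional[float]]


def load(text: str) -> List[Record]:
    """LOAD: parse comma-separated lines; missing or uncastable values become None (Pig's null)."""
    records: List[Record] = []
    for fields in csv.reader(io.StringIO(text)):
        item = fields[0] if fields else None
        amount: Optional[float]
        try:
            amount = float(fields[1])
        except (IndexError, ValueError):
            amount = None
        records.append((item, amount))
    return records


def filter_valid(records: List[Record]) -> List[Record]:
    """FILTER: drop rows whose amount is null or not positive (data cleanup)."""
    return [(item, amount) for item, amount in records
            if amount is not None and amount > 0]


def group_and_sum(records: List[Record]) -> Dict[Optional[str], float]:
    """GROUP ... BY item, then FOREACH ... GENERATE group, SUM(amount)."""
    groups: Dict[Optional[str], List[float]] = defaultdict(list)
    for item, amount in records:
        if amount is not None:
            groups[item].append(amount)
    return {item: sum(amounts) for item, amounts in groups.items()}


if __name__ == "__main__":
    raw_text = "apple,3.0\npear,abc\napple,2.0\nplum,-1\npear,4.5\n"
    print(group_and_sum(filter_valid(load(raw_text))))  # {'apple': 5.0, 'pear': 4.5}
```

Each Python function corresponds to one Pig relation, which reflects how Pig scripts are usually written: a chain of named intermediate relations that can each be inspected (for example with `DUMP` or `DESCRIBE`) while debugging.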

Section 4: BI Tools for Hadoop

  • BI tools and Hadoop
  • Overview of the current BI tools landscape
  • Choosing the best tool for the job
