Advanced Hadoop for Developers

Course Code

BD61

Duration

3 Days

Participants must be comfortable with:

  • Java programming language (most programming exercises are in Java)
  • Linux environment (able to navigate the Linux command line and edit files using vi / nano)
  • Hadoop (previously attended a Hadoop for Developers course or have working knowledge of Hadoop)
Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course focuses on advanced programming techniques for experienced Hadoop developers. It is split evenly between lectures (50%) and hands-on labs (50%).
This course is designed for developers.

In this course, participants will learn:

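  • Data management in HDFS (data formats, compression, data masking)
  • Advanced Pig
  • Advanced Hive
  • Advanced HBase (including SQL access through Phoenix)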
Section 1: Data Management in HDFS
Various Data Formats (JSON / Avro / Parquet)
Compression Schemes
Data Masking
Labs: Analyzing different data formats; enabling compression

Section 2: Advanced Pig
User-defined Functions
Introduction to Pig Libraries (Elephant Bird / DataFu)
Loading Complex Structured Data using Pig
Pig Tuning
Labs: Advanced Pig scripting, parsing complex data types

Section 3: Advanced Hive
User-defined Functions
Compressed Tables
Hive Performance Tuning
Labs: Creating compressed tables, evaluating table formats and configuration

Section 4: Advanced HBase
Advanced Schema Modelling
Compression
Bulk Data Ingest
Wide-table / Tall-table comparison
HBase and Pig
HBase and Hive
HBase Performance Tuning
Labs: Tuning HBase; accessing HBase data from Pig & Hive; using Phoenix for data modelling