Big Data - Hadoop

Course Overview


Data scientists build information platforms to provide deep insight and answer previously unimaginable questions. Hadoop is transforming how data scientists work by allowing interactive and iterative data analysis at scale. Learn how Hadoop enables data scientists to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities. This course helps participants understand what data scientists do, the problems they solve, and the tools and techniques they use. Through in-class simulations, participants apply data science methods to real-world challenges in different industries and, ultimately, prepare for data scientist roles in the field.

Expectations and Goals:

Through instructor-led discussion and interactive, hands-on exercises, participants will navigate the Hadoop ecosystem and develop concrete skills such as:
• How to identify potential business use cases where data science can provide impactful results
• How to obtain, clean, and combine disparate data sources to create a coherent picture for analysis
• Which statistical methods to leverage for data exploration that will provide critical insight into your data

Introduction to Hive
Why we need Hive
Architecture of Hive
Hive Data Types
Hive Complex Datatypes
Managed Tables
External Tables
Running Hive Queries
Performing Joins
Handling JSON Data
Handling XML Data
Partitioned Table
Hive UDF
Scripting in Hive
Performance Tuning in Hive
Case Study in Hive Based on a Dataset

Module 3:

Introduction to Pig
Why we need Pig Technology
Architecture of Pig
Pig Data Types

Module 4:

Concept of Impala
Running Queries on Impala
Comparing Impala with Hive
Concept of HUE
Accessing Hadoop Components through HUE

Module 8:

Introduction to NoSQL Databases
Comparing NoSQL and RDBMS
Introduction to HBase
Why we need HBase
HBase Commands

Module 9:

Installation of Tableau
Connecting Tableau to Impala
Plotting Graphs

Module 10:

Module 11:

Project Work and Documentation

• Object-oriented programming in Java, exception handling in Java
• Knowledge of SQL commands
• Basic Linux commands

Topics Covered:

• Where and when to leverage Hadoop Streaming and Apache Flume for data science pipelines
• Which machine learning technique to use for a particular data science project

Introduction to Big Data
Features of Hadoop
Components in Hadoop
Concept of Hadoop Ecosystem
Introduction to HDFS
HDFS Practical

Module 1:

Collection Framework in Java (List, Map, Iterator)
String Tokenizer, File Handling, String Handling
Concept of MapReduce
MapReduce Practical
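
As a rough illustration of the MapReduce concept listed above, the following Python sketch runs a word count through separate map, shuffle, and reduce steps in memory. It is not course material and does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values collected for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, chosen only to show the data flow.
    sample = ["big data big insight", "data at scale"]
    print(word_count(sample))
```

On a real cluster the framework distributes the map and reduce steps across nodes and performs the grouping itself; here all three steps run in a single process so the data flow is easy to follow.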

Module 2:

Different Modes in Pig
Running Pig Commands
Scripting in Pig
Case Study in Pig Based on a Dataset

Introduction to Sqoop
Importing and Exporting Data between RDBMS and HDFS
Importing Data from RDBMS to Hive
Exporting Data from Hive to RDBMS

Module 5:

Introduction to Flume
Introduction to Sources, Sinks, and Flume Agents
Fetching Twitter Data into Solr
Configuration for Loading Twitter Data into HDFS
Using a Hive SerDe to Analyze the Data

Module 6:

Sentiment Analysis based on Twitter Data

Module 7:
