Hadoop Platform and Application Framework

Taught in English

148,856 already enrolled

Instructors: Natasha Balac, Ph.D.

4.0

(3,318 reviews)

25 hours to complete

3 weeks at 8 hours a week

Flexible schedule

Learn at your own pace

Assessments

11 quizzes

There are 5 modules in this course

This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience, you will have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments you will be guided in how data scientists apply the important concepts and techniques such as Map-Reduce that are used to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.

Hadoop Basics

Welcome to the first module of the Big Data Platform course. This module provides insight into the Big Data hype and its technologies, opportunities, and challenges. We will take a deeper look at the Hadoop stack and the tools and technologies associated with Big Data solutions.

What's included

7 videos, 4 readings, 1 quiz

7 videos (total 52 minutes)

Hadoop Stack Basics (4 minutes)
The Apache Framework: Basic Modules (3 minutes)
Hadoop Distributed File System (HDFS) (5 minutes)
The Hadoop "Zoo" (5 minutes)
Hadoop Ecosystem Major Components (11 minutes)
Exploring the Cloudera VM: Hands-On Part 1 (16 minutes)
Exploring the Cloudera VM: Hands-On Part 2 (6 minutes)

4 readings (total 40 minutes)

Apache Hadoop Ecosystem (10 minutes)
Lesson 1 Slides (PDF) (10 minutes)
Hardware & Software Requirements (10 minutes)
Lesson 2 Slides - Cloudera VM Tour (10 minutes)

1 quiz (total 30 minutes)

Basic Hadoop Stack (30 minutes)

Introduction to the Hadoop Stack

In this module we will take a detailed look at the Hadoop stack, ranging from the basic HDFS components to application execution frameworks, languages, and services.

What's included

10 videos, 6 readings, 3 quizzes

10 videos (total 69 minutes)

Overview of the Hadoop Stack (4 minutes)
The Hadoop Distributed File System (HDFS) and HDFS2 (8 minutes)
MapReduce Framework and YARN (8 minutes)
The Hadoop Execution Environment (4 minutes)
YARN, Tez, and Spark (11 minutes)
Hadoop Resource Scheduling (6 minutes)
Hadoop-Based Applications (3 minutes)
Introduction to Apache Pig (7 minutes)
Introduction to Apache HIVE (7 minutes)
Introduction to Apache HBASE (7 minutes)

6 readings (total 60 minutes)

Hadoop Basics - Lesson 1 Slides (10 minutes)
Lesson 2: Hadoop Execution Environment - Slides (10 minutes)
Lesson 3: Hadoop-Based Applications Overview - All Slides (10 minutes)
Command list for Applications Slides (10 minutes)
Tips to handle service connection errors (10 minutes)
References for Applications (10 minutes)

3 quizzes (total 90 minutes)

Overview of Hadoop Stack (30 minutes)
Hadoop Execution Environment (30 minutes)
Hadoop Applications (30 minutes)

Introduction to Hadoop Distributed File System (HDFS)

In this module we will take a detailed look at the Hadoop Distributed File System (HDFS). We will cover the main design goals of HDFS, walk through how data is read from and written to HDFS, review the main configuration parameters that can be tuned to control HDFS performance and robustness, and get an overview of the different ways you can access data on HDFS.

What's included

9 videos, 5 readings, 3 quizzes

9 videos (total 58 minutes)

Overview of HDFS Architecture (4 minutes)
The HDFS Performance Envelope (5 minutes)
Read/Write Processes in HDFS (4 minutes)
HDFS Tuning Parameters (6 minutes)
HDFS Performance and Robustness (9 minutes)
Overview of HDFS Access, APIs, and Applications (5 minutes)
HDFS Commands (8 minutes)
Native Java API for HDFS (4 minutes)
REST API for HDFS (8 minutes)

5 readings (total 50 minutes)

Lesson 1: Introduction to HDFS - Slides (10 minutes)
HDFS references (10 minutes)
Lesson 2: HDFS Performance and Tuning - Slides (10 minutes)
HDFS Access, APIs (10 minutes)
Lesson 3: HDFS Access, APIs, Applications - Slides (10 minutes)

3 quizzes (total 90 minutes)

HDFS Architecture (30 minutes)
HDFS performance, tuning, and robustness (30 minutes)
Accessing HDFS (30 minutes)
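
To give a feel for the tuning parameters this module refers to, here is a minimal Python sketch of the arithmetic behind two of them: block size and replication factor. It is not taken from the course materials. The function name `hdfs_footprint` is hypothetical, and the defaults of 128 MB blocks and 3 replicas are assumptions based on common Hadoop 2+ defaults (`dfs.blocksize`, `dfs.replication`); a given cluster may be configured differently.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Estimate how a single file is laid out on HDFS.

    Returns a tuple of (blocks, block_replicas, raw_bytes_stored).
    Defaults are assumed values, not read from any real cluster config.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication < 1:
        raise ValueError("sizes must be non-negative and replication >= 1")
    # Integer ceiling division: a partially filled last block still counts.
    blocks = -(-file_size_bytes // block_size_bytes)
    # Every block is stored `replication` times across the cluster.
    block_replicas = blocks * replication
    # The last block only occupies its actual length, so raw usage is
    # the file size times the replication factor.
    raw_bytes_stored = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes_stored


if __name__ == "__main__":
    blocks, replicas, raw = hdfs_footprint(1024 * MB)
    print(blocks, replicas, raw // MB)
```

Under these assumed defaults, a 1 GB file splits into 8 blocks and 24 block replicas. A larger block size means fewer blocks to track per file, and a higher replication factor trades raw storage for robustness.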

Introduction to Map/Reduce

This module introduces Map/Reduce concepts and practice. You will learn the big idea behind Map/Reduce and how to design, implement, and execute tasks in the Map/Reduce framework. You will also learn the trade-offs in Map/Reduce and how they motivate other tools.

What's included

9 videos, 3 readings, 1 quiz, 2 programming assignments

9 videos (total 27 minutes)

Introduction to Map/Reduce (2 minutes)
The Map/Reduce Framework (2 minutes)
A MapReduce Example: Wordcount in detail (4 minutes)
MapReduce: Intro to Examples and Principles (2 minutes)
MapReduce Example: Trending Wordcount (1 minute)
MapReduce Example: Joining Data (4 minutes)
MapReduce Example: Vector Multiplication (2 minutes)
Computational Costs of Vector Multiplication (3 minutes)
MapReduce Summary (2 minutes)

3 readings (total 30 minutes)

Lesson 1: Introduction to MapReduce - Slides (10 minutes)
A note on debugging map/reduce programs (10 minutes)
Lesson 2: MapReduce Examples and Principles - Slides (10 minutes)

1 quiz (total 30 minutes)

Lesson 1 Review (30 minutes)

2 programming assignments (total 360 minutes)

Running Wordcount with Hadoop streaming, using Python code (180 minutes)
Joining Data (180 minutes)
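
As a rough illustration of the word-count pattern named in this module, the following Python sketch runs the map, sort, and reduce steps in a single process. It is not the course's assignment code. The function names are hypothetical, lowercasing the words is an assumption made for the example, and a real Hadoop streaming job would run the mapper and reducer as separate scripts reading from standard input, with the framework handling the sort between them.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts per word; input must be sorted by word."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def wordcount(lines):
    # sorted() stands in for the shuffle/sort phase that the framework
    # performs between the map and reduce steps.
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    print(wordcount(sample))
```

The reducer depends on its input arriving grouped by key, which is why the sort sits between the two steps.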

Spark

Welcome to module 5, Introduction to Spark. This week we will focus on the Apache Spark cluster computing framework, an important contender to Hadoop MapReduce in the Big Data arena. Spark provides significant performance advantages over Hadoop MapReduce, especially for iterative algorithms, thanks to in-memory caching. It also gives data scientists an easier way to write their analysis pipelines in Python and Scala, and provides interactive shells for exploring data live.

What's included

10 videos, 4 readings, 3 quizzes, 2 programming assignments

10 videos (total 70 minutes)

Introduction to Apache Spark (8 minutes)
Architecture of Spark (7 minutes)
Resilient Distributed Datasets (10 minutes)
Spark Transformations (10 minutes)
Wide Transformations (10 minutes)
Directed Acyclic Graph (DAG) Scheduler (8 minutes)
Actions in Spark (2 minutes)
Memory Caching in Spark (5 minutes)
Broadcast Variables (2 minutes)
Accumulators (1 minute)

4 readings (total 40 minutes)

Setup PySpark on the Cloudera VM (10 minutes)
Lesson 1: Intro to Apache Spark - Slides (10 minutes)
Lesson 2: RDD and Transformations - Slides (10 minutes)
Lesson 3: Scheduling, Actions, Caching - Slides (10 minutes)

3 quizzes (total 90 minutes)

Spark Lesson 1 (30 minutes)
Spark Lesson 2 (30 minutes)
Spark Lesson 3 (30 minutes)

2 programming assignments (total 360 minutes)

Simple Join in Spark (180 minutes)
Advanced Join in Spark (180 minutes)
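
The join assignments in this module work with key-value pairs. As a hedged sketch of the semantics only, the plain-Python function below mimics the result shape of an inner join on pair data, `(key, (left_value, right_value))`, which is the shape Spark's pair-RDD `join` produces. It does not use Spark, it is not the assignment solution, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def inner_join(left, right):
    """Inner-join two lists of (key, value) pairs on their keys.

    Returns one (key, (left_value, right_value)) tuple for every
    matching combination; keys present on only one side are dropped.
    """
    # Index the right side by key so each left record is matched quickly.
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [
        (key, (left_value, right_value))
        for key, left_value in left
        for right_value in right_by_key.get(key, [])
    ]


if __name__ == "__main__":
    # Hypothetical sample data: view counts and the channel of each show.
    views = [("show_a", 10), ("show_b", 5), ("show_a", 7)]
    channels = [("show_a", "ABC"), ("show_c", "XYZ")]
    print(inner_join(views, channels))
```

In Spark itself, a join is a wide transformation: records with the same key have to be brought together across partitions, which is the kind of data movement the Wide Transformations lesson covers.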

Instructors

Instructor ratings

3.8 (92 ratings)

Natasha Balac, Ph.D.

University of California San Diego

4 courses, 210,827 learners

Paul Rodriguez

University of California San Diego

3 courses, 181,942 learners

Offered by

University of California San Diego

Learner reviews

4.0 (3,318 reviews)

5 stars: 45.27%
4 stars: 28.14%
3 stars: 12.36%
2 stars: 6.75%
1 star: 7.45%

Frequently asked questions

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.
The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

What is the refund policy?

If you subscribed, you get a 7-day free trial during which you can cancel at no penalty. After that, we don't give refunds, but you can cancel your subscription at any time. See our full refund policy.
