Data Engineering Certification – Apache Spark, Hadoop, Google Cloud Platform (GCP), Kafka and Azure Cloud

Course Content
MySQL – Data Manipulation Language And Table Alteration
-
MySQL – NULL, UPDATE And DELETE DML Queries
-
MySQL – ALTER TABLE
-
Data Manipulation Practice Questions
Python Fundamentals
-
Getting Started with Python
-
Anaconda And VS Code Installation For Python
-
Getting Started With VS Code And Environment
-
Python Basics – Syntax and Semantics
-
Basic Data Types
-
Operators In Python
-
Conditional Statements In Python
-
Loops In Python
-
List In Python
-
Practical Examples Of List
-
Sets In Python
-
Tuples In Python
-
Dictionaries In Python
-
Functions In Python
-
Python Function Examples
-
Lambda Functions In Python
-
Map Function In Python
-
Python Filter Function
-
Import Modules And Packages In Python
-
Standard Library Overview
-
File Operations In Python
-
Working With File Paths
-
Exception Handling In Python
-
OOPS In Python
-
Inheritance In Python
-
Polymorphism In Python
-
Encapsulation In Python
-
Abstraction In Python
-
Magic Methods In Python
-
Custom Exception In Python
-
Operator Overloading In Python
-
Iterators In Python
-
Generators In Python
-
Decorators In Python
-
Working With NumPy In Python
-
Pandas DataFrame And Series
-
Data Manipulation And Analysis
-
Data Source Reading
Working With Databases and Python
-
Variables In Python
-
Python With SQLite
Logging In Python
-
Logging In Python
-
Logging With Multiple Loggers
-
Logging In Real-World Examples
-
Python Outro
Prerequisites My SQL Tutorials
-
SQL Section Intro
-
Basic To Intermediate MySQL Tutorials
Introduction To Big Data
-
Section Intro
-
What is Big Data – A Practical Example
-
5 V’s of Big Data
-
Designing a Good Big Data System
-
On-Premise Infra vs Cloud Solutions
-
Data Lake vs Data Warehouse vs Data Lake
-
ETL vs ELT
-
What Does a Data Engineer Do & Where Does Big Data Fit In?
-
Big Data and Distributed Systems
Hadoop Architecture
-
Section Intro
-
Introduction To Hadoop
-
Properties of Hadoop
-
Hadoop Ecosystem – Main Components
-
Hadoop Ecosystem – Components
MySQL- Different Types Of Constraints
-
MySQL Constraints – Primary Key, Foreign Key, Unique, Not Null
-
MySQL Constraints – Default, Index, Candidate Keys
-
More Videos On MySQL
HDFS Architecture
-
Intro to HDFS and Common Terminology
-
Why HDFS
-
HDFS Architecture
-
Blocks In HDFS
-
Replication Factor in HDFS
-
Rack Awareness in HDFS
-
Node Failure
-
Create GCP Account
Hadoop Dataproc Cluster on Google Cloud
-
Data Node Failure – Temporary
-
Data Node Failure – Permanent
-
Secondary Name Node
-
Standby Name Node
-
Hadoop HA Architecture
-
Data Write in HDFS
-
Read Request in HDFS
Google Cloud Platform & Hadoop
-
GCP Hadoop Cluster Creation
-
GCP Cluster Best Practices
-
Linux Commands – 1
-
Linux Commands – 2
-
HDFS Commands
-
Hadoop Outro
Map Reduce
-
Map Reduce Intro
-
Intro To Distributed Processing
-
Map Reduce Introduction
-
Map Reduce & Cluster
-
Map Reduce Practical Part 1
-
MR Example Part 2
-
MR Practical with 1 Reducer
-
MR Practical with 2 Reducers
-
Combiner in MR
-
Map Reduce with 0 Reducers
-
MR on Big Log File
-
Input Split in MR
-
Map Reduce Outro
YARN
-
YARN Section Intro
-
YARN Introduction
-
Components of YARN
-
YARN Analogy
-
YARN Process Step by step
Higher-Order Functions, Lambda, Map and Filter in Python (Revision)
-
Higher Order Functions
-
Lambda Functions
-
Map, Filter and Reduce
Apache Spark
-
Spark Section Intro
-
Spark Introduction
-
Spark Common Questions
-
Limitations of MR
-
What is Spark and Its Features
-
Spark Ecosystem
-
Executing Code In Spark
-
Word Count Program in Spark
-
Ways to run Spark
-
Transformation vs Action
-
Why Is Spark Lazy
Spark Core API – RDD
-
What is Spark RDD
-
How Spark reads the data
-
Spark Read Data and Partitioning
-
Data Generation + Project Steps
-
Spark RDD Operations – Part 1
-
Spark RDD Operations – Part 2
-
Jobs, Stages and Tasks in Spark UI
-
GroupByKey vs ReduceByKey Part 1
-
ReduceByKey vs GroupByKey Part 2
-
Increasing or Decreasing the Number of Partitions
-
Repartition vs Coalesce
-
Higher Level APIs – DataFrame
-
Spark Higher Level APIs – Spark Tables
Spark DataFrame
-
Spark DataFrame Intro
-
DataFrames in Spark
-
DataFrame – Reading from HDFS
-
Spark Read – Transformation or Action
-
Schema Enforcement in Spark
-
Read Modes in Spark
-
Write in Spark
-
Spark Operations
-
Handling Data Types in PySpark
-
Handling Date Type
Spark Table and Spark SQL
-
Spark SQL Intro
-
Spark Tables
-
Spark Table – Temporary
-
Spark Table – Global Temporary
-
Spark Tables – Persistent Table
-
Spark SQL
-
Spark – Managed vs External Table
-
Spark Creating DataFrame
Caching In Spark
-
Spark Caching Intro
-
Introduction To Persist and Caching
-
Difference Between Persist and Caching
-
Some Common Questions about Caching
-
RDD Caching – Small File
-
Spark RDD Caching – Big File
-
Caching DF in Spark
-
Caching DF – Large File 1
-
Spark DF Caching – Part 2
-
Spark Table Caching
Spark Architecture
-
Spark Architecture Intro
-
Spark Architecture – Run Mode
-
Spark’s Distributed Nature and In-Memory Computation
-
Spark Architecture and Components
-
Spark on Standalone cluster
-
YARN (Revision) – Component of YARN
-
YARN (Revision) – Step by Step Process
-
Spark on YARN Architecture + UI
-
Difference Between Standalone and Spark on YARN
-
Deployment Modes in Spark
Project 1 Spark – Extracting Customer and Orders insight
-
Spark Projects Intro
-
Read Data
-
Process Customer Data
-
Actionable Insight from Customer and Order Dataset
Spark Project 2 – Real World Data
-
Anatomy of Project – Real E-commerce Dataset
-
Exploration and Understanding Of Data
-
Data Ingestion into Dataproc Cluster
-
Data Exploration – 1
-
Data Exploration
-
Module 2 – Data Cleaning and Transformation
-
Data Cleaning & Transformation
-
Module 3 – Data Integration and Aggregation
-
Data Integration – Joining All Datasets
-
Optimized Joins, Aggregations and Window Functions
-
Advanced Data Aggregations
-
Advanced Enrichment
-
Module 4 – Spark Configuration Optimization
-
Join Optimization Strategies
-
Data Serving Layer
Hive
-
Hive Intro
-
Introduction To Hive
-
How Hive makes Big Data Processing Easier
-
Some Common Questions/Misconceptions about Hive
-
Hive Practical – Connecting to Hive via terminal and Beeline
-
Hive Practical 2 – Creating and Querying Table
-
Accessing Metadata in Hive
-
Hive Architecture & Components
-
Hive Query Flow
-
Derby DB in Hive
Kafka
-
Kafka Intro
-
Introduction To Apache Kafka
-
Why Kafka and Its Use Cases
-
Kafka Architecture
-
Ways to run Kafka
-
Creating a Confluent Kafka Cluster (No CC Required)
-
Producing Message to Kafka Cluster
-
Kafka Producers Send Multiple Messages
-
Callback, Poll and Flush
-
Consuming Message from Kafka
-
Confluent Kafka on CLI
Complete Basic To Advanced Docker
-
Docker and Airflow Intro
-
Introduction To Docker Series
-
What Is Docker And What Are Containers
-
Docker Images vs Containers
-
Docker vs Virtual Machines
-
Docker Installation
-
Creating A Docker Image
-
Docker Basic Commands
-
Push Docker Image To Docker Hub
-
Docker Compose
Getting Started With Airflow
-
Introduction To Apache Airflow
-
Key Components Of Apache Airflow
-
Why Airflow For Big Data And MLOps
-
Setting Up Airflow With Astro
-
Building Your First DAG With Airflow
-
Designing Mathematical Calculation DAG With Airflow
-
Getting Started With TaskFlow API Using Apache Airflow
Airflow ETL Pipeline with Postgres and API Integration In Astro Cloud And AWS
-
Introduction To ETL Pipeline
-
ETL Problem Statement And Project Structure Set Up
-
Defining ETL DAG With Implementing Steps
-
Step 1 – Setting Up Postgres And Creating Table Task In Postgres
-
Step 2 – NASA API Integration With Extract Pipeline
-
Step 3 – Building Transformation And Load Pipeline
-
ETL Pipeline Final Implementation With Airflow Connection Set Up
-
ETL Pipeline Deployment In Astro Cloud And AWS
Databricks
-
Databricks Intro
-
What is Databricks
-
Why Databricks
-
Create Databricks Community Account
-
Databricks UI Walkthrough
-
Understanding Databricks Architecture
-
Databricks File System
Databricks – Project
-
Read Data in Databricks
-
Process Data on Databricks – Customer Data
-
Processing Customer with Orders – Actionable Insights
Azure Cloud
-
Azure Intro
-
Creating an Azure Account
-
Azure Cloud Overview
Azure Cloud Project Part 1
-
Complete Project Resources
-
Pre-Requisite
-
Project Architecture
-
Creating Azure Account
-
Azure Cloud Overview
-
Dataset Overview: Olist Dataset
-
SQL DB & Data Ingestion
-
Resource & Resource Group in Azure
-
Azure Data Factory
-
ADLS Gen 2 Storage Account
-
Medallion Architecture Overview
-
Ingestion With Azure Data Factory
-
Real Time Ingestion with Azure Data Factory
-
Parameterized Ingestion with ADF
-
Azure Databricks Account Creation
-
Azure Databricks Overview
-
Azure Databricks UI Overview
-
Creating Compute and Notebook
-
MongoDB Ingestion to Databricks
-
Azure Databricks Workflow Overview
-
ADLS Gen2 Datalake to Databricks Connection
-
Accessing ADLS Gen2 Data
Azure Cloud Project Part 2
-
Quick Revision
-
Data Enrichment
-
Accessing Data in Databricks
-
Read Data in Databricks
-
Spark Transformation
-
MongoDB Data for Enrichment
-
Data Cleaning
-
Extracting Insights from Data
-
Spark – Transformation vs Action
-
Joining Data
-
Enriching Data via MongoDB
-
Visualizing Data in Databricks
-
Exporting Data to Silver Layer
-
Azure Synapse Overview and Account Creation
-
Synapse UI Overview
-
Synapse To Lake Access
-
SQL Pool – Dedicated vs Serverless
-
Access Lake Data
-
Create Gold View and Schema
-
Azure Synapse Workflow
-
CETAS Understanding
-
Create External Serving Table in Gold
-
Serving Layer Completed :)
-
Visualization Flow
-
Thank you and Congratulations on completing an industry level Project