Data Engineering Certification - Apache Spark, Hadoop, Google Cloud Platform (GCP), Kafka and Azure Cloud

MySQL - Data Manipulation Language And Table Alteration

MySQL - Null, Update And Delete DML Queries
MySQL - Alter Table
Data Manipulation Practice Questions

MySQL - Data Manipulation Language And Table Alteration

Python Fundamentals

Getting Started with Python
Anaconda And VS Code Installation For Python
Getting Started With VS Code And Environment
Python Basics-Syntax and Semantics
Basic Data Types
Operators In Python
Conditional Statements In Python
Loops In Python
List In Python
Practical Examples Of List
Sets In Python
Tuples In Python
Dictionaries In Python
Functions In Python
Python Function Examples
Lambda Functions In Python
Map Functions In Python
Python Filter Function
Import Modules And Packages In Python
Standard Library Overview
File Operation In Python
Working With File Paths
Exception Handling In Python
OOPS In Python
Inheritance In Python
Polymorphism In Python
Encapsulation In Python
Abstraction In Python
Magic Methods In Python
Custom Exception In Python
Operator Overloading In Python
Iterators In Python
Generators In Python
Decorators In Python
Working With Numpy In Python
Pandas DataFrame And Series
Data Manipulation And Analysis
Data Source Reading

Working With Databases and Python

Logging In Python

Prerequisites: MySQL Tutorials

Introduction To Big Data

Section Intro
What is Big Data – A Practical Example
5 V’s of Big Data
Designing a Good Big Data System
On-Premise Infra vs Cloud Solutions
Data Lake vs Data Warehouse vs Data Lake
ETL vs ELT
What Does a Data Engineer Do & Where Does Big Data Fit In?
Big Data and Distributed Systems

Hadoop Architecture

MySQL- Different Types Of Constraints

HDFS Architecture

Hadoop Dataproc Cluster on Google Cloud

Google Cloud Platform & Hadoop

Map Reduce

Map Reduce Intro
Intro To Distributed Processing
Map Reduce Introduction
Map Reduce & Cluster
Map Reduce Practical Part 1
MR Example Part 2
MR Practical with 1 reducer
MR with 2 Reducer Practical
Combiner in MR
Map Reduce with 0 Reducer
MR on Big Log File
Input Split in MR
Map Reduce Outro

Yarn

Higher Order Function, Lambda, Map and Filter in Python (Revise)

Apache Spark

Spark Section Intro
Spark Introduction
Spark Common Questions
Limitations of MR
What is Spark and Its Features
Spark Ecosystem
Executing Code In Spark
Word Count Program in Spark
Ways to run Spark
Transformation vs Action
Why Is Spark Lazy

Spark Core API – RDD

What is Spark RDD
How Spark reads the data
Spark Read Data and Partitioning
Data Generation + Project Steps
Spark RDD Operations – Part 1
Spark RDD Operations – 2
Jobs, Stages and Tasks in Spark UI
GroupByKey vs ReduceByKey Part 1
ReduceByKey vs GroupByKey Part 2
Increasing or Decreasing the Number of Partitions
Repartition vs Coalesce
Higher Level APIs – Dataframe
Spark Higher Level APIs – Spark Tables

Spark Dataframe

Spark Dataframe Intro
DataFrames in Spark
DataFrame – Reading from HDFS
Spark Read – Transformation or Action
Schema Enforcement in Spark
Read Modes in Spark
Write in Spark
Spark Operations
Handling Data Types in PySpark
Handling Date Type

Spark Table and Spark SQL

Caching In Spark

Spark Caching Intro
Introduction To Persist and Caching
Difference Between Persist and Caching
Some Common Questions about Caching
RDD Caching – Small File
Spark RDD Caching – Big File
Caching DF in Spark
Caching DF – Large File 1
Spark DF Caching – Part 2
Spark Table Caching

Spark Architecture

Spark Architecture Intro
Spark Architecture – Run Mode
Spark’s Distributed Nature and In-Memory Computation
Spark Architecture and Components
Spark on Standalone cluster
YARN (Revision) – Component of YARN
YARN (Revision) – Step by Step Process
Spark on YARN Architecture + UI
Difference Between Standalone and YARN
Deployment Modes in Spark

Deployment Modes in Spark

Project 1 Spark – Extracting Customer and Orders insight

Spark Project 2 – Real World Data

Anatomy of Project – Real E-commerce Dataset
Exploration and Understanding Of Data
Data Ingestion into Dataproc Cluster
Data Exploration – 1
Data Exploration
Module 2 – Data Cleaning and Transformation
Data Cleaning & Transformation
Module 3 – Data Integration and Aggregation
Data Integration – Joining All Datasets
Optimized Joins Aggregation and Window Function
Advanced Data Aggregations
Advanced Enrichment
Module 4 – Spark Configuration Optimization
Join Optimization Strategies
Data Serving Layer

Hive

Hive Intro
Introduction To Hive
How Hive makes Big Data Processing Easier
Some Common Questions/Misconceptions about Hive
Hive Practical – Connecting to Hive via terminal and Beeline
Hive Practical 2 – Creating and Querying Table
Accessing Metadata in Hive
Hive Architecture & Components
Hive Query Flow
Derby DB in Hive

Kafka

Kafka Intro
Introduction To Apache Kafka
Why Kafka and Its Use Cases
Kafka Architecture
Ways to run Kafka
Creating a Confluent Kafka Cluster (No CC required)
Producing Message to Kafka Cluster
Kafka Producers Send Multiple Messages
Callback Poll and Flush
Consuming Message from Kafka
Confluent Kafka on CLI

Complete Basic To Advanced Docker

Docker and Airflow Intro
Introduction To Docker Series
What Are Docker And Containers
Docker Images vs Containers
Docker vs Virtual Machines
Docker Installation
Creating A Docker Image
Docker Basic Commands
Push Docker Image To Docker Hub
Docker Compose

Getting Started With Airflow

Airflow ETL Pipeline with Postgres and API Integration In ASTRO Cloud And AWS

Databricks

Databricks – Project

Azure Cloud

Azure Cloud Project Part 1

Complete Project Resources
Pre-Requisite
Project Architecture
Creating Azure Account
Azure Cloud Overview
Dataset Overview : Olist Dataset
SQL DB & Data Ingestion
Resource & Resource Group in Azure
Azure Data Factory
ADLS Gen 2 Storage Account
Medallion Architecture Overview
Ingestion With Azure Data Factory
Real Time Ingestion with Azure Data Factory
Parametrized Ingestion with ADF
Azure Databricks Account Creation
Azure Databricks Overview
Azure Databricks UI Overview
Creating Compute and Notebook
MongoDB ingestion to Databricks
Azure Databricks Workflow Overview
ADLS Gen2 Datalake to Databricks Connection
Accessing ADLS Gen2 Data

Azure Cloud Project Part 2

Quick Revision
Data Enrichment
Accessing Data in Databricks
Read Data in Databricks
Spark Transformation
MongoDB Data for Enrichment
Data Cleaning
Extracting Insights from Data
Spark – Transformation vs Action
Joining Data
Enriching Data via MongoDB
Visualizing Data in Databricks
Exporting Data to Silver Layer
Azure Synapse Overview and Account Creation
Synapse UI Overview
Synapse To Lake Access
SQL Pool – Dedicated vs Serverless
Access Lake Data
Create Gold View and Schema
Azure Synapse Workflow
CETAS Understanding
Create External Serving Table in Gold
Serving Layer Completed :)
Visualization Flow
Thank you and Congratulations on completing an industry level Project

Course Content

MySQL - Data Manipulation Language And Table Alteration

MySQL - Null, Update And Delete DML Queries

MySQL - Alter Table

Data Manipulation Practice Questions

MySQL - Data Manipulation Language And Table Alteration

Python Fundamentals

Getting Started with Python

Anaconda And VS Code Installation For Python

Getting Started With VS Code And Environment

Python Basics-Syntax and Semantics

Basic Data Types

Operators In Python

Conditional Statements In Python

Loops In Python

List In Python

Practical Examples Of List

Sets In Python

Tuples In Python

Dictionaries In Python

Functions In Python

Python Function Examples

Lambda Functions In Python

Map Functions In Python

Python Filter Function

Import Modules And Packages In Python

Standard Library Overview

File Operation In Python

Working With File Paths

Exception Handling In Python

OOPS In Python

Inheritance In Python

Polymorphism In Python

Encapsulation In Python

Abstraction In Python

Magic Methods In Python

Custom Exception In Python

Operator Overloading In Python

Iterators In Python

Generators In Python

Decorators In Python

Working With Numpy In Python

Pandas DataFrame And Series

Data Manipulation And Analysis

Data Source Reading

Working With Databases and Python

Variables In Python

Python With SQLite

Logging In Python

Logging In Python

Logging With Multiple Loggers

Logging In a Real World Example

Python Outro

Prerequisites: MySQL Tutorials

SQL Section Intro

Basic To Intermediate MySQL Tutorials

Introduction To Big Data

Section Intro

What is Big Data – A Practical Example

5 V’s of Big Data

Designing a Good Big Data System

On-Premise Infra vs Cloud Solutions

Data Lake vs Data Warehouse vs Data Lake

ETL vs ELT

What Does a Data Engineer Do & Where Does Big Data Fit In?

Big Data and Distributed Systems

Hadoop Architecture

Section Intro

Introduction To Hadoop

Properties of Hadoop

Hadoop Ecosystem – Main Components

Hadoop Ecosystem – Components

MySQL- Different Types Of Constraints

MySQL Constraints - Primary Key, Foreign Key, Unique, Not Null Constraints

MySQL Constraints - Default, Index, Candidate Keys

More Videos On MySQL

HDFS Architecture