Learn how to use Python and Spark 3.0 (PySpark) for Data Engineering and Data Analytics on Big Data Cloud Platforms
Description
The key objectives of this course are as follows:
- Learn Spark Architecture
- Learn Spark Execution Concepts
- Learn Spark Transformations and Actions using the Structured API
- Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
- Learn how to set up your own local PySpark Environment
- Learn how to interpret the Spark Web UI
- Learn how to interpret the DAG (Directed Acyclic Graph) of a Spark Execution
The PySpark project that we are going to build together covers the following steps (a code sketch follows the list):
Sales Data
- Create a Spark Session
- Read a CSV file into a Spark DataFrame
- Learn to Infer a Schema
- Select data from the Spark DataFrame
- Produce analytics showing the topmost sales orders per Region and Country
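The snippet below is a minimal sketch of these project steps, not the course's exact code. It assumes a hypothetical CSV file named `sales_data.csv` with illustrative columns `Region`, `Country`, `Order ID`, and `Total Revenue`; the dataset used in the course may differ.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Create a Spark Session
spark = SparkSession.builder.appName("SalesData").getOrCreate()

# Read a CSV file into a Spark DataFrame, letting Spark infer the schema
sales = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("sales_data.csv")  # hypothetical file path
)
sales.printSchema()

# Select the columns of interest from the DataFrame
orders = sales.select("Region", "Country", "Order ID", "Total Revenue")

# Topmost sales order per Region and Country: rank orders by revenue
# within each (Region, Country) group and keep the highest-ranked row
w = Window.partitionBy("Region", "Country").orderBy(F.col("Total Revenue").desc())
top_orders = (
    orders
    .withColumn("rank", F.row_number().over(w))
    .filter(F.col("rank") == 1)
    .drop("rank")
)
top_orders.show()
```

A window function is used here so that each (Region, Country) group keeps only its single highest-revenue order; a groupBy with an aggregate would lose the other order columns.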
Who this course is for:
- Python Developers who wish to learn how to use the language for Data Engineering and Analytics with PySpark
- Aspiring Data Engineering and Analytics Professionals
- Data Scientists / Analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster
- Data Managers who want to gain a deeper understanding of managing data over a cluster
Course link: https://www.udemy.com/course/introduction-to-python-for-big-data-engineering-with-pyspark/