Learn how to use Python and Spark 3.0 (PySpark) for Data Engineering and Data Analytics on Big Data Cloud Platforms


The key objectives of this course are as follows:

  • Learn Spark Architecture
  • Learn Spark Execution Concepts
  • Learn Spark Transformations and Actions using the Structured API
  • Learn Spark Transformations and Actions using the RDD (Resilient Distributed Datasets) API
  • Learn how to set up your own local PySpark Environment
  • Learn how to interpret the Spark Web UI
  • Learn how to interpret the DAG (Directed Acyclic Graph) of a Spark execution

The PySpark project that we will build together:

Sales Data

  • Create a Spark Session
  • Read a CSV file into a Spark DataFrame
  • Learn to infer a schema
  • Select data from the Spark DataFrame
  • Produce analytics that show the top sales orders per Region and Country

Who this course is for:

  • Python Developers who wish to learn how to use the language for Data Engineering and Analytics with PySpark
  • Aspiring Data Engineering and Analytics Professionals
  • Data Scientists / Analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster
  • Data Managers who want to gain a deeper understanding of managing data over a cluster
