
Big Data Analysis and Machine Learning using PySpark

Optimize Large-Scale Data Processing and Machine Learning with PySpark.

  • Schedule

    21 – 23 January 2025

    18.30 – 21.30 (WIB)

  • Online-Interactive Learning

    Via Zoom

  • Investment

    Rp. 1.500.000

Course Summary

This workshop is designed for individuals eager to dive into the world of big data and machine learning using PySpark. It covers the fundamentals of Python and PySpark, providing participants with the tools to manipulate and analyze large-scale data efficiently. By focusing on key PySpark concepts, such as data abstraction (RDD, DataFrame, Dataset) and lazy evaluation, participants will gain the skills necessary to handle, preprocess, and analyze big data effectively.

Throughout the course, participants will engage in hands-on learning with a rich, interactive experience. Our Instructor and two Teaching Assistants will guide participants through the material, offering support and troubleshooting assistance whenever needed.

Learning Outcomes

In this workshop, you will:

  • Gain foundational knowledge of PySpark and its architecture.
  • Learn to manipulate and analyze structured data using PySpark DataFrames.
  • Develop proficiency in using PySpark for exploratory data analysis.
  • Build a solid foundation in machine learning with PySpark.
  • Learn to train and evaluate machine learning models in PySpark.

Syllabus

  • Introduction to Python for Data Analysis
  • Overview of Big Data and PySpark
  • PySpark data abstractions: RDD, DataFrame, Dataset (lazy evaluation)
  • Setting Up PySpark Environment and Jupyter Notebooks
  • Connecting PySpark to a data source (CSV files)
  • Inspecting and changing data types
  • Slicing DataFrames
  • Conditional subsetting
  • Data aggregation using groupBy and agg
  • Summarizing Data with Aggregation Functions (e.g., sum, mean, count)
  • Load Data & Data preprocessing for Machine Learning
  • Train-test Split in PySpark
  • Predictive Analysis with ML: Linear Regression
  • Model Training and Prediction with PySpark MLlib
  • Model Evaluation

STUDENT TESTIMONIALS

This testimonial video was recorded after our previous Online Data Science Series: Time Series Analysis for Business Forecasting.

LEARN FROM ANYWHERE

Our learning format is online and interactive, so the experience feels much like being present in a physical classroom. You can access the class using your Zoom account on the scheduled dates.

  • LEARN AT YOUR OWN PACE

    Zoom recordings, course books (PDF & HTML files), practice datasets, reference notes, and working files are accessible through your account on our Learning Management System.

  • PROVE YOUR MASTERY

    Show current and prospective employers your mastery of big data analysis and machine learning with PySpark with a signed certificate of completion.

  • CONNECT WITH LIKE-MINDED PEOPLE

    Be a part of our data-passionate community with 5000+ members and 1000+ alumni.

FOR ABSOLUTE BEGINNERS

Workshops in this series are tailored to casual programmers and non-programmers who are taking their first steps into data science. They assume no prior knowledge or academic background, and attendees will be introduced to the beautiful art of writing R / Python code to produce data visualizations and build machine learning models. Each workshop has a gentle learning curve designed with non-technical professionals and academics in mind.

Yes, you can still attend the workshop as it is a beginner-friendly workshop.

Our system will send you an email containing a link and details to join a Google Classroom.

Online learning will be conducted via Zoom.us. The link to join the Zoom class will be announced via Google Classroom.

Learning materials can be obtained via Google Classroom.

Yes, you will receive a certificate of completion.

YOUR INSTRUCTOR

Dyah Nurlita

Sr. Data Science Instructor at Algoritma Data Science School

Dyah Nurlita is an experienced Sr. Data Science Instructor at Algoritma Data Science School, specializing in comprehensive data science training for corporate clients. She has conducted training sessions for organizations such as Jasa Raharja, Pertamina Hulu Mahakam, Perusahaan Listrik Negara (PLN), and PT. Bank Central Asia (BCA). Lita's areas of expertise include Python for Data Analysis, Exploratory Data Analysis, Data Wrangling and Visualization, SQL for Data Manipulation, and Programming for Data Science.