
Machine Learning: Classification 1

The science of solving classification tasks

Ad-Hoc Course Registration:


  • Date: 11 – 14 January 2021
  • Time: 18.30 – 21.30
  • Venue: Menara Kadin Lantai 4, Jl. H. Rasuna Said, Jakarta Selatan
  • Investment: Rp. 5.200.000
  • Date: 11 – 14 January 2021
  • Time: 18.30 – 21.30
  • Investment: Rp. 2.600.000


Course details :


Learn to solve binary and multi-class classification problems using machine learning algorithms that are easily understood and readily interpretable. You will learn to write a classification algorithm from scratch, and appreciate the mathematical foundations underpinning logistic regression and nearest-neighbors algorithms.

We strongly recommend that you complete the Regression Models workshop prior to taking this course. Upon completion of this workshop, you will acquire the depth to develop, apply, and evaluate two highly versatile algorithms widely used today.


Schedule


  • Relating Probabilities to Odds

    Day 1

  • Logistic Regression

    Day 2

  • Practical Tips and Case Study

    Day 2

  • Performance Evaluation and Model Selection

    Day 3

  • Learn-by-Building

    Day 4

Course Producer


Samuel Chan

An RStudio-certified instructor and machine learning practitioner in the fields of marketing automation, fraud detection, finance and e-commerce. Samuel is Indonesia’s top-ranked Stack Overflow user in R (top 5% worldwide) for three years running, and boasts certifications from RStudio, Microsoft, MongoDB, Neo4j, Stanford University and Johns Hopkins University, among others.

Prior to Algoritma, he had 8 years of working experience, including a stint as in-house consultant to several publicly traded companies during his time in China, Japan and Singapore. Today he is an active trainer and consultant for various companies in the financial industry. He has guest lectured at various campuses: Binus, NUS (National University of Singapore)’s The Logistics Institute, University of Indonesia, Universitas Gadjah Mada (UGM), Institute of Technology Bandung (ITB), Telkom University, and others. Courses he authored are also offered in Singapore through Ngee Ann Polytechnic.

Samuel is also among the first recipients of the Microsoft Professional Program Certificate in Data Science in Southeast Asia, having demonstrated proficiency in R, Python, Microsoft Azure, SQL / T-SQL, Power BI and a list of other technologies, and among the first to be certified in RStudio’s program. He served as a technical committee member and competition judge at Finhacks 2018, the largest machine learning competition of the year, organized by PT. Bank Central Asia (BCA) and DailySocial.

4-Day Workshop Modules

Syllabus: Classification in Machine Learning 1

Module 1: Logistic Regression


Relating Probabilities to Odds

  • Understanding Odds
  • Understanding Log of Odds
  • Plotting Odds and Log of Odds
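
As a rough illustration of how probabilities, odds and log of odds relate, here is a minimal sketch. The workshop itself is taught in R; Python is used here only for illustration, and the function names `odds` and `log_odds` are hypothetical, not part of the course material.

```python
import math

def odds(p):
    """Odds in favour of an event with probability p (requires 0 <= p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds, also called the logit (requires 0 < p < 1)."""
    return math.log(odds(p))

# Probabilities below 0.5 give odds below 1 and negative log of odds;
# probabilities above 0.5 give odds above 1 and positive log of odds.
for p in (0.2, 0.5, 0.8):
    print(f"p={p:.1f}  odds={odds(p):.2f}  log-odds={log_odds(p):+.3f}")
```

Note the symmetry: the log of odds for p and for 1 - p differ only in sign, which is one reason the log scale is convenient for modeling.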

Logistic Regression from First Principles

  • Sigmoidal Logistic Function
  • Key Assumptions of Sigmoid Function
  • Extra Proof: Intuition Behind the Sigmoid Function
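
To give a feel for the sigmoid function, the sketch below maps a linear score back to a probability. It is written in Python for illustration only (the workshop uses R), and the intercept and slope are made-up values, not results from any course dataset.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number z into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-z))

def predict_probability(x, intercept, slope):
    """Probability from a one-predictor logistic model (hypothetical helper)."""
    return sigmoid(intercept + slope * x)

# Hypothetical coefficients, chosen only for illustration
b0, b1 = -3.0, 0.5
for x in (0, 6, 12):
    print(x, round(predict_probability(x, b0, b1), 3))
```

A linear score of zero corresponds to a probability of exactly 0.5, and scores of equal size but opposite sign give probabilities that sum to 1.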

Logistic Regression in Action

  • Binary Logistic Regression
  • Interpreting Coefficients
  • Interpretation Against Continuous & Discrete Variables

Practical Tips and Case Study

  • Flight Delay Prediction Examples
  • Customer Churn and Attrition Examples
  • Risk Modeling on Loans from Quarter 4, 2017

Performance Evaluation and Model Selection

  • AIC (Akaike Information Criterion)
  • Null Deviance and Residual Deviance
  • Hauck-Donner Effect

Module 2: Nearest Neighbours Algorithm


Closer Look at Classification

  • Probabilities vs Class responses
  • Cross-Validation and Out-of-Sample Error
  • Bias-Variance Trade-off
  • Confusion matrix (accuracy, sensitivity, specificity, & precision)
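
The four confusion-matrix metrics named above can be sketched as follows. This is an illustrative Python version (the workshop uses R); the labels are made up and the helper names are hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Assumes every denominator is non-zero (both classes present and predicted)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of actual positives that were caught
        "specificity": tn / (tn + fp),  # share of actual negatives that were caught
        "precision": tp / (tp + fp),    # share of predicted positives that were right
    }

# Made-up labels for illustration
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(metrics(*confusion_counts(actual, predicted)))
```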

k-NN in Action

  • Characteristics of k-NN
  • Positives and Negatives
  • Diagnosing Breast Cancer with k-NN

Building Blocks of k-NN

  • Distance Function (Euclidean, Minkowski)
  • The k Parameter
  • Standardization vs Min-Max Normalization
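
A minimal sketch of these building blocks, in Python for illustration only (the workshop uses R). The function names are hypothetical; `standardize` assumes the sample standard deviation, which is one common convention.

```python
def minkowski(a, b, p=2):
    """Minkowski distance; p=2 gives Euclidean distance, p=1 gives Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

print(minkowski((0, 0), (3, 4)))       # Euclidean
print(minkowski((0, 0), (3, 4), p=1))  # Manhattan
print(min_max([10, 20, 30]))
print(standardize([10, 20, 30]))
```

Because k-NN relies on distances, a feature measured on a large scale can dominate the result unless the features are rescaled first.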

k-NN from First Principles

  • Classifying Customer Segments with k-NN
  • Writing Your Own k-NN Classifier
  • Predicting Using Your Own k-NN Classifier
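
To show what a hand-written k-NN classifier can look like, here is a minimal sketch in Python (the workshop builds its version in R, and this is not the course's implementation). The training points and the labels "A" and "B" are made up; `knn_predict` is a hypothetical name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Keep the labels of the k closest points and take the most frequent one
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up two-feature training data with two classes
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2, 2)))
print(knn_predict(train_X, train_y, (6, 5)))
```

An odd k is a simple way to avoid tied votes in a two-class problem; this sketch does not handle ties in any special way.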

Academy Modules


Graded Quiz

Learning-by-Building Module (3 Points)

Logistic Regression on Credit Risk

  • Applying what you’ve learned, present a simple R Markdown document in which you demonstrate the use of logistic regression on the lbb_loans.csv dataset. Explain your findings wherever necessary and show the necessary data preparation steps. To help you through the exercise, consider the following questions throughout the document:
    • How do we correctly interpret the negative coefficients obtained from your logistic regression?
    • How do we know which of the variables are more statistically significant as predictors?
    • What are some strategies to improve your model?

Customer Segment Prediction

  • Applying what you’ve learned, present a simple R Markdown document in which you demonstrate the use of k-NN on the wholesale.csv dataset. Compare the k-NN to the logistic regression model and answer the following questions throughout the document:
    • What is your accuracy? Was the logistic regression better than k-NN in terms of accuracy? (recall the lesson on obtaining an unbiased estimate of the model’s accuracy)
    • Was the logistic regression better than our k-NN model at explaining which of the variables are good predictors of a customer’s industry?
    • List one disadvantage and one strength of each approach (k-NN and logistic regression)


Workshop Receivables:


  • Workshop Lecturer’s Notes

    Including 2x Course Books (PDF), HTML files, course transcripts (if any).

  • Highly-accelerated Learning

    Learn under the mentorship of our lead instructor and a band of qualified teaching assistants throughout the 4-day course.

  • Certification of Completion

    Show current and prospective employers that you’ve completed the course with a signed certificate of completion.

  • Quality Learning Environment

    We pay meticulous attention to the logistical details of our workshops: quality audio and visual setups, comfortable seating arrangements, small group size. Dinners are included for evening workshops.

  • Supplementary Materials

    Receive supplementary datasets to practice on, reference notes, working files (R Notebook or Jupyter Notebook), and other materials that will help you master the topics.

This workshop is recommended for:

The Machine Learning: Classification 1 workshop is an intermediate-level programming workshop best suited to R programmers that are taking their first steps into data science and machine learning.

Students are assumed to have a working knowledge of R and have completed the necessary pre-requisites. Consider taking the pre-requisite course or a beginner-level course instead if you have no prior programming experience or statistics knowledge.


Past Workshops in this Series:



Students work through tons of real-life examples using sample datasets donated by our team of mentors and corporate partners. We believe in a learn-by-building approach, and we employ instructors who are uncompromisingly passionate about your growth and education.

Part of the Machine Learning Specialization

This workshop is part of the Machine Learning Specialization offered by Algoritma Data Science Academy. Participants are rewarded with a certificate of completion upon passing criteria, and are encouraged to advance further in the respective data science specialization.


Regression Models

An in-depth look at regression models

Ad-Hoc Course Registration:


  • Date: 4 – 7 January 2021
  • Time: 18.30 – 21.30
  • Venue: Menara Kadin Lantai 4, Jl. H. Rasuna Said, Jakarta Selatan
  • Investment: Rp. 5.200.000
  • Date: 4 – 7 January 2021
  • Time: 18.30 – 21.30
  • Investment: Rp. 2.600.000


Course details :


This course strives for a fine balance between business applications and mathematical rigor in its treatment of regression models, one of the most essential statistical techniques in the field of machine learning. Its aim is to equip you with the knowledge to investigate relationships between variables in a dataset effectively and rigorously.

We strongly recommend that you complete Practical Statistics prior to taking this course. Upon completion of this workshop, you will acquire a rigorous statistical understanding of machine learning models, allowing you to extrapolate the same ideas into other, more advanced machine learning models.


Schedule


  • OLS Regression

    Day 1

  • Linear Models in R

    Day 1

  • Interpreting Linear Models

    Day 2

  • Multiple Regression

    Day 3

  • Dive Deeper: Regression Models

    Day 3

  • Learn-by-Building

    Day 4

Course Producer


Samuel Chan


4-Day Workshop Modules

Syllabus: Regression Models

Module 1: Regression Models I


OLS Regression

  • Understanding Least Squares
  • Simple Linear Regression

Linear Models in R

  • Understanding Coefficients
  • Plotting Regression
  • Model Construction

Interpreting Linear Models

  • Residuals Manually
  • Coefficients Manually
  • R-Squared Manually
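
Computing the coefficients, residuals and R-squared by hand can be sketched as below for simple linear regression. The code is Python for illustration only (the workshop works in R, where `lm()` does this for you); the data are made up and the function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Observed value minus fitted value for each observation."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in residuals(x, y, intercept, slope))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1, r_squared(x, y, b0, b1))
```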

Module 2: Regression Models II


Interpreting Linear Models

  • Estimates and Standard Errors
  • t-Value and p-Value
  • Adjusted R-Squared

Multiple Regression

  • Multicollinearity and VIF
  • Model Assumptions
  • Bias-Variance Trade-off
  • Outliers: Leverage and Influence
  • Model Limitation and Evaluation

Dive Deeper: Regression Models

  • Model Selection and Specification
  • Step-wise Regression
  • All-possible Regressions
  • Residual Plots
  • Model Diagnostics
  • Limitations of Regression Models

Academy Modules


Graded Quiz

Learning-by-Building Module (3 Points)

Recommendation on Lowering Crime Rates

  • Write a regression analysis report applying what you’ve learned in the workshop. Using the dataset provided by you, write up your findings on the socioeconomic variables most highly correlated with crime rates. Explain your recommendations where appropriate.



This workshop is recommended for:

The Regression Models workshop is an intermediate-level programming workshop best suited to R programmers that are taking their first steps into data science and data visualization.

Students are assumed to have a working knowledge of R and have completed the necessary pre-requisites. Consider taking the pre-requisite course or a beginner-level course instead if you have no prior programming experience or statistics knowledge.



Part of the Machine Learning Specialization

This workshop is part of the Machine Learning Specialization offered by Algoritma Data Science Academy. Participants are rewarded with a certificate of completion upon passing criteria, and are encouraged to advance further in the respective data science specialization.


Data Visualization in R

Create stunning graphics for your Data Science projects

Ad-Hoc Course Registration:


  • Date: 7 – 10 December 2020
  • Time: 18.30 – 21.30
  • Venue: Menara Kadin Lantai 4, Jl. H. Rasuna Said, Jakarta Selatan
  • Investment: Rp. 5.200.000
  • Date: 7 – 10 December 2020
  • Time: 18.30 – 21.30
  • Investment: Rp. 2.600.000


Course details :


A fun, hands-on, and project-based workshop that helps students gain full proficiency in data visualization systems and tools. Create compelling narratives by combining charting elements with custom aesthetics under the guidance of our instructors.

The 4-day course follows our learn-by-building approach, in that students are tasked to reproduce a series of plots applying what they’ve learned. While it covers the three main plotting systems in R, its particular focus is on ggplot2 and the additional libraries centered around it that bring interactivity and enhanced aesthetic options to the art of creating rich, powerful visualizations.


Schedule


  • Base Plotting

    Day 1

  • Working with ggplot2

    Day 2

  • Enhancing ggplot2

    Day 3

  • Other Visualization Toolset

    Day 3

  • Learn-by-Building

    Day 4

Course Producer


Samuel Chan


4-Day Workshop Modules

Syllabus: Data Visualization in R

Module 1: Plotting Essentials


Base Plotting I

  • Plots and Lines
  • Built-in Plot Types
  • Legends and Annotations
  • Other Built-in Plotting Functionalities

Base Plotting II

  • Histograms and Curves
  • Cleveland’s Dot Plot
  • Axis, Titles, Subtitles and Panel Styles
  • The Notorious Pie Chart

Working with ggplot2

  • Grammar of Graphics System
  • Mapping Aesthetics
  • Working with Geometries
  • Background Image

Enhancing ggplot2

  • Axis, Titles, and Scales
  • Adding Themes to Your Plots
  • Custom Aesthetics and Styles
  • Working with Legends

Module 2: Richer Visualization Techniques


Enhancing ggplot2 II

  • Flipping Coordinates and Axis Rotation
  • Multi-dimensional Faceting
  • Text Layers and Label Layers
  • Expected Values

Enhancing ggplot2 III

  • Enriching: Scatterplots and Bubble Plots
  • Enriching: Jitterplots
  • Enriching: Boxplots and Violin Plots
  • Layer Transparency

Enhancing ggplot2 IV

  • Enriching: Column Plots
  • Enriching: Texts and Labels
  • Enriching: Horizontal and Vertical Lines
  • Fills and Colors

Enhancing ggplot2 V

  • Discrete, Continuous, and Gradient Colors
  • Facet with Wraps and Grids
  • Visualizing Spatial Data
  • Working with Leaflet and Maps

Academy Modules


Project: Mining Trending Videos on YouTube

  • Hands-on Data Visualization
  • Identifying Temporal Patterns in Trending Videos
  • Combining Aesthetics and Geometries

Learning-by-Building Module (2 Points)

Creating a Publication-Grade Plot

  • Applying what you’ve learned, create an economics- or social-related plot that is polished with the appropriate annotations, aesthetics, and some simple commentary. You may use the same “YouTube Trending Videos” dataset or any other dataset for this practice.

Creating an Interactive Map

  • Applying what you’ve learned, create a web page with an interactive map embedded on it. Use a custom icon for the map markers to represent business locations, and show details about each location pin (“markers”) upon user’s interaction with it.



This workshop is recommended for:

The Data Visualization in R workshop is designed for casual learners, working professionals and non-programmers who are taking their first steps into data science and machine learning.

Students are not assumed to have a working knowledge of R or prior proficiency in statistics / mathematics / algebra. As such, the workshop follows a gentle learning curve and emphasizes hands-on, one-to-one tutoring from our team of instructors and teaching assistants.

Consider taking our Intermediate-level workshops instead for more advanced-level materials in statistical programming and machine learning.



Part of the Data Visualization Specialization Track

This workshop is part of the Data Visualization specialization track offered by Algoritma Data Science Academy. Participants are rewarded with a certificate of completion upon passing criteria, and are encouraged to advance further in the respective data science specialization.


Practical Statistics

An in-depth statistics course from a data science perspective

Ad-Hoc Course Registration:


  • Time: 18.30 – 21.30
  • Venue: Menara Kadin Lantai 4, Jl. H. Rasuna Said, Jakarta Selatan
  • Time: 18.30 – 21.30
  • Venue: Google Classroom

Course details :


Pave the statistical foundation for more advanced machine learning theories later on in the specialization by picking up the key ideas in statistical thinking. Learn to interpret correlations, construct confidence intervals and other statistical principles that form the basis of many common machine learning models.

The 2-day course is optional for participants in the Data Visualization and Machine Learning Specializations and is intended for learners without prior experience in statistics.


Course schedule:


  • 5-Number Summary

    Day 1

  • Central Tendency & Variability

    Day 1

  • Standard Score and z-Score

    Day 1

  • Probabilities

    Day 2

  • Intervals

    Day 2

  • Inferential Statistics in Practice

    Day 2

Course Producer


Samuel Chan


2-Day Workshop Modules

Syllabus: Practical Statistics

Module 1: Descriptive Statistics


5-Number Summary

  • Mean, Median, and Mode
  • Measures of Central Tendency
  • Quantiles in R

Central Tendency & Variability

  • Visualizing Central Tendency
  • Variance and Covariance

Standard Score and z-Score

  • Standard Normal Curve
  • Central Limit Theorem
  • z-Score Calculation & Student’s t-Test
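
As a small illustration of a z-score and the standard normal curve, the sketch below converts an observation to a z-score and looks up its upper-tail probability. It uses Python's standard library for illustration only (the workshop uses R); the mean, standard deviation and observation are made-up numbers.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

def upper_tail_probability(z):
    """P(Z > z) under the standard normal curve."""
    return 1 - NormalDist().cdf(z)

# Made-up example: an observation of 130 where the mean is 100 and the sd is 15
z = z_score(130, mean=100, sd=15)
print(z, round(upper_tail_probability(z), 4))
```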

Module 2: Inferential Statistics


Probabilities

  • Probability Mass Function
  • Probability Density Function
  • Expected Values
  • p-Values

Intervals

  • Confidence Intervals
  • Prediction Intervals

Inferential Statistics in Practice

  • Hypothesis Testing
  • Deriving Scientific Truths from Data
  • Case Study

Academy Modules


Tips & Techniques: R for Statisticians

  • Density Plots
  • Interpreting Box Plots (Box-and-Whisker)
  • Better Summary Statistics with skimr

Learning-by-Building Module (Not Graded)

Statistical Treatment of Retail Dataset

  • Using what you’ve learned, formulate a question and derive a statistical hypothesis test to answer it. You have to demonstrate that you’re able to make decisions using data in a scientific manner. Examples of questions:
    • Is there a difference in profitability between standard shipment and same-day shipment?
    • Suppose there is no difference in profitability between the different product segments. What is the probability that we would obtain the current observation by pure chance alone?

Workshop Receivables:


  • Workshop Lecturer’s Notes

    Including 2x Course Books (PDF), HTML files, course transcripts (if any).

  • Highly-accelerated Learning

    Learn under the mentorship of our lead instructor and a band of qualified teaching assistants throughout the 2-day course.

  • Certification of Completion

    Show current and prospective employers that you’ve completed the course with a signed certificate of completion.

  • Quality Learning Environment

    We pay meticulous attention to the logistical details of our workshops: quality audio and visual setups, comfortable seating arrangements, small group size. Dinners are included for evening workshops.

  • Supplementary Materials

    Receive supplementary datasets to practice on, reference notes, working files (R Notebook or Jupyter Notebook), and other materials that will help you master the topics.

This workshop is recommended for:

The Practical Statistics workshop is designed for casual learners, working professionals and non-programmers who are taking their first steps into data science and machine learning.

Students are not assumed to have a working knowledge of R or prior proficiency in statistics / mathematics / algebra. As such, the workshop follows a gentle learning curve and emphasizes hands-on, one-to-one tutoring from our team of instructors and teaching assistants.

Consider taking our Intermediate-level workshops instead for more advanced-level materials in statistical programming and machine learning.



Part of the Data Visualization and Machine Learning Specialization Track

This workshop is part of the two specialization tracks offered by Algoritma Data Science Academy. Participants are rewarded with a certificate of completion upon passing criteria, and are encouraged to advance further in the respective data science specialization.


Programming for Data Science

R programming for the modern-day data scientist


Ad-Hoc Course Registration:


  • Time: 18.30 – 21.30
  • Venue: Menara Kadin Lantai 4, Jl. H. Rasuna Said, Jakarta Selatan
  • Time: 18.30 – 21.30
  • Venue: Google Classroom

Course details :


Programming for Data Science is a course that covers the important programming paradigms and tools used by data analysts and data scientists today. You will be guided through a series of coding exercises designed to maximize your familiarity with data science programming in RStudio, an integrated development environment for the statistical computing language R.

Upon completion of this workshop, you will be familiar with the programming language, popular tools, libraries (data science packages) and toolkits required to excel in your data analysis and statistical computing projects.


Schedule


  • Data Science in R

    Day 1

  • Working with Data

    Day 1

  • Data Manipulation

    Day 2

  • Practical Data Cleansing

    Day 2

  • R in Practice

    Day 3

Course Producer


Samuel Chan


3-Day Workshop Modules

Module 1: Data Science in R


Data Science in R

  • R Programming Basics
  • Why Learn R?
  • RStudio Interface
  • Data Structures in R

Working with Data

  • Reading & Extracting Data
  • Understanding Statistics
  • Exploratory Data Analytics

Data Manipulation

  • Working with Your Global Environment
  • Getting Familiar with Your Workspace
  • Continuous and Categorical Data

Module 2: Data Manipulation


Data Manipulation II

  • Vector Types and Classes
  • Lists and Objects
  • Matrices and Data Frames

Practical Data Cleansing

  • The Data Transformation Process
  • Reproducible Data Science Projects
  • Reading and Writing from Your IDE

R in Practice

  • Programming Exercise: e-Commerce Retail Datasets
  • In-depth Review of Data Frame Subsetting
  • Sampling and Randomization
  • Cross-Tabulations
  • Aggregations

Academy Modules


Graded Quiz

Working with R

  • R Scripts and Functions
  • R Markdown
  • Why Care About Reproducibility

Learning-by-Building Module (2 Points)

Writing your code as R scripts allows for automation and integration with other tools and services, while writing an R Markdown document presents your findings and recommendations in a way that is friendly to non-technical / managerial team members.

  • R Script to clean & transform the data

Write an R script containing a function (name it however you like) that reads a dataset as input, performs the necessary transformation and exports a cross-tabulation, numeric result or plot as output.

  • Reproducible Data Science

Create an R Markdown file that combines your step-by-step data transformation code with some explanatory text. Add formatting styles and hierarchical structure using Markdown.

Workshop Receivables:


  • Workshop Lecturer’s Notes

    Including 2x Course Books (PDF), HTML files, course transcripts (if any).

  • Highly-accelerated Learning

    Learn under the mentorship of our lead instructor and a band of qualified teaching assistants throughout the 3-day course.

  • Certification of Completion

    Show current and prospective employers that you’ve completed the course with a signed certificate of completion.

  • Quality Learning Environment

    We pay meticulous attention to the logistical details of our workshops: quality audio and visual setups, comfortable seating arrangements, small group size. Dinners are included for evening workshops.

  • Supplementary Materials

    Receive supplementary datasets to practice on, reference notes, working files (R Notebook or Jupyter Notebook), and other materials that will help you master the topics.

This workshop is recommended for:


The Programming for Data Science workshop is designed for casual learners, working professionals and non-programmers that are taking their first steps into data science and machine learning.

Students are not assumed to have a working knowledge of R or prior proficiency in statistics / mathematics / algebra. As such, the workshop follows a gentle learning curve and emphasizes hands-on, one-to-one tutoring from our team of instructors and teaching assistants.

Consider taking our Intermediate-level workshops instead for more advanced-level materials in statistical programming and machine learning.




Part of the Data Visualization and Machine Learning Specialization Track

This workshop is part of the two specialization tracks offered by Algoritma Data Science Academy. Participants are rewarded with a certificate of completion upon passing criteria, and are encouraged to advance further in the respective data science specialization.