
STATIC AND DYNAMIC WEB SCRAPING USING R

  • Schedule

    4-7 May 2021

    14.00 – 17.00 (WIB)

  • Online-Interactive Learning

    Via Zoom

  • Investment

    Rp. 4.400.000


Course Summary

The vast amount of information on the internet lets you gather external data to support your next business decision. Web scraping is one method for mining data from the web. Because it automates data collection, it saves plenty of time, so you can focus on what matters most in your data science workflow.

This 4-day online workshop is a beginner-friendly introduction to web scraping using the rvest and RSelenium packages in R. Throughout the course, we will provide participants with hands-on examples and a rich interactive experience. One instructor and two teaching assistants will help participants troubleshoot any difficulties they encounter.

NOTE: This workshop will be delivered in Bahasa Indonesia.

LEARNING OUTCOMES

Upon completion of this workshop, you will be able to:

  • Work with the R language and open source packages for data cleaning and manipulation.
  • Understand the basic ideas behind web scraping and its legal considerations.
  • Build a script to parse HTML code and access the desired information using RSelenium and rvest.
  • Tidy scraped data and export it to the desired format.

Syllabus

  • Description of course materials, timeline, and objectives of the workshop
    * Workshop objective and output.
    * Timeline.
  • A brief explanation of web scraping and its possibilities
    * How web scraping works.
    * Interactive web scraping.
  • Description of the workflow, tools, and setup for the course
    * Web scraping workflow.
    * Browser installation (Google Chrome is recommended).
    * CSS selector tool installation.
  • RSelenium installation
    * Installing the RSelenium package and a web driver.
    * Verifying that the web driver is running and can be controlled from R.
  • Introduction to R Programming Language
  • Working with the RStudio environment
  • Inspecting data structure
    * Data structure in R.
    * Data manipulation using dplyr.
    * Intro to stringr for text manipulation.
  • The legality of web scraping
    * Website terms and conditions.
    * robots.txt as a website's rules for web crawlers.
  • How web scraping works in general
    * Intro to HTML and CSS.
    * Web scraping workflow.
  • Scraping data from static (non-JavaScript) websites using rvest
    * Hands-on web scraping using rvest.
    * Using CSS selector.
    * Build looping code for multiple pages.
  • Scraping data from JavaScript-rendered websites and building a browser bot using RSelenium
    * Hands-on web scraping using RSelenium.
    * Differences between the capabilities of RSelenium and rvest.
    * Interactive features of RSelenium.
    * Build looping code for multiple pages and inputs.
  • Tidying scraped data into R objects
    * Wrangling scraped data.
  • Exporting scraped data into various formats
    * Exporting R objects.
  • Exploratory data analysis and further improvement
    * Simple analysis using scraped data.
    * Optional: an example project that uses web scraping to deliver insights.

STUDENT TESTIMONIALS

This testimonial video was recorded after our previous Online Data Science Series: Time Series Analysis for Business Forecasting.

LEARN FROM ANYWHERE

Our learning format is online and interactive: you will have the same interactive experience as in a physical classroom. You can join the class with your Zoom account on the scheduled dates.

  • LEARN AT YOUR OWN PACE

    Zoom recordings, course books (PDF and HTML files), practice datasets, reference notes, and working files are accessible through our Learning Management System.

  • PROVE YOUR MASTERY

    Show current and prospective employers your mastery with a signed certificate of completion.

  • CONNECT WITH LIKE-MINDED PEOPLE

    Be a part of our data-passionate community with 5000+ members and 1000+ alumni.

FOR ABSOLUTE BEGINNERS

Workshops in this series are tailored to casual programmers and non-programmers who are taking their first steps into data science. They assume no prior knowledge or academic background. Each workshop has a gentle learning curve designed with non-technical professionals and academics in mind.

Yes, you can still attend the workshop, as it is beginner-friendly.

Our system will send you an email containing a link and details to join a Google Classroom.

Online learning will be conducted via Zoom.us. The link to join the Zoom class will be announced via Google Classroom.

Learning materials can be obtained via Google Classroom.

Yes, you will receive a certificate of completion.

YOUR INSTRUCTOR

Web Scraping Using R

JOE NATHAN CRISTIAN

A Data Science Instructor at Algoritma Data Science School, Joe dedicates himself to applying data science in social computing areas such as social network analysis, online consumer behavior, human personality, NLP, and tourism movement. Some of his publications are available online.

Joe is a passionate instructor with expertise in the R programming language. He has been involved in numerous mentoring sessions, projects, and consultative data science training programs for our clients, including:

  • Badan Pemeriksa Keuangan Republik Indonesia.
  • Bank Permata.
  • Bank Rakyat Indonesia, BRI Data Hackathon 2021.