Zoom recordings, course books (PDF and HTML files), practice datasets, reference notes, and working files are accessible through our Learning Management System account.
The vast amount of information on the internet lets you gather external data to support your next business decision. One method for mining data from the web is web scraping. It automates data collection that would otherwise be done by hand, so you can focus on what matters most in your data science workflow.
This 4-day online workshop is a beginner-friendly introduction to web scraping using the rvest and RSelenium packages in R. Throughout the course, participants will work through hands-on examples in an interactive setting. One Instructor and two Teaching Assistants will help participants troubleshoot any difficulties they encounter.
NOTE: This workshop will be delivered in Bahasa Indonesia.
Upon completion of this workshop, you will be able to:
- Work with the R language and open source packages for data cleaning and manipulation.
- Understand the basic idea of web scraping and its legality.
- Build a script to parse HTML code and access desired information using RSelenium and rvest.
- Tidy and export scraped data to the desired data type.
- Description of course materials, timeline, and objectives of the workshop
* Workshop objective and output.
- A brief explanation of web scraping and its possibilities
* How web scraping works.
* Interactive web scraping.
- Description of the workflow, tools, and setup for the course
* Web scraping workflow.
* Browser installation (Google Chrome is recommended).
* CSS selector installation.
- RSelenium installation
* Installing the RSelenium package and web driver.
* Making sure the web driver is running and controlled by R.
R PROGRAMMING BASICS
- Introduction to R Programming Language
- Working with RStudio Environment
- Inspecting data structure
* Data structure in R.
* Data manipulation using dplyr.
* Intro to stringr for text manipulation.
INTRODUCTION TO RVEST AND RSELENIUM FOR WEB SCRAPING
- The legality of web scraping
* Website terms and conditions.
* robots.txt as the website's rules for web crawlers.
- How web scraping works in general
* Intro to HTML and CSS.
* Web scraping workflow.
- Scraping data from non-JavaScript websites using rvest
* Hands-on web scraping using rvest.
* Using CSS selector.
* Build looping code for multiple pages.
- Scraping data from JavaScript-driven websites and building a browser bot using RSelenium
* Hands-on web scraping using RSelenium.
* The difference between RSelenium and rvest capabilities.
* The interactive features of RSelenium.
* Build looping code for multiple pages and inputs.
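As a taste of the rvest topics listed above (parsing HTML, using CSS selectors, and looping over multiple pages), here is a minimal sketch. It is only an illustration, not the workshop's actual material: the HTML snippets, the class names `.product`, `.title`, and `.price`, and the helper `scrape_page()` are all hypothetical, and inline strings stand in for downloaded pages so the code runs without a network connection.

```r
library(rvest)

# Hypothetical pages: inline HTML strings stand in for downloaded pages,
# so this sketch runs without network access.
pages <- list(
  '<div class="product"><h2 class="title">Book A</h2><span class="price">10</span></div>',
  '<div class="product"><h2 class="title">Book B</h2><span class="price">12</span></div>'
)

# Hypothetical helper: parse one page and return its products as a data frame.
scrape_page <- function(html) {
  page <- read_html(html)
  data.frame(
    # CSS selectors pick the elements; html_text2() extracts their text.
    title = page %>% html_elements(".product .title") %>% html_text2(),
    price = page %>% html_elements(".product .price") %>% html_text2() %>% as.numeric()
  )
}

# Loop over every page and stack the results into one data frame.
products <- do.call(rbind, lapply(pages, scrape_page))
print(products)
```

With a real website, the strings in `pages` would typically be replaced by page URLs passed to `read_html()`, after checking the site's terms and conditions and its robots.txt.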
DATA WRANGLING AND EXPLORATORY DATA ANALYSIS
- Tidying scraped data into an R object
* Wrangling scraped data.
- Exporting scraped data into various file types
* Exporting an R object.
- Exploratory data analysis and further improvement
* Simple analysis using scraped data.
* Optional: Example of a project using web scraping capabilities to deliver insightful knowledge.
This testimonial video was taken after our previous Online Data Science Series: Time Series Analysis for Business Forecasting.
LEARN FROM ANYWHERE
Our learning format is online and interactive, so the experience feels like being present in a physical classroom. You can join the class with your Zoom account on the scheduled dates.
FOR ABSOLUTE BEGINNERS
Workshops in this series are tailored to casual programmers and non-programmers who are taking their first steps into data science. They assume no prior knowledge or academic background. Each workshop has a gentle learning curve designed with non-technical professionals and academics in mind.
If I don’t have any IT or programming skills, can I still attend this workshop?
Yes, you can still attend the workshop as it is a beginner-friendly workshop.
How do I join the interactive online class after completing payment and registration?
Our system will send you an email containing a link and details to join a Google Classroom.
What platform will be utilized for this online-interactive learning workshop?
Online learning will be conducted via Zoom. The link to join the Zoom class will be announced via Google Classroom.
How will the participants receive the learning materials?
Learning materials can be obtained via Google Classroom.
Will I receive a certificate after participating in the workshop?
Yes, you will receive a certificate of completion.
JOE NATHAN CRISTIAN
Data Science Instructor at Algoritma Data Science School, Joe dedicates himself to using data science knowledge in social-computing areas like social network analysis, online consumer behavior, human personality, NLP, and tourism movement. Some of his publications are available online:
- Analytics Vidhya Article: Social Network Analysis in R part 1: Ego Network.
- Analytics Vidhya Article: Lyric Mood Identifier.
- Analytics Vidhya Hackathon: Time series forecasting.
Joe is a passionate Instructor with expertise in the R programming language. He has been involved in numerous mentoring sessions, projects, and consultative data science training programs for our clients, including:
- Badan Pemeriksa Keuangan Republik Indonesia.
- Bank Permata.
- Bank Rakyat Indonesia, BRI Data Hackathon 2021.