Data Science Series: Text Mining for Beginners

Building an NLP-Powered Classification Algorithm

2 Days

Course details:

This 2-day workshop is a beginner-friendly introduction to data science and AI.

If you have ever wondered how spam classification algorithms work and how Gmail automatically labels incoming emails, dive in with us in this hands-on machine learning workshop led by Algoritma’s acclaimed team of instructors. The workshop combines natural language processing (NLP) theory with hands-on coding sessions to help students understand and implement some of the most widely used text mining techniques.

By building a text classifier, students will learn the fundamentals of Indonesian natural language processing. They will also be guided through the right approach to fine-tuning algorithms so that they can create their own text classifier modeled after real-life business cases.

* The 2-day workshop is taught in Bahasa Indonesia

“Mathematics possesses not only truth, but supreme beauty.”
~ Bertrand Russell

Please bring along:

  • 1x Laptop
  • Purchased ticket (from organizer’s website)

Schedule

  • Know-your-neighbors

    Registration and Networking

    Day 1 — 17:30 – 18:00

  • R Programming Language

    Using RStudio for day-to-day data analytics

    Day 1 — 18:00 – 19:00

  • NLP for Business

    Creating added value using NLP technologies in different industries

    Day 1 — 19:00 – 19:30

  • Preprocessing Text Data

    Work with corpus, tokenization, and stemming

    Day 1 — 19:30 – 21:00

  • Exploratory Text Mining

    Understanding the document term matrix and gaining insights from text data

    Day 2 — 18:30 – 19:00

  • Naïve Bayes Classifier

    Intuitive explanation of Naïve Bayes machine learning techniques

    Day 2 — 19:30 – 20:00

  • Building Text Classification

    Step-by-step, hands-on coding session creating a news article classifier

    Day 2 — 20:00 – 21:00

Event Ended

Trainer

Tiara Dwiputri

tiara@algorit.ma

Detailed Syllabus

Syllabus: Data Science Series Text Mining for Beginners

Preface

  • Description of course materials, timeline, and objectives of the workshop
  • A comprehensive point of view of the role of data science
  • Brief explanation of NLP, text mining, and machine learning
  • Description of the workflow, tools, and setup for the course

R Programming Basics

  • Introduction to the R programming language
  • Working with the RStudio environment
  • Using R Markdown for reproducible research
  • Inspecting data structures using built-in functions
  • Tips on using RStudio for daily data analytics

NLP for Business

  • Introduction to text mining and machine learning
  • Natural language processing in Bahasa Indonesia
  • Examples of utilizing NLP in different industries
  • Direction of development and constraints of country-specific NLP

Preprocessing Text Data

  • A look inside a text corpus: a large, structured set of texts
  • Creating your own text corpus from text data
  • Preparing your text data: data cleansing and manipulation
  • Understanding stemming and lemmatization
  • Word tokenization to identify the meaning of words
  • Bahasa Indonesia stemming using Nazief and Adriani’s algorithm
  • Code example of tokenizing words from a text corpus
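
The cleansing and tokenization steps above can be sketched roughly as follows. The workshop itself is taught in R, so the use of Python here is an assumption made for illustration, as are the sample sentences, the tiny stopword list, and the helper name `tokenize`. Stemming with Nazief and Adriani’s algorithm is not shown because it depends on a root-word dictionary.

```python
import re

# Hypothetical mini corpus: a list of raw documents (sample Indonesian text).
corpus = [
    "Harga saham naik tajam hari ini!",
    "Tim nasional menang 2-0 di pertandingan final.",
]

# Tiny illustrative stopword list; a real list would be much longer.
STOPWORDS = {"di", "ini", "dan", "yang"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs only, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOPWORDS]


# One list of tokens per document in the corpus.
tokens = [tokenize(doc) for doc in corpus]
print(tokens)
```

Punctuation and digits are discarded by the regular expression, so the score "2-0" in the second sentence does not produce any tokens.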

Exploratory Text Mining

  • Understanding the document term matrix and sparse matrices
  • Quantifying our text data using a document term matrix
  • Finding the most used terms in text data
  • Plotting term frequencies using R’s ggplot2 plotting library
  • Generating a word cloud to visualize text data
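
As a rough illustration of the document term matrix idea above, the sketch below counts how often each term appears in each document, then derives overall term frequencies and the share of zero cells (sparsity). It is written in Python as an assumption, while the workshop uses R; the sample documents and the helper name `build_dtm` are hypothetical.

```python
from collections import Counter

# Hypothetical tokenized documents (already cleaned).
docs = [
    ["saham", "naik", "saham"],
    ["tim", "menang", "final"],
    ["saham", "turun"],
]

# Vocabulary: every distinct term, sorted so the column order is stable.
vocab = sorted({term for doc in docs for term in doc})


def build_dtm(documents, vocabulary):
    """Return one row per document and one column per term (raw counts)."""
    rows = []
    for doc in documents:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocabulary])
    return rows


dtm = build_dtm(docs, vocab)

# Column sums give the overall frequency of each term.
term_totals = {term: sum(row[j] for row in dtm) for j, term in enumerate(vocab)}
most_used = max(term_totals, key=term_totals.get)

# Sparsity: the share of cells that are zero.
zero_cells = sum(cell == 0 for row in dtm for cell in row)
sparsity = zero_cells / (len(dtm) * len(vocab))

print(vocab)
print(dtm)
print(most_used, round(sparsity, 2))
```

Most cells are zero even in this tiny example, which is why document term matrices are usually stored as sparse matrices.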

Building Text Classifier

  • Concepts of supervised machine learning
  • Bayes’ theorem on the probability of an event
  • Naïve Bayes algorithm used for classification
  • Code example of web scraping online news articles
  • Step-by-step, hands-on processing of raw data into a document term matrix
  • Building a news article classifier using Naïve Bayes algorithm
  • Tuning precision, recall, and accuracy for model optimization
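
To make the classifier topics above more concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, followed by a precision and recall calculation. It is an illustration under stated assumptions, not the workshop’s own material: the workshop uses R, while this sketch uses plain Python, and the training data, category names, and function names (`fit`, `predict`, `precision_recall`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training data: (tokens, category).
train = [
    (["saham", "naik", "pasar"], "ekonomi"),
    (["pasar", "saham", "turun"], "ekonomi"),
    (["tim", "menang", "final"], "olahraga"),
    (["tim", "kalah", "final"], "olahraga"),
]


def fit(examples):
    """Count documents per class and term occurrences per class."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict(tokens, model):
    """Pick the class with the highest log posterior."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for term in tokens:
            # Add-one smoothing keeps unseen terms from zeroing the product.
            p = (term_counts[label][term] + 1) / (total_terms + len(vocab))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


def precision_recall(actual, predicted, positive):
    """Precision and recall for one class treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


model = fit(train)

# Hypothetical held-out examples used only for evaluation.
held_out = [(["saham", "pasar"], "ekonomi"), (["tim", "final"], "olahraga")]
actual = [label for _, label in held_out]
predicted = [predict(tokens, model) for tokens, _ in held_out]

print(predicted)
print(precision_recall(actual, predicted, "ekonomi"))
```

Working with log probabilities avoids numerical underflow when many small probabilities are multiplied. The "naïve" part is the assumption that terms are independent of each other given the class.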

This workshop costs 2 workshop credits for subscribers. Non-subscribers are welcome to participate at a cost of IDR 2,000,000.

What You Will Receive:

  • Workshop Lecturer’s Notes

    Includes 2 course books (PDF), HTML files, and course transcripts (if any).

  • Highly-accelerated Learning

    Learn under the mentorship of our lead instructor and a team of qualified teaching assistants throughout the 2-day workshop.

  • Certification of Completion

    Show current and prospective employers that you’ve completed the course with a signed certificate of completion.

  • Quality Learning Environment

    We pay meticulous attention to the logistical details of our workshops: quality audio and visual setups, comfortable seating arrangements, and small group sizes. Dinner is included for evening workshops.

  • Supplementary Materials

    Receive supplementary datasets to practice on, reference notes, working files (R Notebook or Jupyter Notebook), and other materials that will help you master the topics.

Data Science Series

Workshops in our Data Science Series are tailored to casual learners, working professionals, and non-programmers who are taking their first steps into data science and machine learning.

Students are not assumed to have a working knowledge of R or prior proficiency in statistics, mathematics, or algebra. As such, the workshop follows a gentle learning curve and emphasizes hands-on, one-to-one tutoring from our team of instructors and teaching assistants.

Consider taking our Data Science Intermediate workshops instead for more advanced-level materials in statistical programming and machine learning.

Students work through tons of real-life examples using sample datasets donated by our team of mentors and corporate partners. We believe in a learn-by-building approach, and we employ instructors who are uncompromisingly passionate about your growth and education.