Text Analysis: Tokenization, TF-IDF, Topic Modeling 
by HKU / Jonathan Stray
Video Lecture 2 of 8
Date Added: March 14, 2015

Lecture Description

Can we use machines to help us understand text? In this class we will cover basic text analysis techniques, from word counting to topic modeling. The algorithms we discuss in this class are used almost everywhere: search engines, document set visualization, detecting when two different articles cover the same story, and finding trending topics. The vector space document model is fundamental to the algorithmic handling of news content, and we will need it to understand how nearly every filtering and personalization system works.
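
Word counting and the bag-of-words vector model can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple lowercase regex tokenizer; the function names `tokenize`, `bag_of_words`, and `to_vectors` are hypothetical and do not come from the course materials. Each document becomes a vector of term counts over a shared vocabulary, and word order is discarded.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical simple tokenizer: lowercase, then keep runs of letters/digits
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(text):
    # Map each term to the number of times it occurs; word order is lost
    return Counter(tokenize(text))

def to_vectors(docs):
    # Build one count vector per document over a shared, sorted vocabulary
    bags = [bag_of_words(d) for d in docs]
    vocab = sorted(set().union(*bags))
    # A Counter returns 0 for terms a document does not contain
    return vocab, [[bag[t] for t in vocab] for bag in bags]

vocab, vectors = to_vectors(["The council approved the budget.",
                             "The budget passed."])
print(vocab)
print(vectors)
```

Because every document is mapped onto the same vocabulary, documents can then be compared as points in one vector space.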

Topics: Telling stories from quantitative analysis of language, word frequencies, the bag-of-words document vector model, cosine distance, TF-IDF, and a demonstration of the Overview document set mining tool.
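
TF-IDF weighting and cosine similarity can be sketched as follows. This assumes one common IDF variant, log(N / df), where N is the number of documents and df is the number of documents containing the term; the lecture, or a tool such as Overview, may use a different weighting. The helper names are hypothetical. Cosine distance is 1 minus the cosine similarity computed here.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical simple tokenizer: lowercase, then keep runs of letters/digits
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    # Term frequency: raw count of each term in each document
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents in which each term appears
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    # Assumed IDF variant log(N / df); a term found in every document gets 0
    idf = {t: math.log(n / df[t]) for t in df}
    # Sparse vectors stored as {term: weight} dictionaries
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs]

def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of vector lengths
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = ["City council approves new budget",
        "Council budget vote delayed",
        "Local team wins championship"]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "council" and "budget"
print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms
```

Under this variant, terms that appear in every document get an IDF of zero and contribute nothing, so very common words are down-weighted without a separate stopword list.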

Instructor: Jonathan Stray
Course blog: jmsc.hku.hk/courses/jmsc6041spring2013/


Course Description

Computational Journalism is a course given at JMSC during the Spring 2013 semester. It covers in depth some of the most advanced techniques journalists use to understand digital information and communicate it to users. We will focus on large quantities of unstructured text, and will also cover related topics such as how to draw conclusions from data without fooling yourself, social network analysis, and online security for journalists. These are the algorithms used by search engines, intelligence agencies, and everyone in between.
