Lecture Description
Can we use machines to help us understand text? In this class we will cover basic text analysis techniques, from word counting to topic modeling. The algorithms we will discuss in this class are used just about everywhere: search engines, document set visualization, figuring out when two different articles are about the same story, and finding trending topics. The vector space document model is fundamental to algorithmic handling of news content, and we will need it to understand how just about every filtering and personalization system works.
Topics: Telling stories from quantitative analysis of language, word frequencies, the bag-of-words document vector model, cosine distance, TF-IDF, and a demonstration of the Overview document set mining tool.
Instructor: Jonathan Stray
Course blog: jmsc.hku.hk/courses/jmsc6041spring2013/
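To give a concrete sense of the lecture's core ideas, here is a minimal sketch of the bag-of-words / TF-IDF / cosine-similarity pipeline in Python. The example documents, the naive whitespace tokenizer, and all names are invented for illustration and are not part of the course materials or the Overview tool; a real pipeline would also handle punctuation, stopwords, and stemming.

```python
# Minimal sketch: bag-of-words vectors, TF-IDF weighting, cosine similarity.
# Documents and tokenizer are toy examples chosen for illustration only.
import math
from collections import Counter

documents = [
    "the mayor announced a new budget for the city",
    "city council debates the new budget proposal",
    "local team wins the championship game",
]

def tokenize(text):
    # Naive whitespace tokenization; real systems use smarter tokenizers.
    return text.lower().split()

# Bag-of-words: a word-count vector per document.
bags = [Counter(tokenize(doc)) for doc in documents]
vocabulary = sorted(set(word for bag in bags for word in bag))

# Inverse document frequency: words appearing in fewer documents get more weight.
n_docs = len(documents)
idf = {
    word: math.log(n_docs / sum(1 for bag in bags if word in bag))
    for word in vocabulary
}

def tfidf_vector(bag):
    # Term frequency times inverse document frequency, as a dense vector
    # over the shared vocabulary.
    return [bag[word] * idf[word] for word in vocabulary]

vectors = [tfidf_vector(bag) for bag in bags]

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; cosine distance is 1 minus this.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# The two budget stories (documents 0 and 1) should score higher than
# the budget story compared to the sports story (documents 0 and 2).
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

This is the same vector space model that search engines and filtering systems build on: once documents are points in a common space, "about the same story" becomes a small cosine distance between their TF-IDF vectors.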
Course Index
- Basics of Computational Journalism: Feature Vectors, Clustering, Projections
- Text Analysis: Tokenization, TF-IDF, Topic Modeling
- Algorithmic Filters: Information Overload
- Social and Hybrid Filters: Collaborative Filtering
- Social Network Analysis: Centrality Algorithms
- Knowledge Representation: Structured data & Linked open data
- Drawing Conclusions from Data
- Security, Surveillance, and Privacy
Course Description
Computational Journalism is a course given at JMSC during the Spring 2013 semester. It covers, in detail, some of the most advanced techniques used by journalists to understand digital information and communicate it to users. We will focus on unstructured text in large quantities, and also cover related topics such as how to draw conclusions from data without fooling yourself, social network analysis, and online security for journalists. These are the algorithms used by search engines, intelligence agencies, and everyone in between.