Lecture Description
In this class we begin our study of filtering with some basic ideas about its role in journalism. Far more information is produced every day than any one person can read, by a factor of millions. We need software to help us deal with this flood. In this lecture, we discuss purely algorithmic approaches to filtering, with a look at how the Newsblaster system (similar to Google News) works.
Topics: How bad information overload actually is. The Newsblaster system, a precursor to Google News. Clustering together stories on the same event. Sorting stories into topics. Personalization. The filter bubble, and the filter design problem.
Instructor: Jonathan Stray
Course blog: jmsc.hku.hk/courses/jmsc6041spring2013/
Course Index
- Basics of Computational Journalism: Feature Vectors, Clustering, Projections
- Text Analysis: Tokenization, TF-IDF, Topic Modeling
- Algorithmic Filters: Information Overload
- Social and Hybrid Filters: Collaborative Filtering
- Social Network Analysis: Centrality Algorithms
- Knowledge Representation: Structured Data & Linked Open Data
- Drawing Conclusions from Data
- Security, Surveillance, and Privacy
Course Description
Computational Journalism is a course given at JMSC during the Spring 2013 semester. It covers, in great detail, some of the most advanced techniques journalists use to understand digital information and communicate it to users. We will focus on large quantities of unstructured text, and also cover related topics such as how to draw conclusions from data without fooling yourself, social network analysis, and online security for journalists. These are the algorithms used by search engines, intelligence agencies, and everyone in between.