Lecture Description
Is journalism in the text/video/audio business, or is it in the knowledge business? In this class we’ll look at this question in detail, which takes us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often inappropriate for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular; for example, Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
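To give a flavour of what a “linked data” representation looks like, here is a minimal illustrative sketch (not material from the course itself) that stores a fact like the one above, Google releasing the Knowledge Graph in 2012, as subject–predicate–object triples with URIs as identifiers, which is the core idea behind RDF. The URIs and property names are invented for illustration.

```python
# Minimal sketch of linked-data style knowledge representation:
# facts as (subject, predicate, object) triples, with URIs as global identifiers.
# The URIs and property names below are invented for illustration only.

triples = [
    # "Google released the Knowledge Graph in 2012"
    ("http://example.org/Google",
     "http://example.org/released",
     "http://example.org/Knowledge_Graph"),
    ("http://example.org/Knowledge_Graph",
     "http://example.org/releaseYear",
     "2012"),  # a literal value rather than a URI
]

def facts_about(subject, triples):
    """Return every (predicate, object) pair asserted about a subject."""
    return [(p, o) for s, p, o in triples if s == subject]

if __name__ == "__main__":
    for predicate, obj in facts_about("http://example.org/Knowledge_Graph", triples):
        print(predicate, "->", obj)
```

Because every entity and relation has a global identifier, triples produced by different people or programs can be merged into one graph, which is what makes this representation attractive for journalistic knowledge bases.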
Topics: Structured and unstructured data. Article metadata and schema.org. Linked open data and RDF. Entity extraction. Propositional representation of knowledge. The Reverb algorithm. DeepQA. Automatic story writing from data.
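As a concrete illustration of the “article metadata and schema.org” topic, the sketch below shows how a news article might describe itself with schema.org metadata in JSON-LD. The NewsArticle type and its properties (headline, author, datePublished, publisher) are real schema.org terms; the specific values are hypothetical.

```python
# Hypothetical example of schema.org article metadata expressed as JSON-LD.
# The NewsArticle type and its properties are real schema.org vocabulary;
# the values below are invented for illustration.
import json

article_metadata = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "Example headline for a hypothetical story",
    "author": {"@type": "Person", "name": "A. Reporter"},
    "datePublished": "2013-03-01",
    "publisher": {"@type": "Organization", "name": "Example News"},
}

# Serialized, this block would typically be embedded in the article's HTML
# inside a <script type="application/ld+json"> tag so that search engines
# and other programs can read the article's structured metadata.
print(json.dumps(article_metadata, indent=2))
```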
Instructor: Jonathan Stray
Course blog: jmsc.hku.hk/courses/jmsc6041spring2013/
Course Index
- Basics of Computational Journalism: Feature Vectors, Clustering, Projections
- Text Analysis: Tokenization, TF-IDF, Topic Modeling
- Algorithmic Filters: Information Overload
- Social and Hybrid Filters: Collaborative Filtering
- Social Network Analysis: Centrality Algorithms
- Knowledge Representation: Structured data & Linked open data
- Drawing Conclusions from Data
- Security, Surveillance, and Privacy
Course Description
Computational Journalism is a course given at JMSC during the Spring 2013 semester. It covers, in detail, some of the most advanced techniques used by journalists to understand digital information and communicate it to users. We will focus on unstructured text information in large quantities, and also cover related topics such as how to draw conclusions from data without fooling yourself, social network analysis, and online security for journalists. These are the algorithms used by search engines, intelligence agencies, and everyone in between.