Computational Journalism
Video Lectures
Lecture 1
Basics of Computational Journalism: Feature Vectors, Clustering, Projections
We’ll try to define computational journalism as the application of computer science to four different areas: data-driven reporting, story presentation, information filtering, and effect tracking. But first we have to figure out how to represent the outside world as data. We do this using the feature vector representation. One of the most useful things we can do with such vectors is compute the distance between two of them. We can also visualize the entire vector space, but to do this we have to project the high-dimensional space down to the two dimensions of the screen.
Topics: The definition of computational journalism, encoding the world as feature vectors, distance metrics, clustering algorithms, and visualization using multi-dimensional scaling.
Course blog at http://jmsc.hku.hk/courses/jmsc6041spring2013/
Instructor: Jonathan Stray
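As a rough illustration of the feature vector idea, the sketch below encodes three hypothetical items (for instance voting records, where 1 = yes and 0 = no) as vectors and compares two common distance metrics. The items, values, and function names are illustrative assumptions, not material from the lecture.

```python
import math

# Hypothetical feature vectors: each item (e.g. a legislator) is encoded by
# five features (e.g. votes on five bills: 1 = yes, 0 = no).
vectors = {
    "A": [1, 0, 1, 1, 0],
    "B": [1, 0, 1, 0, 0],
    "C": [0, 1, 0, 0, 1],
}

def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v; assumes neither is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

for x, y in [("A", "B"), ("A", "C")]:
    print(x, y, euclidean(vectors[x], vectors[y]), cosine_distance(vectors[x], vectors[y]))
```

Euclidean distance measures how far apart two points are, while cosine distance only looks at the angle between them and so ignores overall magnitude; which one is appropriate depends on what the features encode.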
Lecture 2
Text Analysis: Tokenization, TF-IDF, Topic Modeling
Can we use machines to help us understand text? In this class we will cover basic text analysis techniques, from word counting to topic modeling. The algorithms we will discuss in this class are used in just about everything: search engines, document set visualization, figuring out when two different articles are about the same story, and finding trending topics. The vector space document model is fundamental to algorithmic handling of news content, and we will need it to understand how just about every filtering and personalization system works.
Topics: Telling stories from quantitative analysis of language, word frequencies, the bag-of-words document vector model, cosine distance, TF-IDF, and a demonstration of the Overview document set mining tool.
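A minimal sketch of the bag-of-words model with TF-IDF weighting and cosine similarity (cosine distance is 1 minus this value). It assumes a toy three-document corpus, whitespace tokenization, and the plain log(N / df) IDF variant; real systems differ in tokenization, stemming, and IDF smoothing.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real collections are far larger.
docs = [
    "the mayor announced a new budget",
    "the city budget was cut by the mayor",
    "storm damage hit the coast overnight",
]

def tokenize(text):
    # Simplest possible tokenizer: lowercase and split on whitespace.
    return text.lower().split()

tokenized = [tokenize(d) for d in docs]
N = len(tokenized)
# Document frequency: in how many documents does each term appear?
df = Counter(term for toks in tokenized for term in set(toks))

def tfidf(tokens):
    tf = Counter(tokens)  # raw term counts (the bag of words)
    # One common IDF variant: log(N / df). A term in every document gets weight 0.
    return {t: c * math.log(N / df[t]) for t, c in tf.items()}

def cosine_similarity(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = [tfidf(toks) for toks in tokenized]
print(cosine_similarity(vecs[0], vecs[1]))  # two budget stories: greater than 0
print(cosine_similarity(vecs[0], vecs[2]))  # budget vs. storm: 0.0 here
```

Because "the" appears in every document its IDF is zero, so it contributes nothing to similarity; the shared rarer words "mayor" and "budget" are what link the first two stories.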
Lecture 3
Algorithmic Filters: Information Overload
In this class we begin our study of filtering with some basic ideas about its role in journalism. There’s just way too much information produced every day, more than any one person can read by a factor of millions. We need software to help us deal with this flood. In this lecture, we discuss purely algorithmic approaches to filtering, with a look at how the Newsblaster system works (it is similar to Google News).
Topics: How bad information overload actually is. The Newsblaster system, a precursor to Google News. Clustering together stories on the same event. Sorting stories into topics. Personalization. The filter bubble, and the filter design problem.
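Clustering together stories on the same event can be approximated with a simple greedy pass over bag-of-words vectors. This is a generic sketch with made-up headlines and an arbitrary similarity threshold of 0.3; it is not how Newsblaster or Google News actually cluster stories.

```python
import math
from collections import Counter

def bow(text):
    # Bag-of-words vector: term -> count.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(c * v[t] for t, c in u.items())  # Counter returns 0 for missing terms
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_stories(headlines, threshold=0.3):
    """Greedy single-pass clustering: each story joins the first existing
    cluster whose centroid is similar enough, otherwise it starts a new one.
    The threshold is an arbitrary illustrative value."""
    clusters = []  # list of (centroid Counter, member indices)
    for i, text in enumerate(headlines):
        vec = bow(text)
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # summed counts act as an unnormalized centroid
                members.append(i)
                break
        else:
            clusters.append((Counter(vec), [i]))
    return [members for _, members in clusters]

# Hypothetical headlines.
headlines = [
    "earthquake strikes central chile",
    "chile earthquake death toll rises",
    "central bank raises interest rates",
    "interest rates raised by central bank",
]
print(cluster_stories(headlines))  # prints [[0, 1], [2, 3]]
```

The earthquake stories and the interest-rate stories end up in separate clusters even though both groups contain the word "central". Single-pass clustering is sensitive to input order and to the threshold, so results can change if either is altered.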
Lecture 4
Social and Hybrid Filters: Collaborative Filtering
It’s possible to build powerful filtering systems by combining software and people, incorporating both algorithmic content analysis and human actions such as follow, share, and like. We’ll look at recommendation systems, the Facebook news feed, and the socially driven algorithms behind them. We’ll finish by looking at an example of using human preferences to drive machine learning algorithms: Google Web search.
Topics: Social filtering. The network structure of Twitter. Social software. Comment ranking on Reddit. Confidence sorting. User-item recommendation and collaborative filtering. Hybrid filters. What makes a good filter?
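Confidence sorting addresses the problem that a comment with one upvote and no downvotes has a 100% positive ratio but very little evidence behind it. One widely cited approach, associated with Reddit's "best" comment sort, ranks items by the lower bound of the Wilson score confidence interval. The sketch below assumes z = 1.96 (about 95% confidence) and made-up vote counts; it illustrates the technique, not necessarily what the lecture presents.

```python
import math

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval for the true fraction of
    positive votes. z = 1.96 corresponds to roughly 95% confidence."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p = upvotes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

# Hypothetical comments as (upvotes, downvotes).
comments = {"a": (1, 0), "b": (80, 20), "c": (8, 2)}
ranked = sorted(comments, key=lambda k: wilson_lower_bound(*comments[k]), reverse=True)
print(ranked)  # prints ['b', 'c', 'a']
```

The comment with 80 of 100 positive votes outranks both the 8-of-10 comment and the single-upvote comment, because more votes narrow the interval.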
Lecture 5
Social Network Analysis: Centrality Algorithms
Network analysis (also called social network analysis, SNA, or link analysis) is a promising and popular technique for uncovering relationships between diverse individuals and organizations. It is widely used in intelligence and law enforcement, but not so much in journalism. We’ll look at basic techniques and algorithms and try to understand both the promise and the many practical problems.
Topics: What's a social network? Link analysis. Homophily and structural determinants of behavior. Centrality measurements. Community detection and the modularity algorithm. K-core decomposition. SNA in journalism. SNA that could be in journalism.
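Two of the listed techniques can be sketched in plain Python: degree centrality (the fraction of other nodes each node is directly linked to) and k-core decomposition by repeatedly peeling off low-degree nodes. The graph and function names are hypothetical; libraries such as NetworkX provide tested implementations of these and many other centrality measures.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets; self-loops are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def core_numbers(adj):
    """K-core decomposition by peeling: a node's core number is the largest k
    such that it belongs to a subgraph in which every node has degree >= k."""
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Raise k to the smallest degree among the nodes still in the graph.
        k = max(k, min(degree[n] for n in remaining))
        # Peel every node with degree <= k; removals can cascade to neighbours.
        stack = [n for n in remaining if degree[n] <= k]
        while stack:
            node = stack.pop()
            if node not in remaining:
                continue
            remaining.remove(node)
            core[node] = k
            for nbr in adj[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        stack.append(nbr)
    return core

# Hypothetical graph: a triangle A-B-C with D hanging off A.
adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")])
print(degree_centrality(adj))
print(core_numbers(adj))
```

Node A has the highest degree centrality, but A, B, and C share the same core number, because k-cores measure membership in a densely connected group rather than individual connectedness.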
Lecture 6
Knowledge Representation: Structured Data & Linked Open Data
Is journalism in the text/video/audio business, or is it in the knowledge business? In this class we’ll look at this question in detail, which gets us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often inappropriate for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular; for example, Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
Topics: Structured and unstructured data. Article metadata and schema.org. Linked open data and RDF. Entity extraction. Propositional representation of knowledge. The Reverb algorithm. DeepQA. Automatic story writing from data.
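Linked data represents knowledge as (subject, predicate, object) triples, as in RDF. The sketch below uses a hypothetical in-memory set of triples with plain-string names instead of URIs, and answers a two-hop question by chaining pattern matches; a real system would use an RDF store and a query language such as SPARQL.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# Hypothetical facts; real linked data would use URIs (e.g. schema.org terms).
triples = {
    ("Jane Doe", "ceoOf", "Acme Corp"),
    ("Acme Corp", "headquarteredIn", "Springfield"),
    ("John Roe", "ceoOf", "Globex"),
    ("Globex", "headquarteredIn", "Shelbyville"),
}

# Two-hop query: who runs a company headquartered in Springfield?
companies = [s for s, _, _ in match(triples, p="headquarteredIn", o="Springfield")]
ceos = [s for c in companies for s, _, _ in match(triples, p="ceoOf", o=c)]
print(ceos)  # prints ['Jane Doe']
```

A join across two relationships like this is awkward to answer from unstructured article text but straightforward once facts are stored as triples.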
Lecture 7
Drawing Conclusions from Data
What does randomness look like? Variation from rolling dice. Base rate fallacy. Conditional probability. Bayes' theorem. Cognitive biases. Method of competing hypotheses. Probabilistic scoring of hypotheses. Correlation and causation. Finding alternate hypotheses for the NYPD stop and frisk data.
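The base rate fallacy is easiest to see with numbers. The sketch below applies Bayes' theorem across competing hypotheses; the figures are hypothetical: a condition with a 1% base rate and a screening test that flags 99% of true cases but also 5% of everyone else.

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over competing hypotheses:
    P(H | E) = P(E | H) * P(H) / sum over all hypotheses H' of P(E | H') * P(H')."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())  # P(E), the overall probability of the evidence
    return {h: j / total for h, j in joint.items()}

# Hypothetical numbers: 1% base rate; the test flags 99% of true cases
# and 5% of people without the condition.
priors = {"condition": 0.01, "no condition": 0.99}
p_positive = {"condition": 0.99, "no condition": 0.05}

result = posteriors(priors, p_positive)
print(round(result["condition"], 3))  # prints 0.167
```

Even with a seemingly accurate test, only about one positive result in six is a true case, because the much larger group without the condition generates most of the positives.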
Lecture 8
Security, Surveillance, and Privacy
Who is watching our online activities? How do you protect a source in the 21st century? Who gets access to all of this mass intelligence, and what does the ability to surveil everything all the time mean, both practically and ethically, for journalism? In this lecture we will talk about who is watching and how, and how to create a security plan using threat modeling.
Topics: How is email transmitted? Who has access to your email? Mass surveillance and its legal status. How cryptography works. Encryption versus authentication. Man-in-the-middle attacks. Secure communications using OTR (Off-the-Record Messaging). Case study: the leaked WikiLeaks cables. Threat modeling. Security planning.
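Encryption hides a message's contents; authentication proves who sent it and that it was not altered. Python's standard library has no general-purpose encryption, so this sketch shows only the authentication half, using an HMAC with a hypothetical shared key. The message itself travels in plaintext here.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret key; both parties would need it in advance.
key = secrets.token_bytes(32)

def sign(message: bytes) -> bytes:
    # HMAC-SHA256 tag: shows the message came from someone holding the key
    # and was not altered. It does NOT hide the message contents.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(message), tag)

msg = b"Meet at the usual place at 6pm"
tag = sign(msg)
print(verify(msg, tag))                                # True: authentic and unmodified
print(verify(b"Meet at the usual place at 9pm", tag))  # False: tampered
```

Both sides must already share the key, and getting it to them safely is the weak point that man-in-the-middle attacks target, which is why secure messaging protocols combine key exchange, encryption, and authentication.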