Class 1: Intro

August 27, 2019

Slides

Welcome to the exciting world of data journalism!

What is data journalism?

Journalism that involves data. Finding data, extracting stories from data, presenting visualizations backed by data.

Ok, so what exactly is data?

In its essence, data is information about the world. It can take the shape of the number in attendance at political inaugurations, the results from polls or surveys, networks of connections extracted from government documents, and just about anything else you can think of. Data becomes interesting at large scales, when you have a lot of data points and are able to find trends and outliers.
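To make "finding outliers" concrete, here is a minimal sketch in Python. The numbers are invented for illustration, the function name `find_outliers` is hypothetical, and the rule it uses (flag anything more than two standard deviations from the mean) is just one simple convention, not the only way to define an outlier.

```python
from statistics import mean, stdev

# Hypothetical monthly counts, invented purely for illustration
counts = [102, 98, 105, 110, 97, 101, 240, 99]

def find_outliers(values, threshold=2.0):
    """Return the values that sit more than `threshold` standard
    deviations away from the mean of the list."""
    avg = mean(values)
    spread = stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - avg) > threshold * spread]

print(find_outliers(counts))
```

With only a handful of points you could spot the odd value by eye; the same few lines work unchanged on thousands of rows, which is where the computer earns its keep.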

Why now?

Data journalism is a field that’s just starting to get big. We are producing more information than ever, and computers keep getting faster. It has never been easier to automate tasks, crunch numbers, and pull data from different sources at scale.

When you combine a journalistic sense of getting to the bottom of a story with the ability to comb through tons of data computationally, you can tell stories that were never possible before.

The naming of the field

Comic making fun of all the names for data journalism

Computer-assisted reporting

The field of data journalism goes by a number of names. The earliest name was probably “computer-assisted reporting,” referring to using computing to aid in the reporting process. The practice of combining journalists and computers to tell stories goes back to at least the 1960s, when data guru Philip Meyer of the Detroit Free Press used a computer to analyze a survey of residents after the 1967 Detroit riot and showed that rioters spanned the education spectrum: people who had attended college were as likely to have taken part as high school dropouts. Meyer insists in his book Precision Journalism (available freely online) that it’s no longer enough as a journalist to just be a truth-seeking writer:

There was a time when all it took was a dedication to truth, plenty of energy, and some talent for writing. You still need those things, but they are no longer sufficient. The world has become so complicated, the growth of available information so explosive, that the journalist needs to be a filter as well as a transmitter, an organizer and interpreter as well as one who gathers and delivers facts. In addition to knowing how to get information into print or on the air, he or she also must know how to get it into the receiver's head. In short, a journalist has to be a database manager, a data processor, and a data analyst.

Computational journalism

An academic paper by computational journalism pioneers Sarah Cohen, James T. Hamilton, and Fred Turner provides the following definition for computational journalism: “Broadly defined, [computational journalism] can involve changing how stories are discovered, presented, aggregated, monetized, and archived. Computation can advance journalism by drawing on innovations in topic detection, video analysis, personalization, aggregation, visualization, and sensemaking.”

When I was a journalism student at Stanford, Hamilton provided the following more concise definition:

Computational journalism is stories by, through, and about algorithms.

Let’s dissect that definition. Algorithms are computational methods or routines.

  • Stories by algorithms? Well, some stories are already being written by computers, mainly those that follow predictable recipes, e.g. sports games and stocks.
  • Stories through algorithms? That’s more in the spirit of computer-assisted reporting: using computational methods to find, filter, clean, and extract data in the service of telling stories.
  • Stories about algorithms? Well, opaque computer code is responsible for many consequential parts of modern society: Facebook’s algorithm curates content and purportedly holds back fake news. Or does it? And how do credit scores work? Do they disadvantage certain groups of people?
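As a rough illustration of a story written "by" an algorithm, here is a minimal sketch of a template-driven game recap in Python. The team names, scores, the function name `write_recap`, and the three-point cutoff for "edged" are all made up for this example; real automated-writing systems are far more elaborate, but the recipe-like idea is the same.

```python
def write_recap(home, away, home_score, away_score):
    """Fill a fixed sentence template from the final score of a game."""
    if home_score == away_score:
        return f"{home} and {away} tied {home_score}-{away_score}."

    # Work out who won so the sentence can lead with the winner
    if home_score > away_score:
        winner, loser = home, away
    else:
        winner, loser = away, home
    high = max(home_score, away_score)
    low = min(home_score, away_score)

    # Pick a verb based on how close the game was (arbitrary cutoff)
    verb = "edged" if high - low <= 3 else "beat"
    return f"{winner} {verb} {loser} {high}-{low}."

# Hypothetical teams and scores
print(write_recap("Eastside", "Westside", 24, 21))
```

Because the structure of a game recap or a stock update is so predictable, a handful of rules like these can turn a row of data into a readable sentence.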

As computers increasingly run our lives, computational journalism captures the essence of computer-assisted reporting but also looks at holding accountable the companies and institutions responsible for the code that runs our society.

Data journalism

Ok, you’re probably thoroughly confused by now. The point I’m trying to make is that these terms are mostly interchangeable. There have been many attempts by well-meaning people to capture this emerging field with a catchy name. There’s a rich history behind all the different names computational and data-driven journalism can take, but the important thing to know is that they mostly refer to the same thing.

In this course, we will focus on understanding data and using the computer to collect, filter, clean, and pull stories from it. We will look at using data and design principles to create compelling and beautiful visualizations. And we will learn the basics of coding for creating content on the web and scraping information. We’ll also discuss some of the nuances of the field and emerging trends, like artificial intelligence and the ethical problems in data and AI.

The key is not to learn everything, but to be exposed to a wide surface area and learn how to teach yourself towards mastery.