Data Formats

This page is a quick reference on common data formats. This week, we will focus only on the tabular data formats, which can be imported into Google Sheets.

Tabular data formats

  • csv: Comma-separated values. This file contains a list of tabular records (“tabular” is a fancy way to say spreadsheet, or matrix). Each line of the file corresponds to a row, and the values on each line are separated by commas (hence comma-separated), one value per column. This kind of file can be imported into Google Sheets.
  • xls and xlsx: Microsoft Excel spreadsheet. Like a CSV, this kind of file describes a list of tabular records. The format is more flexible though, and allows for multiple sheets within the same file (pages of data, if you will) as well as formatting. This kind of file can also be imported into Google Sheets.
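
To make the row-and-column idea concrete, here is a minimal sketch of reading CSV text with Python's built-in csv module. The sample data (county names and populations) and the helper name read_rows are made up for illustration; they are not part of the course materials.

```python
import csv
import io

# Hypothetical sample data: the first line is the header row (column names),
# and each following line is one record.
SAMPLE_CSV = """county,population
Adams,1200
Baker,3400
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    # Every value comes back as a string; convert numbers yourself if needed.
    print(row["county"], int(row["population"]))
```

Note that a CSV file carries no type information, so the populations are read as text until you convert them.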

Geographic data formats

  • shp: Shapefile. This format stores geographic features (points, lines, or areas), which are basically shapes that correspond to geography. It can also hold associated data for each feature, e.g. the name and population for each county in a state. In practice, a Shapefile is a small set of files that travel together: the .shp file holds the shapes, and companion files such as .dbf hold the associated data.
  • GeoJSON: Geographic JavaScript Object Notation. This file describes pretty much the same thing as a Shapefile, but it uses the JSON file format (JavaScript Object Notation), which lets JavaScript web applications read it more easily.
  • kml: Keyhole Markup Language. We won’t really look at this format in this class, but it’s a way to specify geographic annotations and 3D shapes. It could be used to specify the shapes of buildings on Google Earth.
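
As a rough illustration of how a GeoJSON file pairs a shape with its associated data, the sketch below parses a tiny GeoJSON document with Python's built-in json module. The feature itself (an "Example County" point with a population) and the helper name feature_summaries are invented for this example.

```python
import json

# Hypothetical GeoJSON text: one point feature with two data properties.
# Coordinates are listed as [longitude, latitude].
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-100.0, 40.0]},
      "properties": {"name": "Example County", "population": 1200}
    }
  ]
}
"""


def feature_summaries(text):
    """Return (name, geometry type, coordinates) for each feature in the text."""
    collection = json.loads(text)
    return [
        (
            feature["properties"]["name"],
            feature["geometry"]["type"],
            feature["geometry"]["coordinates"],
        )
        for feature in collection["features"]
    ]


summaries = feature_summaries(SAMPLE_GEOJSON)
print(summaries)
```

Each feature has a "geometry" (the shape) and "properties" (the associated data), which mirrors the shape-plus-data pairing described for Shapefiles.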

Programmatic

  • API: Application Programming Interface. This is not really a data format, in that it doesn’t describe a file type. Instead, it refers to an interface through which your code can request data and get results back quickly. In most cases, APIs are web applications that return data in one of the formats above after you specify your desired parameters.
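
The request-with-parameters idea can be sketched without touching the network. In the example below, the endpoint https://api.example.com/counties, the parameter names, and the canned response are all hypothetical; a real API documents its own URL and parameters.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show the shape of a request URL.
BASE_URL = "https://api.example.com/counties"


def build_request_url(base_url, params):
    """Attach the desired parameters to the base URL as a query string."""
    return base_url + "?" + urlencode(params)


url = build_request_url(BASE_URL, {"state": "CA", "format": "json"})
print(url)

# Stand-in for what such an API might send back: JSON text, which is
# parsed the same way as a JSON file on disk.
FAKE_RESPONSE = '[{"name": "Example County", "population": 1200}]'
records = json.loads(FAKE_RESPONSE)
print(records[0]["name"], records[0]["population"])
```

With a real API, you would send the URL over the web (for example with Python's urllib.request) and parse whatever comes back.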