Types of Datasets

This page describes the ways Insights-X shares datasets with customers.

As a customer, you can access Insights-X data either as a BigQuery Dataset or as table data exported to Cloud Storage.

BigQuery

BigQuery is a “big data” SQL data warehouse built by Google. Many massive datasets, such as the open source code on GitHub and the complete history of the Bitcoin blockchain, are made available to the public through Google’s BigQuery public datasets program.

BigQuery Datasets are hosted on Google’s servers and can run to multiple terabytes. You interact with a dataset by writing SQL queries in the web UI, the command-line tool, or any client library.

Some resources on BigQuery:

To get you started with a BigQuery Dataset, Insights-X grants your own project permission to query the shared datasets. If you intend to go beyond the included quotas, you must also enable billing on that project.
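As a rough illustration of querying through a client library, the sketch below uses the google-cloud-bigquery Python package. The table name `my-project.insights_x.bookings` and its `provider_id` column are placeholders assumed for this example, not names confirmed by Insights-X; substitute the dataset that was shared with you.

```python
def build_bookings_query(table: str) -> str:
    """Return a SQL query that counts bookings per provider in `table`.

    `table` is a hypothetical fully qualified name such as
    "my-project.insights_x.bookings".
    """
    return (
        "SELECT provider_id, COUNT(*) AS bookings "
        f"FROM `{table}` "
        "GROUP BY provider_id "
        "ORDER BY bookings DESC"
    )


def run_query(sql: str) -> list:
    """Run `sql` with the BigQuery client and return the result rows."""
    # Imported lazily so the query builder works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    return list(client.query(sql).result())


# Placeholder table name; replace it before calling run_query(sql).
sql = build_bookings_query("my-project.insights_x.bookings")
print(sql)
```

Calling `run_query(sql)` requires the package to be installed and credentials for a project that has been granted access.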

Exported Data

Up to 1 GB of table data can be exported into a single file.

CSV

The simplest file type available on Insights-X for tabular data is “Comma-Separated Values”, or CSV. A CSV representation of a booking list with a header row, for example, looks like this:

client_id,provider_id,check_in
clientA,providerX,2019-01-13 00:00:00 UTC
clientB,providerX,2019-01-15 00:00:00 UTC

The CSV format does not support nested or repeated data.
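A file like this can be read with Python's standard csv module. The sketch below assumes the three columns from the booking example and timestamps that end in a literal "UTC" suffix; real exports may contain other columns.

```python
import csv
import io
from datetime import datetime, timezone

# Sample export mirroring the booking example (columns are illustrative).
CSV_TEXT = """client_id,provider_id,check_in
clientA,providerX,2019-01-13 00:00:00 UTC
clientB,providerX,2019-01-15 00:00:00 UTC
"""


def parse_bookings(text: str) -> list:
    """Parse CSV text into dicts, converting check_in to an aware datetime."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Match the trailing "UTC" as literal text, then attach the
        # UTC timezone explicitly.
        check_in = datetime.strptime(row["check_in"], "%Y-%m-%d %H:%M:%S UTC")
        row["check_in"] = check_in.replace(tzinfo=timezone.utc)
        rows.append(row)
    return rows


bookings = parse_bookings(CSV_TEXT)
print(bookings[0]["client_id"], bookings[0]["check_in"].isoformat())
# prints: clientA 2019-01-13T00:00:00+00:00
```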

JSON

While CSV is the most common file format for “flat” data, JSON is the most common format for “tree-like” data that can nest several layers deep:

{
    "bookings": [
        {
            "client_id": "clientA",
            "provider_id": "providerX",
            "check_in": "2019-01-13 00:00:00 UTC"
        },
        {
            "client_id": "clientB",
            "provider_id": "providerX",
            "check_in": "2019-01-15 00:00:00 UTC"
        }
    ]
}

When exported data is in JSON format, INT64 (integer) data types are encoded as JSON strings to preserve 64-bit precision when the data is read by other systems.
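When reading such a file, string-encoded integers have to be converted back explicitly. The sketch below shows one way to do that; the `booking_id` field and the `INT64_FIELDS` set are hypothetical stand-ins for whichever INT64 columns your export contains.

```python
import json

# Hypothetical exported record: `booking_id` stands in for any INT64 column.
RECORD = '{"booking_id": "9007199254740993", "client_id": "clientA"}'

# Assumed set of INT64 columns; take the real names from your table schema.
INT64_FIELDS = {"booking_id"}


def decode_record(line: str) -> dict:
    """Parse one JSON record and turn string-encoded INT64 values into ints."""
    record = json.loads(line)
    for name in INT64_FIELDS:
        if name in record:
            record[name] = int(record[name])
    return record


booking = decode_record(RECORD)
print(booking["booking_id"] + 1)  # prints: 9007199254740994
```

The sample value is 2^53 + 1, which a double-precision float cannot represent exactly. That is the kind of value the string encoding protects.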

Avro

Avro™ is an open source project that provides data serialization and data exchange services for Apache™ Hadoop®. … Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient.
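Because the data definition is plain JSON, it can be inspected with any JSON parser. The schema below is a hypothetical one for the booking record used on this page: the record name and field types are assumptions for illustration, not the schema Insights-X actually exports. Reading the binary records themselves requires an Avro library and is not shown here.

```python
import json

# Hypothetical Avro schema for a booking record. Field names follow the
# examples on this page; the record name and types are assumptions.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Booking",
  "fields": [
    {"name": "client_id", "type": "string"},
    {"name": "provider_id", "type": "string"},
    {"name": "check_in",
     "type": {"type": "long", "logicalType": "timestamp-micros"}}
  ]
}
"""

schema = json.loads(SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]
print(field_names)  # prints: ['client_id', 'provider_id', 'check_in']
```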