Types of Datasets
This page describes how Insights-X shares different Datasets with customers.
As a customer, you can access Insights-X data either through a BigQuery Dataset or as table data exported to Cloud Storage.
BigQuery
BigQuery is a “big data” SQL data warehouse developed by Google. Many massive Datasets, such as open-source code from GitHub and the complete history of the Bitcoin blockchain, are made available to the public through Google’s BigQuery public datasets program.
BigQuery Datasets can hold many terabytes of data and are hosted on Google’s servers. You interact with a dataset by writing SQL queries in the web UI, the command-line tool, or a client library.
To get started with a BigQuery Dataset, Insights-X grants your own project permission to query the shared datasets. If you intend to go beyond the included quotas, you must also enable billing on that project.
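As a rough illustration of the client-library route, the sketch below uses the `google-cloud-bigquery` Python package. The table name `insights-x-project.bookings.bookings`, the project ID, and the column names are hypothetical placeholders rather than actual Insights-X identifiers; substitute the dataset that has been shared with your project.

```python
def build_bookings_query(table: str, limit: int = 10) -> str:
    """Build a standard SQL query against a (hypothetical) bookings table."""
    # Backticks quote the fully qualified `project.dataset.table` name.
    return (
        "SELECT client_id, provider_id, check_in "
        f"FROM `{table}` "
        "ORDER BY check_in "
        f"LIMIT {int(limit)}"
    )


def run_query(client, sql: str) -> list:
    """Run `sql` and return the result rows as plain dictionaries.

    `client` is expected to be a google.cloud.bigquery.Client.
    """
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [dict(row.items()) for row in rows]


def main() -> None:
    # Imported here so the helpers above work without the package installed.
    from google.cloud import bigquery

    # Uses Application Default Credentials; queries are billed to this project.
    client = bigquery.Client(project="your-project-id")  # placeholder ID
    # Hypothetical table name; replace with the dataset Insights-X shares.
    sql = build_bookings_query("insights-x-project.bookings.bookings")
    for row in run_query(client, sql):
        print(row)
```

Calling `main()` runs the query once credentials and the package are in place. The query itself is ordinary SQL, so the same statement can be pasted into the web UI or passed to the command-line tool.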
Exported Data
Up to 1 GB of table data can be exported into a single file.
CSV
The simplest file type available on Insights-X for tabular data is “Comma-Separated Values”, or CSV. A CSV representation of a booking list with a header row, for example, looks like this:
| client_id | provider_id | check_in |
|---|---|---|
| clientA | providerX | 2019-01-13 00:00:00 UTC |
| clientB | providerX | 2019-01-15 00:00:00 UTC |
The CSV format does not support nested or repeated data.
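To show how such an export might be consumed, here is a minimal sketch using Python's standard `csv` module. It assumes the file is comma-delimited with the header row shown above and that timestamps follow the `YYYY-MM-DD HH:MM:SS UTC` pattern from the sample; the inline `CSV_TEXT` stands in for a downloaded file.

```python
import csv
import io
from datetime import datetime, timezone

# Sample export matching the booking list above (assumed comma-delimited).
CSV_TEXT = """client_id,provider_id,check_in
clientA,providerX,2019-01-13 00:00:00 UTC
clientB,providerX,2019-01-15 00:00:00 UTC
"""


def parse_check_in(value: str) -> datetime:
    """Parse a timestamp such as '2019-01-13 00:00:00 UTC' as aware UTC."""
    naive = datetime.strptime(value, "%Y-%m-%d %H:%M:%S UTC")
    return naive.replace(tzinfo=timezone.utc)


def read_bookings(stream) -> list:
    """Read bookings from a CSV stream whose first row is the header."""
    bookings = []
    for row in csv.DictReader(stream):
        # Every CSV value arrives as a string; convert the timestamp column.
        row["check_in"] = parse_check_in(row["check_in"])
        bookings.append(row)
    return bookings


bookings = read_bookings(io.StringIO(CSV_TEXT))
for booking in bookings:
    print(booking["client_id"], booking["check_in"].isoformat())
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` wrapper so that the `csv` module handles line endings itself.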
JSON
While CSV is the most common file format for “flat” data, JSON is the most common file format for “tree-like” data that potentially has multiple layers, like the branches on a tree:
{
  "bookings": [
    {
      "client_id": "clientA",
      "provider_id": "providerX",
      "check_in": "2019-01-13 00:00:00 UTC"
    },
    {
      "client_id": "clientB",
      "provider_id": "providerX",
      "check_in": "2019-01-15 00:00:00 UTC"
    }
  ]
}
When exported data is in JSON format, INT64 (integer) data types are encoded as JSON strings to preserve 64-bit precision when the data is read by other systems.
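The following sketch illustrates why the string encoding matters and how a reader might convert such values back to integers. The `booking_id` field is a hypothetical INT64 column invented for this example, and the set of INT64 field names is assumed to be known in advance.

```python
import json

# Hypothetical exported record: `booking_id` stands in for any INT64 column,
# which arrives as a JSON string rather than a JSON number.
RECORD_TEXT = '{"booking_id": "9007199254740993", "client_id": "clientA"}'

# Assumption: the caller knows which columns are INT64 in the source table.
INT64_FIELDS = {"booking_id"}


def decode_record(text: str, int_fields=INT64_FIELDS) -> dict:
    """Parse one JSON record and convert string-encoded INT64 fields to int."""
    record = json.loads(text)
    for name in int_fields:
        if record.get(name) is not None:
            record[name] = int(record[name])  # Python ints are arbitrary precision
    return record


record = decode_record(RECORD_TEXT)
print(record["booking_id"])

# A double-precision float cannot represent 2**53 + 1 exactly, which is the
# loss that string encoding avoids in systems that read numbers as doubles.
print(int(float("9007199254740993")) == record["booking_id"])  # False
```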
AVRO
Avro™ is an open source project that provides data serialization and data exchange services for Apache™ Hadoop®. … Avro stores the data definition in JSON format, making it easy to read and interpret; the data itself is stored in binary format, making it compact and efficient.
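Because the data definition is plain JSON, it can be inspected with standard tools. The sketch below parses a hypothetical Avro schema for the booking records used on this page; the record name `Booking` and the field types are assumptions, not the schema Insights-X actually emits. Reading the binary data itself requires an Avro library, which is not shown here.

```python
import json

# Hypothetical Avro schema for the booking records used on this page.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "Booking",
  "fields": [
    {"name": "client_id", "type": "string"},
    {"name": "provider_id", "type": "string"},
    {"name": "check_in",
     "type": {"type": "long", "logicalType": "timestamp-micros"}}
  ]
}
"""


def describe_fields(schema: dict) -> dict:
    """Map each field name to its Avro type (logical type when present)."""
    described = {}
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict):
            # Annotated types carry the logical type alongside the base type.
            ftype = ftype.get("logicalType", ftype["type"])
        described[field["name"]] = ftype
    return described


schema = json.loads(SCHEMA_TEXT)
print(describe_fields(schema))
```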