Data Management

Organisation: Copyright (C) 2022-2022 Olivier Boudeville
Contact: about (dash) howtos (at) esperide (dot) com
Creation date: Saturday, November 20, 2021
Lastly updated: Tuesday, August 30, 2022

Overview

This section concentrates information about data management, including data formats and data processing tools.

General-Purpose Data Format

Such a format is typically useful to hold configuration information.

We prefer JSON to, for example, YAML, as the latter relies on Python-style indentation to indicate nesting.

Language-Independent Data Formats

JSON

A JSON document is plain text and may contain:

• basic types:
  • Number: 2 or 4.1
  • String: "I am a string"
  • Boolean: true or false
  • null: to denote an empty value
• attribute-value pairs (ex: "firstName": "John")
• "arrays" (ordered lists), ex: "myNumbers": [12, 7, 4]
• "objects" (collections of attribute-value pairs), ex:

{
  "city": "New York"
}
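To illustrate how these constructs map onto a programming language, here is a minimal sketch relying on Python's standard json module. The key names below are purely illustrative and do not come from any actual configuration.

```python
import json

# A small document exercising each construct listed above
# (the key names are purely illustrative).
document = """
{
  "count": 2,
  "ratio": 4.1,
  "label": "I am a string",
  "enabled": true,
  "missing": null,
  "myNumbers": [12, 7, 4],
  "address": {"city": "New York"}
}
"""

data = json.loads(document)

# Show which Python type each top-level value was decoded into.
for key, value in data.items():
    print(f"{key}: {type(value).__name__}")
```

JSON has a single Number type, which Python decodes as an int or a float depending on the literal (here 2 becomes an int and 4.1 a float), while null becomes None, arrays become lists and objects become dicts.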


The order of the elements in an array is expected to be preserved, whereas the order of the members of an object is not.

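Defining the same key (ex: in an attribute-value pair) more than once within an object is tolerated by most parsers, which generally keep the last instance thereof (the JSON specification, RFC 8259, merely states that names should be unique, and that the behaviour otherwise is unpredictable).

Ex:

{
  "tcp_port": 8084,
  "tcp_port": 8085,
  [...]
}


Here, once the document is parsed, tcp_port will typically be considered equal to 8085.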
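For instance, Python's standard json module follows this last-one-wins behaviour. The minimal sketch below shows it, together with a hypothetical reject_duplicates helper, passed through the object_pairs_hook parameter of json.loads, for those who prefer to reject such ambiguous documents.

```python
import json

raw = '{"tcp_port": 8084, "tcp_port": 8085}'

# By default, the last occurrence of a duplicated key wins.
config = json.loads(raw)
print(config["tcp_port"])  # 8085


def reject_duplicates(pairs):
    """Hypothetical object_pairs_hook raising on duplicated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'tcp_port'
```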

Pretty-Printing

On GNU/Linux, one may rely on jq, a command-line JSON processor.

Ex:

$ jq . my_document.json
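When jq is not available, a similar result may be obtained from Python's standard library. Here is a minimal sketch, where pretty_print_json is a hypothetical helper name; its output is close to, but not guaranteed to be identical to, what jq displays.

```python
import json


def pretty_print_json(text, indent=2):
    """Return a re-indented version of a JSON text (roughly what 'jq .' shows)."""
    # Parse first, so that malformed input raises json.JSONDecodeError.
    data = json.loads(text)
    # ensure_ascii=False keeps non-ASCII characters readable instead of \uXXXX escapes.
    return json.dumps(data, indent=indent, ensure_ascii=False)


print(pretty_print_json('{"city": "New York", "myNumbers": [12, 7, 4]}'))
```

Python also provides a command-line equivalent: python3 -m json.tool my_document.json.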

Validating

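One may consider that a given document is legitimate JSON if jq type reports a non-empty output (jq reports a parse error on malformed input, and outputs nothing for an empty file). Note that jq also accepts a stream of several concatenated JSON values, so this check is slightly more lenient than a strict single-document validation.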

Example:

$ jq type my_document.json
"object"

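A comparable check may be sketched in Python, is_valid_json being a hypothetical helper. Unlike jq, json.loads rejects a stream of several values; this sketch additionally rejects the NaN and Infinity constants, which Python accepts by default although strict JSON does not.

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN, Infinity and -Infinity by default,
    # although they are not part of the JSON specification.
    raise ValueError(f"non-standard constant: {name}")


def is_valid_json(text):
    """Tell whether text holds exactly one well-formed, standard JSON value."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json('{"tcp_port": 8085}'))   # True
print(is_valid_json('{"tcp_port": 8085,}'))  # False: trailing comma
print(is_valid_json(''))                     # False: empty document
print(is_valid_json('1 2'))                  # False: two values (jq accepts this stream)
print(is_valid_json('[NaN]'))                # False: non-standard constant
```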

Example

Regarding syntax, a typical JSON document is:

{
  "firstName": "John",
  "__comment": "This is a comment!",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null
}


As JSON does not support comments, we recommend marking them specifically (ex: with a __comment key) so that they do not interfere with the "real" data. As an example, see the second key of the previous JSON document.
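With this convention, comments can be discarded once the document is parsed. The following minimal sketch assumes that comments are always stored under the __comment key; strip_comments is a hypothetical helper name.

```python
import json


def strip_comments(value, comment_key="__comment"):
    """Recursively drop every comment_key entry from decoded JSON data."""
    if isinstance(value, dict):
        return {k: strip_comments(v, comment_key)
                for k, v in value.items() if k != comment_key}
    if isinstance(value, list):
        return [strip_comments(item, comment_key) for item in value]
    # Numbers, strings, booleans and None are returned unchanged.
    return value


doc = json.loads('{"firstName": "John", "__comment": "This is a comment!", "age": 27}')
print(strip_comments(doc))  # {'firstName': 'John', 'age': 27}
```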

YAML

YAML is a data serialization language for all programming languages.

We prefer the .yaml extension to the .yml one.

Tabulations must not be used for indentation (YAML forbids them): only spaces should be, preferably a fixed number of them; we used to prefer 4, now 2, since this allows items listed with a dash (ex: "- I am an item") to be properly aligned.
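These indentation rules can be checked mechanically. Below is a minimal, purely textual sketch (not a YAML parser), where check_yaml_indentation is a hypothetical helper: it reports tabs in indentation and indentation that is not a multiple of the chosen step. Block scalars may legitimately use other indentations, so its findings should be treated as hints.

```python
from typing import List


def check_yaml_indentation(text: str, step: int = 2) -> List[str]:
    """Report lines whose leading whitespace breaks the conventions above."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # blank lines carry no meaningful indentation
        leading = line[: len(line) - len(stripped)]
        if "\t" in leading:
            problems.append(f"line {number}: tab used for indentation")
        elif len(leading) % step != 0:
            problems.append(
                f"line {number}: indentation of {len(leading)} is not a multiple of {step}")
    return problems


sample = "servers:\n  - name: alpha\n\t port: 8085\n   debug: true\n"
for problem in check_yaml_indentation(sample):
    print(problem)
```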

With Emacs, yaml-mode may be of help.

Erlang-Friendly Data Format: ETF

Such a format is typically useful to hold configuration information in an Erlang context.

We recommend the use of ETF (the Erlang Term Format), which we find particularly useful and even more suitable than JSON (entry order preserved, comments supported, etc.).