Data Management

Organisation: Copyright (C) 2022-2024 Olivier Boudeville
Contact: about (dash) howtos (at) esperide (dot) com
Creation date: Saturday, November 20, 2021
Last updated: Friday, June 21, 2024

Overview

This section gathers information about data management, including data formats and data processing tools.

General-Purpose Data Format

Such a format is typically useful to hold configuration information.

We prefer JSON to, for example, YAML, because the latter relies on Python-style significant indentation to indicate nesting.

Language-Independent Data Formats

JSON

A JSON document is plain text and may contain:

  • basic types:
    • Number: 2 or 4.1
    • String: "I am a string"
    • Boolean: true or false
    • null: to denote an empty value
  • attribute-value pairs (e.g. "firstName": "John")
  • "arrays" (ordered lists), e.g. "myNumbers": [12, 7, 4]
  • "objects" (collection of name-value pairs), e.g.
{

  "address": {
     "streetAddress": "21 2nd Street",
     "city": "New York"
  }

}

The order of the elements in an array is expected to be preserved, unlike the order of the name-value pairs in an object.
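The following minimal sketch illustrates this difference with the json module of the Python standard library; the sample documents and variable names are made up for the illustration. Objects are parsed to dictionaries, which compare equal regardless of member order, whereas arrays are parsed to lists, whose comparison is order-sensitive.

```python
import json

# Two documents whose top-level objects list the same members
# in a different order.
doc_a = json.loads('{"city": "New York", "myNumbers": [12, 7, 4]}')
doc_b = json.loads('{"myNumbers": [12, 7, 4], "city": "New York"}')

# Same array elements as doc_a, but in a different order.
doc_c = json.loads('{"city": "New York", "myNumbers": [4, 7, 12]}')

# Objects become dicts: member order does not affect equality.
same_objects = (doc_a == doc_b)

# Arrays become lists: element order does affect equality.
same_arrays = (doc_a["myNumbers"] == doc_c["myNumbers"])

print(same_objects, same_arrays)
```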

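Defining an element (e.g. an attribute-value pair) more than once is tolerated by many parsers, which then keep only the last instance thereof; note however that the JSON specification (RFC 8259) only states that the names within an object should be unique, so this behaviour is implementation-dependent.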

For instance:

{
 "tcp_port": 8084,
 "tcp_port": 8085,
 [...]
}

Here, once the document is read by a parser that keeps the last instance, tcp_port will be considered equal to 8085.
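As an illustration, the sketch below relies on the json module of the Python standard library, which keeps the last duplicate by default; it also shows one possible way of detecting duplicates through the object_pairs_hook parameter of json.loads. The DuplicateNameError, reject_duplicates and has_duplicates names are hypothetical helpers introduced for this example only.

```python
import json

DOCUMENT = '{"tcp_port": 8084, "tcp_port": 8085}'

# Default behaviour of Python's json module: the last duplicate wins.
parsed = json.loads(DOCUMENT)


class DuplicateNameError(Exception):
    """Hypothetical exception raised when an object repeats a name."""


def reject_duplicates(pairs):
    """Build a dict from the (name, value) pairs of one JSON object,
    failing as soon as a name is seen twice."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise DuplicateNameError(f"duplicate name: {name!r}")
        result[name] = value
    return result


def has_duplicates(text):
    """Tell whether any object in the given JSON text repeats a name.

    Invalid JSON still raises json.JSONDecodeError.
    """
    try:
        json.loads(text, object_pairs_hook=reject_duplicates)
    except DuplicateNameError:
        return True
    return False


print(parsed["tcp_port"], has_duplicates(DOCUMENT))
```

Such a check may be run on configuration files before they are used, so that an accidental redefinition does not go unnoticed.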

Pretty-Printing

On GNU/Linux, one may rely on jq, a command-line JSON processor.

For instance, to pretty-print a document: jq . my_document.json
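Where jq is not available, a comparable result may be obtained with the json module of the Python standard library. The following is only a minimal sketch: pretty_print is a hypothetical helper, and its output is not guaranteed to match the one of jq character for character.

```python
import json


def pretty_print(text, indent=2):
    """Return the given JSON text re-serialised with the given indentation.

    Member order is kept, and non-ASCII characters are left unescaped.
    """
    return json.dumps(json.loads(text), indent=indent, ensure_ascii=False)


print(pretty_print('{"firstName":"John","phoneNumbers":[1,2]}'))
```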

Validating

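A given document may be considered valid JSON if and only if jq type succeeds and reports a non-empty output; on a parse error, jq prints a message on the standard error stream and returns a non-zero exit status.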
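A similar check can be sketched with the json module of the Python standard library. The is_valid_json name is a hypothetical helper; since Python's parser accepts by default the NaN and Infinity constants, which are not part of JSON, this sketch rejects them explicitly.

```python
import json


def _reject_constant(name):
    """Refuse NaN, Infinity and -Infinity, which JSON does not define."""
    raise ValueError(f"non-JSON constant: {name}")


def is_valid_json(text):
    """Tell whether the given text is a single, well-formed JSON value."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


print(is_valid_json('{"tcp_port": 8085}'), is_valid_json('{"tcp_port": 8085,}'))
```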

Example:

$ jq type my_document.json
"object"

Example

Regarding syntax, a typical JSON document is:

{
  "firstName": "John",
  "__comment": "This is a comment!",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null
}

Specifying comments

With JSON, there is, on purpose, no built-in way to add comments.

The sole workaround is to add comments as specific fields, although they will end up as data like the other fields.

We recommend marking them specifically (e.g. as __comment) so that they do not interfere with the "real" data. As an example, see the second key of the previous JSON document.
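Once a document is parsed, such pseudo-comments may be filtered out before the data is used. The following minimal sketch, in Python, assumes that comments are always stored under the __comment key; strip_comments is a hypothetical helper introduced for this example.

```python
import json

COMMENT_KEY = "__comment"


def strip_comments(value):
    """Return a copy of a parsed JSON value without any comment field,
    at any nesting depth."""
    if isinstance(value, dict):
        return {name: strip_comments(item)
                for name, item in value.items()
                if name != COMMENT_KEY}
    if isinstance(value, list):
        return [strip_comments(item) for item in value]
    return value


document = json.loads(
    '{"firstName": "John", "__comment": "This is a comment!", "age": 27}')

print(strip_comments(document))
```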

YAML

YAML is a data serialization language for all programming languages.

We prefer the .yaml extension to the .yml one.

No tabulation should be used for indentation, only spaces, and preferably a fixed number of them; we used to prefer 4 and now prefer 2, since this allows the items listed with a dash (e.g. "- I am an item") to be properly aligned.
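The "no tabulation" rule can be checked automatically. The following is a minimal sketch in Python, relying only on plain text processing rather than on a YAML parser; lines_with_tab_indentation is a hypothetical helper, which reports the (1-based) numbers of the lines whose leading whitespace contains a tabulation.

```python
def lines_with_tab_indentation(text):
    """Return the 1-based numbers of the lines indented with at least
    one tabulation character."""
    offending = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only: tabulations elsewhere are ignored.
        indentation = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indentation:
            offending.append(number)
    return offending


sample = "items:\n  - I am an item\n\t- I am a badly indented item\n"

print(lines_with_tab_indentation(sample))
```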

With Emacs, the Yaml Mode may be of help.

Erlang-Friendly Data Format: ETF

Such a format is typically useful to hold configuration information in an Erlang context.

We recommend the use of ETF (the Erlang Term Format), which we find particularly useful and even more suitable than JSON (entry order preserved, comments supported, etc.).