Data Management

Organisation:	Copyright (C) 2022-2025 Olivier Boudeville
Contact:	about (dash) howtos (at) esperide (dot) com
Creation date:	Saturday, November 20, 2021
Last updated:	Wednesday, March 19, 2025

Table of Contents

Overview
General-Purpose Data Format
- Language-Independent Data Formats
- Erlang-Friendly Data Format: ETF
Data-related Processing Tools
Data-related Displaying Tools

Overview

This section gathers information about data management, including data formats and tools for processing and displaying data.

General-Purpose Data Format

Such a format is typically useful to hold configuration information.

We prefer JSON to YAML, for example, because the latter relies on Python-style significant indentation to indicate nesting.

Language-Independent Data Formats

JSON

A JSON document is plain text and may contain:

- basic types:
  - Number: 2 or 4.1
  - String: "I am a string"
  - Boolean: true or false
  - null: to denote an empty value
- attribute-value pairs (e.g. "firstName": "John")
- "arrays" (ordered lists), e.g. "myNumbers": [12, 7, 4]
- "objects" (collections of name-value pairs), e.g.

{

  "address": {
     "streetAddress": "21 2nd Street",
     "city": "New York"
  }

}

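The order of the elements of an array is significant and is expected to be preserved, whereas the order of the name-value pairs of an object is not significant and may not be preserved by parsers.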
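
As a small illustration of these ordering rules, here is a sketch in Python (chosen only for illustration; it relies solely on the json module of the standard library) that compares parsed documents:

```python
import json

# Two arrays with the same elements in a different order are different values.
array_a = json.loads('[12, 7, 4]')
array_b = json.loads('[4, 7, 12]')
arrays_equal = (array_a == array_b)

# Two objects with the same pairs in a different order are the same value.
object_a = json.loads('{"city": "New York", "state": "NY"}')
object_b = json.loads('{"state": "NY", "city": "New York"}')
objects_equal = (object_a == object_b)

print(arrays_equal, objects_equal)
```

This prints False True: reordering an array changes its meaning, whereas reordering the pairs of an object does not.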

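Defining an element (e.g. an attribute-value pair) more than once is tolerated by most parsers, which generally keep only the last instance thereof. The JSON specification (RFC 8259) however only states that the names within an object should be unique, and describes the behaviour as unpredictable otherwise, so duplicates are best avoided.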

For instance:

{
 "tcp_port": 8084,
 "tcp_port": 8085,
 [...]
}

Here, once the document is parsed, most parsers will consider tcp_port equal to 8085.
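
To illustrate how a parser may handle such duplicates, the following sketch uses the json module of the Python standard library, which keeps the last instance by default. The reject_duplicates helper is a hypothetical name of ours, shown as one possible way of detecting duplicates instead of silently accepting them:

```python
import json


def reject_duplicates(pairs):
    """Build a dict from (name, value) pairs, refusing any repeated name."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate key: {name!r}")
        result[name] = value
    return result


document = '{"tcp_port": 8084, "tcp_port": 8085}'

# Default behaviour of Python's json module: the last instance wins.
lenient = json.loads(document)
print(lenient["tcp_port"])

# Stricter parsing: object_pairs_hook receives every pair, duplicates included.
try:
    json.loads(document, object_pairs_hook=reject_duplicates)
    duplicate_found = False
except ValueError:
    duplicate_found = True
print(duplicate_found)
```

The first output is 8085 (last instance kept), the second is True (the duplicate was detected by the stricter hook).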

Pretty-Printing

On GNU/Linux, one may rely on jq, a command-line JSON processor.

For instance:

$ jq . my_document.json
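
Where jq is not available, a comparable result may be obtained with the json module of the Python standard library. The sketch below is only an illustration: the pretty_print function name and the 2-space indentation are assumptions of ours, not conventions taken from jq.

```python
import json


def pretty_print(json_text, indent=2):
    """Return an indented rendering of the specified JSON text."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(json.loads(json_text), indent=indent, ensure_ascii=False)


print(pretty_print('{"firstName": "John", "children": [], "age": 27}'))
```

From a shell, python3 -m json.tool my_document.json offers a similar service without writing any code.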

Validating

One may consider that a given document is valid JSON if and only if jq type succeeds and reports a non-empty output; on an invalid document, jq reports an error instead.

Example:

$ jq type my_document.json
"object"

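A similar check can be sketched with the json module of the Python standard library. The is_valid_json name is hypothetical, and Python's parser is slightly more lenient than the JSON specification (for example, it accepts NaN by default), so this should be seen as an approximation rather than as a strict validator:

```python
import json


def is_valid_json(text):
    """Tell whether the specified text is a well-formed JSON document."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json('{"tcp_port": 8084}'))
print(is_valid_json('{"tcp_port": 8084,}'))
```

The second document is rejected because of its trailing comma, which JSON does not allow.
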
Example

Regarding syntax, a typical JSON document is:

{
  "firstName": "John",
  "__comment": "This is a comment!",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null
}

Specifying comments

With JSON, there is, on purpose, no built-in way to add comments.

The only workaround is to add comments as dedicated fields, although they then end up as data, like the other fields.

We recommend marking them specifically (e.g. as __comment) so that they do not interfere with the "real" data. As an example, see the second key of the previous JSON document.
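
Since such comments are parsed as ordinary data, a consumer may want to discard them after parsing. Here is a minimal sketch in Python, assuming that the __comment key is reserved for comments at every nesting level; strip_comments is a hypothetical helper of ours:

```python
import json


def strip_comments(value, comment_key="__comment"):
    """Return a copy of the parsed JSON value without its comment fields."""
    if isinstance(value, dict):
        return {k: strip_comments(v, comment_key)
                for k, v in value.items() if k != comment_key}
    if isinstance(value, list):
        return [strip_comments(item, comment_key) for item in value]
    return value


document = json.loads("""
{
  "firstName": "John",
  "__comment": "This is a comment!",
  "address": {"__comment": "Nested comment", "city": "New York"}
}
""")

print(strip_comments(document))
```

The original parsed document is left untouched; only the returned copy is free of comment fields.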

YAML

YAML is a data serialization language for all programming languages.

We prefer the .yaml extension to the .yml one.

Tabs must not be used for indentation (YAML forbids them), only spaces, and preferably a fixed number of them per nesting level; we used to prefer 4 spaces and now prefer 2, since this allows nested content to be properly aligned with the items listed with a dash (e.g. "- I am an item").
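
As tabs and spaces are hard to tell apart visually, a quick check may help. The following Python sketch (find_tab_indented_lines is a hypothetical helper; it only inspects the leading whitespace and does not parse YAML) lists the lines whose indentation contains a tab:

```python
def find_tab_indented_lines(yaml_text):
    """Return the 1-based numbers of the lines whose indentation contains a tab."""
    offending = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        # The indentation is whatever precedes the first non-blank character.
        indentation = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indentation:
            offending.append(number)
    return offending


sample = "items:\n  - I am an item\n\t- I am a badly indented item\n"
print(find_tab_indented_lines(sample))
```

Here only the third line of the sample is reported.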

With Emacs, the Yaml Mode may be of help.

Erlang-Friendly Data Format: ETF

Such a format is typically useful to hold configuration information in an Erlang context.

We recommend the use of ETF (the Erlang Term Format), which we find particularly useful and even more suitable than JSON (entry order preserved, comments supported, etc.).

Data-related Displaying Tools

To display data, we rely mostly on gnuplot.

Our conventions are the following:

- rely on a recent version of gnuplot (e.g. 5.4)
- the image formats for the generated plots are (large-enough) PNG or SVG (better in spirit, yet usually of larger file size, and with a slightly different rendering)
- the extension of command files is p (e.g. foobar.p), the one for data files that they refer to is dat (e.g. foobar.dat); running gnuplot scripts is then as simple as executing gnuplot foobar.p
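
To make these conventions concrete, here is a sketch in Python that generates a data file and the command file referring to it. The write_plot_files name, the 1600x1200 PNG output and the plotting style are assumptions of ours; the png terminal must also be available in the local gnuplot build.

```python
import tempfile
from pathlib import Path


def write_plot_files(base_name, points, directory="."):
    """Write base_name.dat and base_name.p; return the path of the command file."""
    directory = Path(directory)
    data_path = directory / f"{base_name}.dat"
    command_path = directory / f"{base_name}.p"

    # One "x y" sample per line, which gnuplot reads as columns 1 and 2.
    data_lines = [f"{x} {y}" for x, y in points]
    data_path.write_text("\n".join(data_lines) + "\n")

    # Assumed settings: a PNG output of 1600x1200 pixels, drawn with lines.
    commands = [
        "set terminal png size 1600,1200",
        f'set output "{base_name}.png"',
        "set grid",
        f'plot "{base_name}.dat" using 1:2 with lines title "{base_name}"',
    ]
    command_path.write_text("\n".join(commands) + "\n")
    return command_path


# Demonstration in a temporary directory, to avoid cluttering the current one.
with tempfile.TemporaryDirectory() as tmp:
    command_file = write_plot_files("foobar", [(0, 0), (1, 1), (2, 4)], directory=tmp)
    print(command_file.read_text())
```

As the command file refers to the data file through a relative path, gnuplot foobar.p is to be run from the directory in which both files were written.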

Besides generating images, gnuplot is able, thanks to interactive terminal types (like qt), to let the user navigate in the plots (e.g. move around, zoom on them).

As an example, the following can be copied in a hyperboloid.p command file:

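reset

# Select the interactive terminal before plotting, so that the plot is drawn in it:
set terminal qt persist

set grid
set parametric
set view equal

# Both sheets of the hyperboloid, with u in [-pi:pi] and v in [-2.5:2.5]:
splot [-pi:pi][-2.5:2.5] cos(u)*sinh(v), sin(u)*sinh(v), cosh(v), cos(u)*sinh(v), sin(u)*sinh(v), -cosh(v)

# Wait until the plot window is closed:
pause mouse close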

Then running gnuplot hyperboloid.p opens an interactive viewer, which can be explored with the mouse and/or keyboard. Press the h key to list the available mouse/keyboard commands.

Based on our Erlang developments, we implemented the plot_utils module, which is a library (relying on gnuplot) to generate plots more conveniently.
