(This file is adapted from github.com/libuconf/cni/SPEC.md).
Note that the prime source of information about CNI is to be considered the reference implementation and test suite. This document is a reading companion (rather than the source of truth) that should help expediate the development of implementations.
A CNI document consists of an arbitrary amount of statements. A statement may either be a section declaration, key-value pair, or key-rawvalue pair. Whitespace between elements is not significant, meaning that unless it would be consumed otherwise (such as in a value of a key-value pair), it may be inserted whenever. This obviously includes empty lines, as well as prepending statements with whitespace, though more uses will be mentioned later.
Note: some technically valid syntax is discouraged. Have a read through the test suite to see what is discouraged. Note: a comment does not count as whitespace. Note: it is valid to have multiple statements assigning to the same key. In the event this happens, the last key definition wins. For example,
sub.source = main.zip [sub] source = src.zip
will result in the sub.source key having the value of src.zip.
A key is composed of alphanumeric ([a-zA-Z0-9]) characters, underscores, and dashes. Keys may contain dots, but may not start or end with them - dots semantically signify sections. Two dots in a row are disallowed (due to being nonsensical, it would imply a section with no name), but implementations do not need to explicitly disallow them.
A section consists of an opening bracket ( [ ), the section name, and a closing bracket ( ] ). The section name may either be a valid key, or empty.
Until the next section, all keys will be automatically prepended the section name and a ., or with nothing (if the section name is the empty string). Note that there may be any amount of whitespace in between the section name and the brackets.
A key-value and key-rawvalue pair look very similar. It's a key, an equals, and a value, with arbitrary amounts of whitespace before and after the =.
A value may contain any UTF-8 characters except vertical whitespace or the # sign (note: this is expanded upon in ini-compatibility). It may not start or end with whitespace, or start with a backtick.
A rawvalue starts with a backtick and ends with a backtick, but may contain any UTF-8 characters besides those. If a backtick needs to be inserted into a raw value, a backtick may be escaped with a second one.
After both a value and a rawvalue, you may have any amount of horizontal whitespace before a comment or the end of the line. Vertical whitespace is excluded, since that would imply the statement has ended.
Theoretically, this means that both values and raw values may contain arbitrary bytes, but the specification is exclusively intended to be used with UTF-8 data. Implementations are not required to validate it, though.
A comment starts with a # (note: this is expanded upon in ini-compatibility) and ends at the next vertical whitespace. A comment may start on its own line, at any point "inside" of a value (thus ending it), after a section declaration, or after a raw value. It is impossible to have comments inside of a raw value, or inside of a section declaration.
For compatibility with INI, the ; character may also be used for comments, i.e treated as if it were a #.
All extensions are fully optional:
If this extension is used, keys may be composed of any characters which are not whitespace, #, =, [, ], or `. The meaning and restrictions of dots are unchanged.
This API is purely a suggestion for the sake of users - implementations may follow any scheme they wish.
Backend storage should be implemented in a trie-like fashion, separated by .s. It should be possible to get the direct list of children of a trie node, as well as the full list of children of a trie node. It should be possible to know if a given key is a section, key, both, or neither. It should be possible to walk over matches without requiring allocation of a new trie. Keys do not need to be ordered in any specific way, but sorting them may make implementing the API easier.
The following API endpoints provide all of this functionality:
WalkLeaves|WalkTree takes a pattern (which may be empty) and a function. It will call the function with each matching key-(raw)value pair. The matches are performed like so:
ListLeaves|ListTree takes a pattern and returns a list of values. The list will contain all of the values (including duplicates) that match under the corresponding Walk* function. It can be implemented in pseudocode using the Walk* functions like so:
ListLeaves (pattern): var output_list WalkLeaves (pattern, lambda (key, value): output_list.append(value) ) return output_list ListTree (pattern): var output_list WalkTree (pattern, lambda (key, value): output_list.append(value) ) return output_list
KeyLeaves|KeyTree takes a pattern and returns a set of keys. It is the converse of ListLeaves|ListTree.
SubLeaves|SubTree takes a pattern, and constructs a new trie. The new trie will contain all of the key-(raw)value pairs that match under the corresponding Walk* function, but with the pattern (and the dot immediately after it) removed. It can be implemented in pseudocode using the Walk* functions like so:
SubLeaves (pattern): var output_trie WalkLeaves (pattern, lambda (key, value): output_trie[key.strip-from-start(pattern + ".")] = value ) return output_trie SubTree (pattern): var output_trie WalkTree (pattern, lambda (key, value): output_trie[key.strip-from-start(pattern + ".")] = value ) return output_trie
SectionLeaves|SectionTree takes a pattern and returns a set of prefixes. It can be calculated roughly as follows:
Some terms in this document may be confusing. This section should clarify those.
A set is like a list, but cannot contain duplicate items. For example, if I add "a" to an empty set twice, I end up with a set with one element: "a".
Unlike horizontal whitespace, unicode does not provide a good definition for "vertical whitespace". For the purposes of this document, consider it to mean the following code points:
If the library implementer's language or regex engine provides its own vertical whitespace group (such as raku's \v), implementers are encouraged to use it instead of hardcoding the above list.