HSD-python documentation
Introduction
This package contains utilities to read and write files in the Human-friendly Structured Data (HSD) format.
The HSD-format is very similar to XML, JSON and YAML, but tries to minimize the effort for humans to read and write it. It ommits special characters as much as possible (in contrast to XML and JSON) and is not indentation dependent (in contrast to YAML). It was developed originally as the input format for the scientific simulation tool (DFTB+), but is of general purpose. Data stored in HSD can be easily mapped to a subset of JSON, YAML or XML and vice versa.
This document describes hsd-python version 0.1.
Installation
The package can be installed via conda-forge:
conda install hsd-python
Alternatively, the package can be downloaded and installed via pip into the active Python interpreter (preferably using a virtual python environment) by
pip install hsd
or into the user space issueing
pip install --user hsd
Quick tutorial
A typical, self-explaining input written in HSD looks like
driver {
conjugate_gradients {
moved_atoms = 1 2 "7:19"
max_steps = 100
}
}
hamiltonian {
dftb {
scc = yes
scc_tolerance = 1e-10
mixer {
broyden {}
}
filling {
fermi {
# This is comment which will be ignored
# Note the attribute (unit) of the field below
temperature [kelvin] = 100
}
}
k_points_and_weights {
supercell_folding {
2 0 0
0 2 0
0 0 2
0.5 0.5 0.5
}
}
}
}
The above input can be parsed into a Python dictionary with:
import hsd
hsdinput = hsd.load("test.hsd")
The dictionary hsdinput
will then look as:
{
"driver": {
"conjugate_gradients" {
"moved_atoms": [1, 2, "7:19"],
"max_steps": 100
}
},
"hamiltonian": {
"dftb": {
"scc": True,
"scc_tolerance": 1e-10,
"mixer": {
"broyden": {}
},
"filling": {
"fermi": {
"temperature": 100,
"temperature.attrib": "kelvin"
}
}
"k_points_and_weights": {
"supercell_folding": [
[2, 0, 0],
[0, 2, 0],
[0, 0, 2],
[0.5, 0.5, 0.5]
]
}
}
}
}
Being a simple Python dictionary, it can be easily queried and manipulated in Python
hsdinput["driver"]["conjugate_gradients"]["max_steps"] = 200
and then stored again in HSD format
hsd.dump(hsdinput, "test2.hsd")
The HSD format
General description
You can think about the Human-readable Structured Data format as a pleasant representation of a tree structure. It can represent a subset of what you can do for example with XML. The following constraints compared to XML apply:
Every node of a tree, which is not empty, either contains further nodes or data, but never both.
Every node may have a single (string) attribute only.
These constraints allow a very natural looking formatting of the data.
As an example, let’s have a look at a data tree, which represents input for scientific software. In the XML representation, it could be written as
<Hamiltonian>
<Dftb>
<Scc>Yes</Scc>
<Filling>
<Fermi>
<Temperature attrib="Kelvin">77</Temperature>
</Fermi>
<Filling>
</Dftb>
</Hamiltonian>
The same information can be encoded in a much more natural and compact form in HSD format as
Hamiltonian {
Dftb {
Scc = Yes
Filling {
Fermi {
Temperature [Kelvin] = 77
}
}
}
}
The content of a node are passed either between an opening and a closing curly brace or after an equals sign. In the latter case the end of the line will be the closing delimiter. The attribute (typically the unit of the data which the node contains) is specified between square brackets after the node name.
The equals sign can be used to assign data as a node content (provided
the data fits into one line), or to assign a single child node as content
for a given node. This leads to a compact and expressive notation for those
cases, where (by the semantics of the input) a given node is only allowed to
have a single child node as content. The tree above is a piece of a typical
DFTB+ input, where only one child node is allowed for the nodes Hamiltonian
and Filling
, respectively (They specify the type of the Hamiltonian
and the filling function). By making use of equals signs, the
simplified HSD representation can be as compact as
Hamiltonian = Dftb {
Scc = Yes
Filling = Fermi {
Temperature [Kelvin] = 77
}
}
and still represent the same tree.
Mapping to dictionaries
Being basically a subset of XML, HSD data is best represented as an XML DOM-tree. However, very often a dictionary representation is more desirable, especially when the language used to query and manipulate the tree offers dictionaries as primary data type (e.g. Python). The data in an HSD input can be easily represented with the help of nested dictionaries and lists. The input from the previous section would have the following representation as Python dictionary (or as a JSON formatted input file):
{
"Hamiltonian": {
"Dftb": {
"Scc": Yes,
"Filling": {
"Fermi": {
"Temperature": 77,
"Temperature.attrib": "Kelvin"
}
}
}
}
}
The attribute of a node is stored under a special key containting the name of
the node and the .attrib
suffix.
One slight complication of the dictionary representation arises in the case of node which has multiple child nodes with the same name
<ExternalField>
<PointCharges>
<GaussianBlurWidth>3</GaussianBlurWidth>
<CoordsAndCharges>
3.3 -1.2 0.9 9.2
1.2 -3.4 5.6 -3.3
</CoordsAndCharges>
</PointCharges>
<PointCharges>
<GaussianBlurWidth>10</GaussianBlurWidth>
<CoordsAndCharges>
1.0 2.0 3.0 4.0
-1.0 -2.0 -3.0 -4.0
</CoordsAndCharges>
</PointCharges>
</ExternalField>
While the HSD representation has no problem to cope with the situation
ExternalField {
PointCharges {
GaussianBlurWidth = 3
CoordsAndCharges {
3.3 -1.2 0.9 9.2
1.2 -3.4 5.6 -3.3
}
}
PointCharges {
GaussianBlurWidth = 10
CoordsAndCharges {
1.0 2.0 3.0 4.0
-1.0 -2.0 -3.0 -4.0
}
}
}
a trick is needed for the dictionary / JSON representation, as multiple keys with the same name are not allowed in a dictionary. Therefore, the repetitive nodes will be mapped to one key, which will contain a list of dictionaries (instead of a single dictionary as in the usual case):
{
"ExternalField": {
// Note the list of dictionaries here!
"PointCharges": [
{
"GaussianBlurWidth": 3,
"CoordsAndCharges": [
[3.3, -1.2, 0.9, 9.2],
[1.2, -3.4, 5.6, -3.3]
]
},
{
"GaussianBlurWidth": 10,
"CoordsAndCharges": [
[1.0, 2.0, 3.0, 4.0 ],
[-1.0, -2.0, -3.0, -4.0 ]
]
},
]
# Also attributes becomes a list. Due to technialc reasons the
# dictbuilder always creates an attribute list for mulitple nodes,
# even if none of the nodes carries an actual attribute.
"PointCharges.attrib": [None, None]
}
}
The mapping works in both directions, so that this dictionary (or the JSON file created from it) can be easily converted back to the HSD form again.
API documentation
High level routines
- hsd.load_string(hsdstr: str, lower_tag_names: bool = False, include_hsd_attribs: bool = False, flatten_data: bool = False) dict
Loads a string with HSD-formatted data into a Python dictionary.
- Parameters:
hsdstr – String with HSD-formatted data.
lower_tag_names – When set, all tag names will be converted to lower-case (practical, when input should be treated case insensitive.) If
include_hsd_attribs
is set, the original tag name will be stored among the HSD attributes.include_hsd_attribs – Whether the HSD-attributes (processing related attributes, like original tag name, line information, etc.) should be stored. Use it, if you wish to keep the formatting of the data close to the original one on writing (e.g. lowered tag names converted back to their original form, equals signs between parent and only child kept, instead of converted to curly braces).
flatten_data – Whether multiline data in the HSD input should be flattened into a single list. Othewise a list of lists is created, with one list for every line (default).
- Returns:
Dictionary representing the HSD data.
Examples
>>> hsdstr = """ ... Dftb { ... Scc = Yes ... Filling { ... Fermi { ... Temperature [Kelvin] = 100 ... } ... } ... } ... """ >>> hsd.load_string(hsdstr) {'Dftb': {'Scc': True, 'Filling': {'Fermi': {'Temperature': 100, 'Temperature.attrib': 'Kelvin'}}}}
In order to ease the case-insensitive handling of the input, the tag names can be converted to lower case during reading using the
lower_tag_names
option.>>> hsd.load_string(hsdstr, lower_tag_names=True) {'dftb': {'scc': True, 'filling': {'fermi': {'temperature': 100, 'temperature.attrib': 'Kelvin'}}}}
The original tag names (together with additional information like the line number of a tag) can be recorded, if the
include_hsd_attribs
option is set:>>> data = hsd.load_string(hsdstr, lower_tag_names=True, ... include_hsd_attribs=True)
Each tag in the dictionary will have a corresponding “.hsdattrib” entry with the recorded data:
>>> data["dftb.hsdattrib"] {'equal': False, 'line': 1, 'name': 'Dftb'}
This additional data can be then also used to format the tags in the original style, when writing the data in HSD-format again. Compare:
>>> hsd.dump_string(data) 'dftb {\n scc = Yes\n filling {\n fermi {\n temperature [Kelvin] = 100\n }\n }\n}\n'
versus
>>> hsd.dump_string(data, use_hsd_attribs=True) 'Dftb {\n Scc = Yes\n Filling {\n Fermi {\n Temperature [Kelvin] = 100\n }\n }\n}\n'
- hsd.load(hsdfile: Union[TextIO, str], lower_tag_names: bool = False, include_hsd_attribs: bool = False, flatten_data: bool = False) dict
Loads a file with HSD-formatted data into a Python dictionary
- Parameters:
hsdfile – Name of file or file like object to read the HSD data from
lower_tag_names – When set, all tag names will be converted to lower-case (practical, when input should be treated case insensitive.) If
include_hsd_attribs
is set, the original tag name will be stored among the HSD attributes.include_hsd_attribs – Whether the HSD-attributes (processing related attributes, like original tag name, line information, etc.) should be stored. Use it, if you wish to keep the formatting of the data close to the original on writing (e.g. lowered tag names converted back to their original form, equals signs between parent and only child kept, instead of converted to curly braces).
flatten_data – Whether multiline data in the HSD input should be flattened into a single list. Othewise a list of lists is created, with one list for every line (default).
- Returns:
Dictionary representing the HSD data.
Examples
See
hsd.load_string()
for examples of usage.
- hsd.dump_string(data: dict, use_hsd_attribs: bool = False) str
Serializes an object to string in HSD format.
- Parameters:
data – Dictionary like object to be written in HSD format.
use_hsd_attribs – Whether HSD attributes of the data structure should be used to format the output (e.g. to restore original mixed case tag names)
- Returns:
HSD formatted string.
Examples
>>> hsdtree = { ... 'Dftb': { ... 'Scc': True, ... 'Filling': { ... 'Fermi': { ... 'Temperature': 100, ... 'Temperature.attrib': 'Kelvin' ... } ... } ... } ... } >>> hsd.dump_string(hsdtree) 'Dftb {\n Scc = Yes\n Filling {\n Fermi {\n Temperature [Kelvin] = 100\n }\n }\n}\n'
See also
hsd.load_string()
for an example.
- hsd.dump(data: dict, hsdfile: Union[TextIO, str], use_hsd_attribs: bool = False)
Dumps data to a file in HSD format.
- Parameters:
data – Dictionary like object to be written in HSD format
hsdfile – Name of file or file like object to write the result to.
use_hsd_attribs –
Whether HSD attributes in the data structure should be used to format the output.
This option can be used to for example to restore original tag names, if the file was loaded with the
lower_tag_names
andinclude_hsd_attribs
options set or keep the equal signs between parent and contained only child.
- Raises:
TypeError – if object is not a dictionary instance.
Examples
See
hsd.load_string()
for an example.
Lower level building blocks
- class hsd.HsdParser(eventhandler: Optional[HsdEventHandler] = None)
Event based parser for the HSD format.
- Parameters:
eventhandler – Object which should handle the HSD-events triggered during parsing. When not specified, HsdEventPrinter() is used.
Examples
>>> from io import StringIO >>> dictbuilder = hsd.HsdDictBuilder() >>> parser = hsd.HsdParser(eventhandler=dictbuilder) >>> hsdfile = StringIO(""" ... Hamiltonian { ... Dftb { ... Scc = Yes ... Filling = Fermi { ... Temperature [Kelvin] = 100 ... } ... } ... } ... """) >>> parser.parse(hsdfile) >>> dictbuilder.hsddict {'Hamiltonian': {'Dftb': {'Scc': True, 'Filling': {'Fermi': {'Temperature': 100, 'Temperature.attrib': 'Kelvin'}}}}}
- parse(fobj: Union[TextIO, str])
Parses the provided file-like object.
The parser will process the data and trigger the corresponding events in the eventhandler which was passed at initialization.
- Parameters:
fobj – File like object or name of a file containing the data.
- class hsd.HsdEventHandler
Abstract base class for handling HSD events.
- abstract open_tag(tagname: str, attrib: Optional[str], hsdattrib: Optional[dict])
Opens a tag.
- Parameters:
tagname – Name of the tag which had been opened.
attrib – String containing the attribute of the tag or None.
hsdattrib – Dictionary of the options created during the processing in the hsd-parser.
- abstract close_tag(tagname: str)
Closes a tag.
- Parameters:
tagname – Name of the tag which had been closed.
- abstract add_text(text: str)
Adds text (data) to the current tag.
- Parameters:
text – Text in the current tag.
- class hsd.HsdDictBuilder(flatten_data: bool = False, lower_tag_names: bool = False, include_hsd_attribs: bool = False)
Specific HSD event handler, which builds a nested Python dictionary.
- Parameters:
flatten_data – Whether multiline data in the HSD input should be flattened into a single list. Othewise a list of lists is created, with one list for every line (default).
lower_tag_names – Whether tag names should be all converted to lower case (to ease case insensitive processing). Default: False. If set and include_hsd_attribs is also set, the original tag names can be retrieved from the “name” hsd attributes.
include_hsd_attribs – Whether the HSD-attributes (processing related attributes, like original tag name, line information, etc.) should be stored (default: False).
- property hsddict
The dictionary which has been built
- open_tag(tagname, attrib, hsdattrib)
Opens a tag.
- Parameters:
tagname – Name of the tag which had been opened.
attrib – String containing the attribute of the tag or None.
hsdattrib – Dictionary of the options created during the processing in the hsd-parser.
- close_tag(tagname)
Closes a tag.
- Parameters:
tagname – Name of the tag which had been closed.
- add_text(text)
Adds text (data) to the current tag.
- Parameters:
text – Text in the current tag.
- class hsd.HsdDictWalker(eventhandler: Optional[HsdEventHandler] = None)
Walks through a Python dictionary and triggers HSD events.
- Parameters:
eventhandler – Event handler dealing with the HSD events generated while walking through the dictionary. When not specified, the events are printed.
- walk(dictobj)
Walks through the directory and generates HSD events.
- Parameters:
dictobj – Directory to walk through.
- class hsd.HsdFormatter(fobj, use_hsd_attribs=True)
Implements an even driven HSD formatter.
- Parameters:
fobj – File like object to write the formatted output to.
use_hsd_attribs – Whether HSD attributes passed to the formatter should be considered, when formatting the the output (default: True)
- open_tag(tagname: str, attrib: str, hsdattrib: dict)
Opens a tag.
- Parameters:
tagname – Name of the tag which had been opened.
attrib – String containing the attribute of the tag or None.
hsdattrib – Dictionary of the options created during the processing in the hsd-parser.
- close_tag(tagname: str)
Closes a tag.
- Parameters:
tagname – Name of the tag which had been closed.
- add_text(text: str)
Adds text (data) to the current tag.
- Parameters:
text – Text in the current tag.