From 04533c2c2f1958e7318dbb279b7361d875631a1c Mon Sep 17 00:00:00 2001
From: Simon Kuberski
Date: Mon, 21 Feb 2022 18:33:06 +0100
Subject: [PATCH 1/3] Documentation of the JSON format

---
 pyerrors/__init__.py | 44 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/pyerrors/__init__.py b/pyerrors/__init__.py
index d8f526a6..677dd4b5 100644
--- a/pyerrors/__init__.py
+++ b/pyerrors/__init__.py
@@ -187,7 +187,7 @@ obs3.details()
 
 ```
 
-`Obs` objects defined on regular and irregular histories of the same ensemble can be computed with each other and the correct error propagation and estimation is automatically taken care of.
+`Obs` objects defined on regular and irregular histories of the same ensemble can be combined with each other and the correct error propagation and estimation is automatically taken care of.
 
 **Warning:** Irregular Monte Carlo chains can result in odd patterns in the autocorrelation functions. Make sure to check the autocorrelation time with e.g. `pyerrors.obs.Obs.plot_rho` or `pyerrors.obs.Obs.plot_tauint`.
 
@@ -339,7 +339,47 @@ For the full API see `pyerrors.linalg`.
 
 # Export data
 
-The preferred exported file format within `pyerrors` is json.gz. The exact specifications of this formats will be listed here soon.
+The preferred exported file format within `pyerrors` is json.gz. Files written in this format are valid JSON files that have been compressed using gzip. The structure of the content is inspired by the dobs format of the ALPHA collaboration. The aim of the format is to store data in a self-contained way such that, even years after the creation of the file, it is possible to extract all necessary information:
+- What observables are stored? Possibly: how exactly are they defined?
+- How does each single ensemble or external quantity contribute to the error of the observable?
+- Who wrote the file, when, and on which machine?
+
+This can be achieved by storing all information in one single file. The export routines of `pyerrors` are written such that as much information as possible is written automatically. The first entries of the file provide optional auxiliary information:
+- `program` is a string that indicates which program was used to write the file.
+- `version` is a string that specifies the version of the format.
+- `who` is a string that specifies the user name of the creator of the file.
+- `date` is a string and contains the creation date of the file.
+- `host` is a string and contains the hostname on which the file was written.
+- `description` contains information on the content of the file. This field is not filled automatically. The user is advised to provide information in this field that is as detailed as possible. Examples are input files of measurements or simulations, LaTeX formulae or references to publications specifying how the observables have been computed, details on the analysis strategy, ... This field may be any valid JSON type. Strings, arrays or objects (equivalent to dicts in Python) are well suited to provide information.
+
+The only required entry of the file is the field
+- `obsdata`, an array that contains the actual data.
+
+Each entry of the array corresponds to a single structure of observables. Currently, these structures can be any of `Obs`, `list`, `numpy.ndarray` or `Corr`. All `Obs` inside a structure (with dimension > 0) have to be defined on the same set of configurations. Different structures, which are represented by separate entries of the array `obsdata`, are treated independently. Each entry of this array has the following required entries:
+- `type` is a string that specifies the type of the structure. This allows the content to be parsed into the correct form after reading the file. It is always possible to interpret the content as a list of Obs.
+- `value` is an array that contains the mean values of the Obs inside the structure.
+The following entries are optional:
+- `layout` is a string that specifies the layout of multi-dimensional structures. Examples are "2, 2" for a 2x2 matrix or "64, 4, 4" for a Corr with T=64 and 4x4 matrices at each time slice. "1" denotes a single Obs.
+- `tag` is any JSON type. It contains additional information concerning the structure. The `tag` of an `Obs` in `pyerrors` is written here.
+- `reweighted` is a Boolean that may be used to specify whether the `Obs` in the structure have been reweighted.
+- `data` is an array that contains the data from MC chains. It is defined below.
+- `cdata` is an array that contains the data from external quantities with an error (`Covobs` in `pyerrors`). It is defined below.
+
+The array `data` contains the data from MC chains. Each entry of the array corresponds to one ensemble and contains:
+- `id`, a string giving the name of the ensemble.
+- `replica`, an array that contains an entry per replica of the ensemble.
+
+Each entry of `replica` contains:
+- `name`, a string that contains the name of the replica.
+- `deltas`, an array that contains the actual data.
+
+Each entry in `deltas` corresponds to one configuration of the replica and has $1+N$ entries. The first entry is an integer that specifies the configuration number which, together with the ensemble and replica name, uniquely identifies the configuration on which the data has been obtained. The following N entries specify the deltas of each `Obs` inside the structure, i.e., the deviation of the observable from its mean value on this configuration. Multi-dimensional structures are stored in row-major format.
+
+The array `cdata` contains information about the contribution of auxiliary observables, represented by `Covobs` in `pyerrors`, to the total error of the observables. Each entry of the array belongs to one auxiliary covariance matrix and contains:
+- `id`, a string that identifies the covariance matrix
+- `layout`, a string that defines the dimensions of the $M\times M$ covariance matrix (has to be "M, M").
+- `cov`, an array that contains the $M\times M$ many entries of the covariance matrix, stored in row-major format.
+- `grad`, an array that contains N entries, one for each `Obs` inside the structure. Each entry is an array, that contains the M gradients of the Nth observable with respect to the values that correspond to the diagonal entries of the covariance matrix.
 
 ## Jackknife samples
 For comparison with other analysis workflows `pyerrors` can generate jackknife samples from an `Obs` object or import jackknife samples into an `Obs` object.
From 0e685d552a8c98082fe22df87f59ca71bf119121 Mon Sep 17 00:00:00 2001
From: Simon Kuberski
Date: Mon, 21 Feb 2022 18:46:56 +0100
Subject: [PATCH 2/3] Added reference to JSON schema

---
 pyerrors/__init__.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/pyerrors/__init__.py b/pyerrors/__init__.py
index 4e2db252..0aeb65d8 100644
--- a/pyerrors/__init__.py
+++ b/pyerrors/__init__.py
@@ -381,6 +381,8 @@ The array `cdata` contains information about the contribution of auxiliary obser
 - `cov`, an array that contains the $M\times M$ many entries of the covariance matrix, stored in row-major format.
 - `grad`, an array that contains N entries, one for each `Obs` inside the structure. Each entry itself is an array, that contains the M gradients of the Nth observable with respect to the quantity that corresponds to the Mth diagonal entry of the covariance matrix.
 
+A JSON schema that may be used to verify the correctness of a file with respect to the format definition is stored in `./examples/json_schema.json`. The schema is a self-descriptive format definition and contains an exemplary file.
+
 ## Jackknife samples
 For comparison with other analysis workflows `pyerrors` can generate jackknife samples from an `Obs` object or import jackknife samples into an `Obs` object.
 See `pyerrors.obs.Obs.export_jackknife` and `pyerrors.obs.import_jackknife` for details.
From 8e5938ebe3c83f56deba3f8164f99e3a7557a3f0 Mon Sep 17 00:00:00 2001
From: Simon Kuberski
Date: Mon, 21 Feb 2022 18:47:47 +0100
Subject: [PATCH 3/3] Added version number of JSON file format to 1.0

---
 pyerrors/input/json.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyerrors/input/json.py b/pyerrors/input/json.py
index a4fab75e..a6060d5f 100644
--- a/pyerrors/input/json.py
+++ b/pyerrors/input/json.py
@@ -207,7 +207,7 @@ def create_json_string(ol, description='', indent=1):
 
     d = {}
     d['program'] = 'pyerrors %s' % (pyerrorsversion.__version__)
-    d['version'] = '0.2'
+    d['version'] = '1.0'
     d['who'] = getpass.getuser()
     d['date'] = datetime.datetime.now().astimezone().strftime('%Y-%m-%d %H:%M:%S %z')
     d['host'] = socket.gethostname() + ', ' + platform.platform()
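The format described in the first patch can be illustrated with a minimal, stdlib-only Python sketch. It does not call `pyerrors`. Instead, it writes a hand-made json.gz file, reads it back, splits each `deltas` row (configuration number followed by N deltas) into per-`Obs` series, maps a flat position to a row-major multi-index for a given `layout`, and evaluates the first-order error-propagation term $g^T C g$ for one `cdata` entry. Several details are assumptions made only for this illustration: the helper names are hypothetical, the `"List"` type string is not fixed by the text, `cov` is taken to be a flat list of $M\times M$ numbers, and all values are made up. The $g^T C g$ formula is standard linear error propagation and is not necessarily how `pyerrors` combines `Covobs` internally.

```python
import gzip
import json
import os
import tempfile


def write_json_gz(path, content):
    """Serialize a dict to JSON and gzip-compress it (the json.gz container)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(content, f, indent=1)


def read_json_gz(path):
    """Decompress a json.gz file and parse its JSON content."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def roundtrip(content):
    """Write content to a temporary json.gz file, read it back, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=".json.gz")
    os.close(fd)
    try:
        write_json_gz(path, content)
        return read_json_gz(path)
    finally:
        os.remove(path)


def parse_layout(layout):
    """Turn a layout string such as "64, 4, 4" into a tuple of ints."""
    return tuple(int(part) for part in layout.split(","))


def row_major_index(flat, shape):
    """Map a flat position in a row-major (C-order) list to a multi-index."""
    index = []
    for dim in reversed(shape):
        flat, rem = divmod(flat, dim)
        index.append(rem)
    return tuple(reversed(index))


def split_deltas(structure):
    """Return one dict per Obs mapping (ensemble id, replica name) to (config, delta) pairs."""
    n_obs = len(structure["value"])
    per_obs = [{} for _ in range(n_obs)]
    for ensemble in structure.get("data", []):
        for replica in ensemble["replica"]:
            key = (ensemble["id"], replica["name"])
            for row in replica["deltas"]:
                config, deltas = row[0], row[1:]  # 1 + N entries per configuration
                if len(deltas) != n_obs:
                    raise ValueError("expected 1 + N entries per deltas row")
                for i, delta in enumerate(deltas):
                    per_obs[i].setdefault(key, []).append((config, delta))
    return per_obs


def covobs_variance(cdata_entry, obs_index):
    """First-order variance g^T C g of one Obs from one cdata covariance matrix."""
    m = parse_layout(cdata_entry["layout"])[0]
    cov = cdata_entry["cov"]  # assumed: flat list of M*M entries, row-major
    grad = cdata_entry["grad"][obs_index]  # M gradients for this Obs
    return sum(grad[a] * cov[a * m + b] * grad[b] for a in range(m) for b in range(m))


# Toy file: two Obs on one replica plus one 1x1 external covariance (made-up numbers).
EXAMPLE = {
    "program": "hand-written sketch",
    "version": "1.0",
    "description": "Two toy observables, illustrative values only",
    "obsdata": [{
        "type": "List",  # assumption: the exact type string is not fixed by the text
        "layout": "2",
        "value": [1.0, 2.0],
        "data": [{
            "id": "ensA",
            "replica": [{"name": "r01", "deltas": [[1, 0.1, -0.2], [2, -0.1, 0.2]]}],
        }],
        "cdata": [{"id": "zfactor", "layout": "1, 1", "cov": [0.04], "grad": [[1.0], [2.0]]}],
    }],
}

if __name__ == "__main__":
    structure = roundtrip(EXAMPLE)["obsdata"][0]
    print(split_deltas(structure)[1][("ensA", "r01")])  # [(1, -0.2), (2, 0.2)]
    print(covobs_variance(structure["cdata"][0], 1))  # 0.16
    print(row_major_index(5, parse_layout("64, 4, 4")))  # (0, 1, 1)
```

The `pyerrors` writer itself is `create_json_string` in `pyerrors/input/json.py`, as shown in the last hunk. For checking a real file against the format definition, the schema referenced in the second patch (`./examples/json_schema.json`) is authoritative, not this sketch.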