Creating new simulations with the provided unseen-awg weather generator

You can download and extract an instance of unseen-awg as described in Data download and preparation.

Below, we first detail some technical aspects of unseen-awg and then explain how to load stored weather generators and how to use them to generate new simulations.

Storage format

Each unseen-awg weather generator gets saved as a directory that contains params.yaml, a file holding the parameters of the weather generator:

import os
import yaml

with open(os.path.join(dir_wg, "params.yaml")) as file:
    config = yaml.safe_load(file)

config

{'weather_generator.similarity': 'mse_similarity',
 'weather_generator.var': 'geopotential_height',
 'weather_generator.window_size': 10,
 'weather_generator.n_samples': 14,
 'weather_generator.use_precomputed_similarities': True,
 'zarr_year_dayofyear': '<dataset that similarities are computed over (year-dayofyear format)>',
 'dir_wg': '<weather generator directory>'}

For unseen-awg instances with precomputed similarities, the parameters "zarr_year_dayofyear" and "dir_wg" are mainly used within the Snakemake workflow and for some diagnostic plots. Consider setting them to the paths you defined in configs/paths.yaml (see Defining paths in configs/paths.yaml). For new instances of unseen-awg, the paths are set correctly automatically.

A core concept of unseen-awg weather generators is computing similarities between pairs of atmospheric circulation states. By default, the similarities are computed as the negative squared Euclidean distance.
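As a rough illustration of this default measure, the sketch below computes the negative squared Euclidean distance between two flattened fields in plain Python. The function name `neg_squared_euclidean` and the toy values are made up for this example; the package's own `mse_similarity` implementation may differ, for instance in normalization or in how it handles gridded arrays.

```python
def neg_squared_euclidean(x, y):
    """Return -sum((x_i - y_i)^2); values closer to 0 mean more similar states."""
    if len(x) != len(y):
        raise ValueError("both states must have the same number of grid points")
    return -sum((a - b) ** 2 for a, b in zip(x, y))


# Toy "circulation states": hypothetical values at three grid points.
reference = [1.0, 2.0, 3.0]
close_candidate = [1.0, 2.0, 4.0]
far_candidate = [4.0, 6.0, 3.0]

sim_close = neg_squared_euclidean(reference, close_candidate)
sim_far = neg_squared_euclidean(reference, far_candidate)
```

With this definition, identical states receive the maximum similarity of 0, and the value becomes more negative as two states move apart.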

If similarities are precomputed, i.e., if the weather_generator.use_precomputed_similarities parameter was set to True during initialization of the weather generator, the weather generator’s directory additionally includes a zarr store, similarities.zarr, holding similarities between pairs of atmospheric circulation states. Similarities can also be computed on the fly. While simulating time series with the weather generator, additional helper files map_<tau>_steps_transition.nc are created in the weather generator’s directory, where tau indicates the block size used in the given simulation.

Loading a Weather Generator

A weather generator instance can be loaded from such a directory using:

wg = WeatherGenerator.load(dir_wg)

The dataset of similarities can be accessed through:

wg.ds_similarities

<xarray.Dataset> Size: 704GB
Dimensions:            (dayofyear: 366, year: 21, sample: 14,
                        ensemble_member: 11, d_shift: 23, c_year: 21,
                        c_sample: 14, c_ensemble_member: 11)
Coordinates: (12/16)
  * dayofyear          (dayofyear) int64 3kB 1 2 3 4 5 6 ... 362 363 364 365 366
  * year               (year) int64 168B 2003 2004 2005 2006 ... 2021 2022 2023
    valid_time         (dayofyear, year) datetime64[ns] 61kB dask.array<chunksize=(366, 21), meta=np.ndarray>
  * sample             (sample) int64 112B 0 1 2 3 4 5 6 7 8 9 10 11 12 13
    init_time          (dayofyear, year, sample) datetime64[ns] 861kB dask.array<chunksize=(1, 21, 14), meta=np.ndarray>
    lead_time          (dayofyear, year, sample) timedelta64[ns] 861kB dask.array<chunksize=(183, 11, 14), meta=np.ndarray>
    ...                 ...
    c_valid_time       (dayofyear, d_shift, c_year) datetime64[ns] 1MB dask.array<chunksize=(183, 12, 21), meta=np.ndarray>
    m_is_near          (dayofyear, year, d_shift, c_year) bool 4MB dask.array<chunksize=(183, 11, 12, 21), meta=np.ndarray>
  * c_sample           (c_sample) int64 112B 0 1 2 3 4 5 6 7 8 9 10 11 12 13
    c_init_time        (dayofyear, d_shift, c_year, c_sample) datetime64[ns] 20MB dask.array<chunksize=(92, 12, 11, 7), meta=np.ndarray>
    c_lead_time        (dayofyear, d_shift, c_year, c_sample) timedelta64[ns] 20MB dask.array<chunksize=(92, 12, 11, 7), meta=np.ndarray>
  * c_ensemble_member  (c_ensemble_member) int64 88B 0 1 2 3 4 5 6 7 8 9 10
Data variables:
    similarities       (dayofyear, year, sample, ensemble_member, d_shift, c_year, c_sample, c_ensemble_member) float64 704GB dask.array<chunksize=(1, 1, 14, 11, 23, 21, 14, 11), meta=np.ndarray>
Attributes:
    Conventions:    CF-1.7
    Title:          Similarities between pairs of states in the underlying da...
    Source:         Computed according to the selected similarity measure fro...
    Creator:        Jonathan Wider (ORCID: 0000-0002-5185-5768)
    Institution:    Helmholtz Centre for Environmental Research – UFZ
    Creation_date:  2026-04-13 20:11:23
    License:        Creative Commons Attribution 4.0 International

The array is set up so that it includes similarities between all “valid” pairs of large-scale atmospheric circulation states, i.e., pairs of states whose calendar dates are close enough. The conventions for the dimensions of the similarities dataset are as follows:

wg.ds_similarities.dims

FrozenMappingWarningOnValuesAccess({'dayofyear': 366, 'year': 21, 'sample': 14, 'ensemble_member': 11, 'd_shift': 23, 'c_year': 21, 'c_sample': 14, 'c_ensemble_member': 11})
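The reported size of 704GB follows directly from these dimension lengths. The short calculation below assumes 8 bytes per float64 value and copies the dimension lengths and the chunk shape from the output above; it reproduces the total and also estimates the size of a single chunk.

```python
import math

# Dimension lengths copied from the wg.ds_similarities.dims output above.
dims = {
    "dayofyear": 366, "year": 21, "sample": 14, "ensemble_member": 11,
    "d_shift": 23, "c_year": 21, "c_sample": 14, "c_ensemble_member": 11,
}
BYTES_PER_FLOAT64 = 8

n_values = math.prod(dims.values())
total_gb = n_values * BYTES_PER_FLOAT64 / 1e9

# One chunk covers a single (dayofyear, year) pair and all remaining dimensions.
chunk_shape = (1, 1, 14, 11, 23, 21, 14, 11)
chunk_mb = math.prod(chunk_shape) * BYTES_PER_FLOAT64 / 1e6

print(f"{total_gb:.0f} GB in total, {chunk_mb:.1f} MB per chunk")
```

Since the full array is much larger than typical memory, it is exposed as a chunked dask array, and selecting a single dayofyear and year should only require reading the corresponding chunk.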

Assume that we want to compute similarities between a fixed “reference state” and a set of “candidate states”. Then the dimensions are:

dayofyear, year: Day-of-year and year of a reference state.
sample: “Sample” of a reference state. This is a dimension we create internally to differentiate between forecasts with the same valid_time (and therefore same year, dayofyear), but different initialization date (init_time).
ensemble_member: The reforecast dataset includes multiple ensemble members. ensemble_member identifies the ensemble member of the reference state.
d_shift: Shift between the day of year of the valid_time of the reference state and the corresponding day of year for the candidate state.
c_year, c_sample and c_ensemble_member have the same meaning for the candidate state as year, sample and ensemble_member for the reference state.

For example, to extract the similarity between

the large-scale atmospheric state projected for January 1st 2010 by the 5th ensemble member of an ensemble reforecast initialized on December 28th 2009 and
the large-scale atmospheric state of the 0th ensemble member of a reforecast started on January 1st 2010 itself,

one could do the following:

reference_valid_time = np.datetime64("2010-01-01")
reference_init_time = np.datetime64("2009-12-28")
reference_ensemble_member = 5

candidate_valid_time = np.datetime64("2010-01-01")
candidate_init_time = np.datetime64("2010-01-01")
candidate_ensemble_member = 0
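These dates still have to be translated into the dimension values described above. A minimal sketch using only the standard library is given below. It assumes that d_shift is counted as the candidate's day of year minus the reference's day of year (the sign convention is not stated here) and ignores wrap-around at year boundaries. The sample and c_sample indices would additionally have to be looked up by matching the initialization dates against the init_time and c_init_time coordinates, which is not shown.

```python
from datetime import date

reference_valid_time = date(2010, 1, 1)
reference_init_time = date(2009, 12, 28)
candidate_valid_time = date(2010, 1, 1)
candidate_init_time = date(2010, 1, 1)

# Reference state: indexed by the year and day of year of its valid time.
year = reference_valid_time.year
dayofyear = reference_valid_time.timetuple().tm_yday

# Candidate state: indexed by its year and by the day-of-year shift.
c_year = candidate_valid_time.year
c_dayofyear = candidate_valid_time.timetuple().tm_yday
d_shift = c_dayofyear - dayofyear  # assumed sign convention

# Lead times distinguish forecasts that share a valid time.
reference_lead_days = (reference_valid_time - reference_init_time).days
candidate_lead_days = (candidate_valid_time - candidate_init_time).days

print(dayofyear, year, d_shift, c_year, reference_lead_days, candidate_lead_days)
```

For the two states in this example, both valid times fall on the same calendar day, so the shift is zero and the states differ only in initialization date and ensemble member.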

In practice, this manual extraction of similarities is rarely necessary.
