Creating new simulations with the provided unseen-awg weather generator
You can download and extract an instance of unseen-awg as described in Data download and preparation.
Below, we first detail some technical aspects of unseen-awg and then explain how to load stored weather generators and how to use them to generate new simulations.
Storage format
Each unseen-awg weather generator gets saved as a directory that contains params.yaml, a file holding the parameters of the weather generator:
with open(os.path.join(dir_wg, "params.yaml")) as file:
config = yaml.safe_load(file)
config
{'weather_generator.similarity': 'mse_similarity',
'weather_generator.var': 'geopotential_height',
'weather_generator.window_size': 10,
'weather_generator.n_samples': 14,
'weather_generator.use_precomputed_similarities': True,
'zarr_year_dayofyear': '<dataset that similarities are computed over (year-dayofyear format)>',
'dir_wg': '<weather generator directory>'}
For unseen-awg instances with precomputed similarities, the parameters "zarr_year_dayofyear" and "dir_wg" are mainly used within the Snakemake workflow and for some diagnostic plots. Consider adjusting them using the paths you defined in configs/paths.yaml (Defining paths in configs/paths.yaml). For new instances of unseen-awg, the paths will automatically be set correctly.
A core concept of unseen-awg weather generators is computing similarities between pairs of atmospheric circulation states. By default, the similarities are computed as the negative squared Euclidean distance.
If similarities are precomputed, i.e., if the weather_generator.use_precomputed_similarities parameter was set to True during initialization of the weather generator, the weather generator’s directory additionally includes a zarr store similarities.zarr holding similarities between pairs of atmospheric circulation states. Similarities can also be computed on-the-fly. While simulating time series with the weather generator, additional helper files map_<tau>_steps_transition.nc are created in the weather generator’s directory, tau indicates the block size used in the given simulation.
Loading a Weather Generator
A weather generator instance can be loaded from such a directory using:
wg = WeatherGenerator.load(dir_wg)
The dataset of similarities can be accessed through:
wg.ds_similarities
<xarray.Dataset> Size: 704GB
Dimensions: (dayofyear: 366, year: 21, sample: 14,
ensemble_member: 11, d_shift: 23, c_year: 21,
c_sample: 14, c_ensemble_member: 11)
Coordinates: (12/16)
* dayofyear (dayofyear) int64 3kB 1 2 3 4 5 6 ... 362 363 364 365 366
* year (year) int64 168B 2003 2004 2005 2006 ... 2021 2022 2023
valid_time (dayofyear, year) datetime64[ns] 61kB dask.array<chunksize=(366, 21), meta=np.ndarray>
* sample (sample) int64 112B 0 1 2 3 4 5 6 7 8 9 10 11 12 13
init_time (dayofyear, year, sample) datetime64[ns] 861kB dask.array<chunksize=(1, 21, 14), meta=np.ndarray>
lead_time (dayofyear, year, sample) timedelta64[ns] 861kB dask.array<chunksize=(183, 11, 14), meta=np.ndarray>
... ...
c_valid_time (dayofyear, d_shift, c_year) datetime64[ns] 1MB dask.array<chunksize=(183, 12, 21), meta=np.ndarray>
m_is_near (dayofyear, year, d_shift, c_year) bool 4MB dask.array<chunksize=(183, 11, 12, 21), meta=np.ndarray>
* c_sample (c_sample) int64 112B 0 1 2 3 4 5 6 7 8 9 10 11 12 13
c_init_time (dayofyear, d_shift, c_year, c_sample) datetime64[ns] 20MB dask.array<chunksize=(92, 12, 11, 7), meta=np.ndarray>
c_lead_time (dayofyear, d_shift, c_year, c_sample) timedelta64[ns] 20MB dask.array<chunksize=(92, 12, 11, 7), meta=np.ndarray>
* c_ensemble_member (c_ensemble_member) int64 88B 0 1 2 3 4 5 6 7 8 9 10
Data variables:
similarities (dayofyear, year, sample, ensemble_member, d_shift, c_year, c_sample, c_ensemble_member) float64 704GB dask.array<chunksize=(1, 1, 14, 11, 23, 21, 14, 11), meta=np.ndarray>
Attributes:
Conventions: CF-1.7
Title: Similarities between pairs of states in the underlying da...
Source: Computed according to the selected similarity measure fro...
Creator: Jonathan Wider (ORCID: 0000-0002-5185-5768)
Institution: Helmholtz Centre for Environmental Research – UFZ
Creation_date: 2026-04-13 20:11:23
License: Creative Commons Attribution 4.0 InternationalThe array is set up in a way that it includes similarities between all “valid” pairs of large-scale atmospheric circulation states, i.e. states whose calendar date is similar enough. The conventions for the dimensions of the similarities dataset are as follows:
wg.ds_similarities.dims
FrozenMappingWarningOnValuesAccess({'dayofyear': 366, 'year': 21, 'sample': 14, 'ensemble_member': 11, 'd_shift': 23, 'c_year': 21, 'c_sample': 14, 'c_ensemble_member': 11})
Assume that we want to compute between a fixed “reference state” and a set of “candidate states”. Then the dimensions are:
dayofyear,year: Day-of-year and year of a reference state.sample: “Sample” of a reference state. This is a dimension we create internally to differentiate between forecasts with the samevalid_time(and therefore sameyear,dayofyear), but different initialization date (init_time).ensemble_member: The reforecast dataset includes multiple ensemble members.ensemble_memberidentifies the ensemble member of the reference state.d_shift: Shift between the day of year of thevalid_timeof the reference state and the corresponding day of year for the candidate state.c_year,c_sampleandc_ensemble_memberhave the same meaning for the candidate state asyear,sampleandensemble_memberfor the reference state.
For example, to extract the similarity between
the large-scale atmospheric state projected for January 1st 2010 by the 5th ensemble member of a ensemble reforecast initialized on December 28th 2009 and
the large-scale atmospheric state of the 0th ensemble member of a reforecast started on January 1st 2010 itself,
one could do the following:
reference_valid_time = np.datetime64("2010-01-01")
reference_init_time = np.datetime64("2009-12-28")
reference_ensemble_member = 5
candidate_valid_time = np.datetime64("2010-01-01")
candidate_init_time = np.datetime64("2010-01-01")
candidate_ensemble_member = 0
wg.ds_similarities.where(
np.logical_and(
wg.ds_similarities.valid_time.load() == reference_valid_time,
wg.ds_similarities.init_time.load() == reference_init_time,
),
drop=True,
).sel(
ensemble_member=reference_ensemble_member,
c_ensemble_member=candidate_ensemble_member,
).where(
np.logical_and(
wg.ds_similarities.c_valid_time.load() == candidate_valid_time,
wg.ds_similarities.c_init_time.load() == candidate_init_time,
),
drop=True,
)["similarities"].load()
<xarray.DataArray 'similarities' (dayofyear: 1, year: 1, sample: 1, d_shift: 1,
c_year: 1, c_sample: 1)> Size: 8B
array([[[[[[-10.66284565]]]]]])
Coordinates: (12/16)
* dayofyear (dayofyear) int64 8B 1
* year (year) int64 8B 2010
valid_time (dayofyear, year) datetime64[ns] 8B 2010-01-01
* sample (sample) int64 8B 11
init_time (dayofyear, year, sample) datetime64[ns] 8B 2009-12-28
lead_time (dayofyear, year, sample) timedelta64[ns] 8B 4 days
... ...
m_is_near (dayofyear, year, d_shift, c_year) bool 1B True
* c_sample (c_sample) int64 8B 12
c_init_time (dayofyear, d_shift, c_year, c_sample) datetime64[ns] 8B ...
c_lead_time (dayofyear, d_shift, c_year, c_sample) timedelta64[ns] 8B ...
c_ensemble_member int64 8B 0
ensemble_member int64 8B 5
Attributes:
long_name: similarity score between forecast and candidate analog
units: 1In practice, this manual extraction of similarities is rarely necessary.
Sampling a time series with the weather generator
Even after a weather generator was configured, several parameters that can be adapted per simulation. These are:
blocksize: The sampling can be configured to generate time series in contiguous blocks ofblocksizedays.probability_model: Specifies how probabilities are to be calculated from the similarities. This object allows restrictions to allowed candidate states, e.g., using an instance of theNormalProbabilityNotLargerThanFixedDateclass forbids sampling analogs beyond a chosen date. With this restriction, unseen-awg can be used in a forecast-like setting.stepper_class: Determines how to evolve the output time. This makes it possible to use different output calendars. E.g., theNoLeapYearStepperclass can be chosen to sample a time series using a no-leap-year calendar.n_steps: How many steps of lengthblocksizethe resulting time series should include.rng: A random number generator that allows making sampling reproducible.initialization: The initial state to start the time series generation from. By default, a random state is chosen among the possible options in the dataset.start_by_taking_analog: Whether the sampling of the time series starts by from an analog of the initial condition.show_progressbar: Track sampling progress.
trajectory = wg.sample_trajectory(
blocksize=5,
probability_model=NormalProbabilityModel(sigma=2.5),
stepper_class=StandardStepper,
n_steps=100,
rng=np.random.default_rng(seed=0),
initialization=InitTimeLeadTimeMemberState(
init_time=np.datetime64("2009-12-28", "ns"),
lead_time=np.timedelta64(4, "D"),
ensemble_member=0,
),
start_by_taking_analog=False,
show_progressbar=False,
)
The resulting time series has a very compact format: it is an xarray.Dataset with the assigned output time out_time as the dimension and the lead_time, init_time, and ensemble_member of the sampled states as variables. That means, we don’t store any meteorological data, only coordinates to look up the corresponding data for a given out_time:
trajectory
<xarray.Dataset> Size: 16kB
Dimensions: (out_time: 500)
Coordinates:
* out_time (out_time) object 4kB 2010-01-01 00:00:00 ... 2011-05-15...
Data variables:
lead_time (out_time) timedelta64[ns] 4kB 4 days 5 days ... 13 days
init_time (out_time) datetime64[ns] 4kB 2009-12-28 ... 2021-05-02
ensemble_member (out_time) int64 4kB 0 0 0 0 0 0 0 0 0 ... 9 9 9 4 4 4 4 4Using a different set up (blocksize 10 days, no-leap-year calendar, no transitions between states closer than 180 days, random initialization):
trajectory_2 = wg.sample_trajectory(
blocksize=10,
probability_model=NormalProbabilityKeepMinimalNDays(sigma=2.5, n_days_min=180),
stepper_class=NoLeapYearStepper,
n_steps=100,
rng=np.random.default_rng(seed=0),
start_by_taking_analog=False,
show_progressbar=False,
)
/gpfs1/schlecker/home/wider/Projects/unseen-awg/src/unseen_awg/weather_generator.py:876: FutureWarning: dropping variables using `drop` is deprecated; use drop_vars.
.drop("datapoint")
trajectory_2
<xarray.Dataset> Size: 32kB
Dimensions: (out_time: 1000)
Coordinates:
* out_time (out_time) object 8kB 2016-11-07 00:00:00 ... 2019-08-03...
Data variables:
lead_time (out_time) timedelta64[ns] 8kB 12 days 13 days ... 22 days
init_time (out_time) datetime64[ns] 8kB 2016-10-26 ... 2016-07-06
ensemble_member (out_time) int64 8kB 7 7 7 7 7 7 7 7 7 ... 8 8 8 8 8 8 8 8The code for the different components is modular, so that creating new classes for different probability distributions and calendars is possible. Similarly, different methods for computing similarities can be implemented.