unseen_awg package

Submodules

unseen_awg.data_classes module

class InitTimeLeadTimeMemberState(init_time: numpy.datetime64, lead_time: numpy.timedelta64, ensemble_member: int)[source]

Bases: object

Parameters:
  • init_time (datetime64)

  • lead_time (timedelta64)

  • ensemble_member (int)

ensemble_member: int
init_time: datetime64
lead_time: timedelta64
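
As an illustration of how the three fields fit together, the sketch below uses a hypothetical stand-in dataclass (StateSketch, not part of the package) with the same field names and types. It assumes the usual forecasting convention that the valid time equals init_time plus lead_time.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StateSketch:
    """Hypothetical stand-in mirroring the three documented fields."""

    init_time: np.datetime64
    lead_time: np.timedelta64
    ensemble_member: int


state = StateSketch(
    init_time=np.datetime64("2000-01-01"),
    lead_time=np.timedelta64(3, "D"),
    ensemble_member=0,
)

# Valid time under the usual forecasting convention (an assumption here).
valid_time = state.init_time + state.lead_time
print(valid_time)  # 2000-01-04
```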

unseen_awg.data_utils module

Utilities for data manipulation and Zarr store creation.

This module provides functions for stacking xarray data structures and creating Zarr stores for efficient data storage and retrieval.

create_zarr_store_for_dataarray(zarr_path, shape, chunks, dims, coords, dtype, variable_name)[source]

Create a Zarr store for a single DataArray with metadata only.

Parameters:
  • zarr_path (str) – Path where the Zarr store will be created.

  • shape (tuple[int, ...]) – Shape of the array.

  • chunks (tuple[int, ...]) – Chunk sizes for each dimension.

  • dims (tuple[str, ...]) – Dimension names.

  • coords (dict[str, Any]) – Coordinate arrays and metadata.

  • dtype (np.dtype) – Data type of the array.

  • variable_name (str) – Name of the variable.

Raises:

ValueError – If shape, chunks, and dims have different lengths.

Return type:

None

create_zarr_store_for_dataset(zarr_path, coords, data_vars)[source]

Create a Zarr store for a Dataset with multiple variables.

Parameters:
  • zarr_path (str) – Path where the Zarr store will be created.

  • coords (dict[str, Any]) – Coordinate arrays and metadata.

  • data_vars (dict[str, dict[str, Any]]) – Dictionary mapping variable names to their specifications. Each specification should contain ‘shape’, ‘chunks’, ‘dims’, and ‘dtype’.

Raises:

ValueError – If any variable has mismatched shape, chunks, and dims lengths.

Return type:

None
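
The data_vars argument can be pictured as a nested dictionary. The sketch below builds one such specification and applies the documented length check with a hypothetical helper (check_spec is not part of the package). The variable name "t2m" and the grid sizes are made up for illustration.

```python
import numpy as np


def check_spec(name, spec):
    """Hypothetical helper mirroring the documented shape/chunks/dims check."""
    lengths = {len(spec["shape"]), len(spec["chunks"]), len(spec["dims"])}
    if len(lengths) != 1:
        raise ValueError(f"{name}: shape, chunks and dims must have equal length")


# One entry per variable; keys follow the documented specification.
data_vars = {
    "t2m": {
        "shape": (365, 90, 180),
        "chunks": (1, 90, 180),
        "dims": ("time", "latitude", "longitude"),
        "dtype": np.dtype("float32"),
    },
}

for name, spec in data_vars.items():
    check_spec(name, spec)  # passes silently for a consistent specification
```

A specification whose dims tuple is shorter than its shape would raise ValueError in this sketch, matching the documented behaviour.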

unseen_awg.grids module

unseen_awg.plotting_utils module

add_contours(ax, da, major_levels, minor_levels, use_contour_labels=True, linewidth_major=1, linewidth_minor=0.5, **plot_kwargs)[source]

Parameters:

da (DataArray)

add_headers(fig, row_headers=None, col_headers=None, cbar_headers=None, row_pad=1, col_pad=5, rotate_row_headers=True, **text_kwargs)[source]

add_label_to_axes(ax, label, ax_xpos=0.05, ax_ypos=0.95, ha='left', va='top', edgecolor='white', **font_kwargs)[source]

contourf_plot_without_frame_with_bounds(ax, da, **plot_kwargs)[source]

Parameters:

da (DataArray)

map_plot_without_frame_with_bounds(ax, da, **plot_kwargs)[source]

Parameters:

da (DataArray)

transition_init_time_plot(ax_joint, ax_top, ax_right, traj, only_jumps=False)[source]

transition_lead_time_plot(ax_joint, ax_top, ax_right, traj, lt_max=44, only_jumps=False, use_log_cnorm_in_joint_plot=True)[source]

transition_valid_time_plot(ax_joint, ax_top, ax_right, traj, only_jumps=False)[source]

unseen_awg.probability_models module

class NormalProbabilityAvoidDirectRepeats(sigma)[source]

Bases: ProbabilityModel

Probability model that avoids sampling the base state.

This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates that are exact repeats of the base state.

Parameters:

sigma (float)

__init__(sigma)[source]

Initialize the Probability Model that Avoids Direct Repeats.

Parameters:

sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.

Raises:

AssertionError – If sigma is not a positive value.

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities, excluding direct repeats.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).

  • coords_candidates (xr.Dataset) – Coordinates of candidate states.

Returns:

Unnormalized log probabilities for each candidate, with direct repeats set to negative infinity.

Return type:

NDArray

class NormalProbabilityKeepMinimalNDays(sigma, n_days_min)[source]

Bases: ProbabilityModel

Probability model that enforces a minimum time separation to the true next state.

This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates that are within a specified number of days from the true next state.

Parameters:
  • sigma (float)

  • n_days_min (int)

__init__(sigma, n_days_min)[source]

Initialize the Probability Model with Minimal Time Separation.

Parameters:
  • sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.

  • n_days_min (int) – Minimum number of days required between candidate and next state.

Raises:

AssertionError – If sigma is not a positive value.

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities, excluding candidates too close in time.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state.

  • coords_candidates (xr.Dataset) – Coordinates of candidate states.

Returns:

Unnormalized log probabilities for each candidate, with candidates too close in time set to negative infinity.

Return type:

NDArray
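
The masking idea can be sketched with plain NumPy arrays. This is not the package implementation: it takes candidate times as a datetime64 array rather than an xr.Dataset, and it assumes "too close" means an absolute separation strictly smaller than n_days_min days. The function name is hypothetical.

```python
import numpy as np


def mask_close_in_time(logp, cand_times, next_time, n_days_min):
    """Hypothetical sketch: set log probabilities of near-in-time candidates to -inf."""
    separation = np.abs(cand_times - next_time)
    too_close = separation < np.timedelta64(n_days_min, "D")  # assumed strict bound
    return np.where(too_close, -np.inf, logp)


logp = np.array([-0.1, -0.2, -0.3])
cand_times = np.array(["2000-01-01", "2000-01-04", "2000-02-01"], dtype="datetime64[D]")
next_time = np.datetime64("2000-01-03")

masked = mask_close_in_time(logp, cand_times, next_time, n_days_min=5)
# The first two candidates lie within 5 days of next_time and are excluded.
```

Because exp(-inf) is zero, masked candidates can never be drawn, whichever sampler is applied afterwards.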

class NormalProbabilityModel(sigma)[source]

Bases: ProbabilityModel

Probability model assuming probabilities follow a normal distribution.

If combined with MSE similarities, this amounts to assuming a normal distribution centered at s_next with a given standard deviation sigma.

Parameters:

sigma (float)

__init__(sigma)[source]

Initialize the Normal Probability Model.

Parameters:

sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.

Raises:

AssertionError – If sigma is not a positive value.

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities based on similarities.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).

  • coords_candidates (xr.Dataset) – Coordinates of candidate states (unused in this model).

Returns:

Unnormalized log probabilities for each candidate.

Return type:

NDArray
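
For intuition, a Gaussian weighting of negative-MSE similarities could look like the sketch below. The scaling similarity / (2 * sigma**2) is an assumption based on the log density of a normal distribution; the exact constant used by the package is not stated here, and the function name is hypothetical.

```python
import numpy as np


def normal_unnormalized_logp(similarities, sigma):
    """Hypothetical sketch: Gaussian log weight for similarity = -MSE."""
    assert sigma > 0, "sigma must be positive"
    # Assumed form: log p = -MSE / (2 sigma^2) + const = similarity / (2 sigma^2) + const
    return np.asarray(similarities, dtype=float) / (2.0 * sigma**2)


similarities = np.array([-0.5, -2.0, -8.0])  # negative MSE values
logp = normal_unnormalized_logp(similarities, sigma=1.0)

# Normalizing is only needed for inspection; subtracting the max avoids overflow.
probs = np.exp(logp - logp.max())
probs /= probs.sum()
```

A smaller sigma concentrates the probability mass on the most similar candidates, while a larger sigma flattens the distribution.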

class NormalProbabilityModelSeasonality(sigma_amplitude, normalized_sigma_climatology)[source]

Bases: ProbabilityModel

Probability model that incorporates seasonal variability in standard deviation.

This model adjusts the probability computation based on a sigma that varies with the day of the year, reflecting changes in the atmosphere over the year.

Parameters:
  • sigma_amplitude (float)

  • normalized_sigma_climatology (DataArray)

__init__(sigma_amplitude, normalized_sigma_climatology)[source]

Initialize the Seasonally Variable Probability Model.

Parameters:
  • sigma_amplitude (float) – Amplitude factor for the climatological sigma.

  • normalized_sigma_climatology (xr.DataArray) – Normalized sigma values for each day of the year.

Notes

Sigma is split into amplitude and a normalized climatology to allow rescaling.

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities with seasonal sigma adjustment.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state, ignored for this model.

  • coords_candidates (xr.Dataset) – Coordinates of candidate states, ignored for this model.

Returns:

Unnormalized log probabilities for each candidate.

Return type:

NDArray

class NormalProbabilityNotLargerThanFixedDate(sigma, date_max)[source]

Bases: ProbabilityModel

Probability model that restricts candidates to a maximum date.

This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates whose date is later than a specified maximum date.

Parameters:
  • sigma (float)

  • date_max (datetime64)

__init__(sigma, date_max)[source]

Initialize the Probability Model with Maximum Date Restriction.

Parameters:
  • sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.

  • date_max (np.datetime64) – Maximum allowed date for candidate states.

Raises:

AssertionError – If sigma is not a positive value, or if no candidates exist before the maximum date.

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities, excluding candidates after max date.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).

  • coords_candidates (xr.Dataset) – Coordinates of candidate states.

Returns:

Unnormalized log probabilities for each candidate, with candidates after the maximum date set to negative infinity.

Return type:

NDArray

Raises:

AssertionError – If no candidates exist before the maximum date.

class ProbabilityModel(*args, **kwargs)[source]

Bases: ABC

Abstract base class for probability models used in analog sampling.

This class defines the interface for probability models that determine the likelihood of selecting an analog based on its similarity with the true next state. Some derived classes impose restrictions on the coordinates of the candidate samples and set the probability to zero for candidates that violate them.

Parameters:
  • args (Any)

  • kwargs (Any)

abstractmethod __init__(*args, **kwargs)[source]

Initialize the probability model.

Parameters:
  • *args (Any) – Positional arguments for model initialization.

  • **kwargs (Any) – Keyword arguments for model initialization.

Return type:

None

sample(rng, size, similarities, coords_s_next, coords_candidates)[source]

Sample analog states using the Gumbel-max trick.

Parameters:
  • rng (np.random.Generator) – Random number generator.

  • size (int or tuple of int) – Number of samples to generate.

  • similarities (NDArray) – Similarity values for candidate states.

  • coords_s_next (xr.Dataset) – Coordinates of the true next state.

  • coords_candidates (xr.Dataset) – Coordinates of candidate states.

Returns:

Indices of sampled analog states.

Return type:

NDArray

abstractmethod unnormalized_log_probability(coords_s_next, coords_candidates, similarities)[source]

Compute unnormalized log probabilities for candidate analogs.

Parameters:
  • coords_s_next (xr.Dataset) – Coordinates of the true next state.

  • coords_candidates (xr.Dataset) – Coordinates of candidate states.

  • similarities (NDArray) – Similarity values for candidate states.

Returns:

Unnormalized log probabilities for each candidate.

Return type:

NDArray

class UniformProbabilityModel[source]

Bases: ProbabilityModel

Probability model that assigns equal probability to all candidates.

This model treats all candidate states as equally likely, regardless of their similarities or coordinates.

__init__()[source]

Initialize the Uniform Probability Model.

No parameters are required, as all candidates are treated equally.

Return type:

None

unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]

Compute unnormalized log probabilities.

Returns a constant value of 1 for all candidates. Because every candidate receives the same unnormalized log probability, all candidates are equally likely after normalization.

Parameters:
  • similarities (NDArray) – Similarity values for candidate states (ignored in this model).

  • coords_s_next (xr.Dataset) – Coordinates of the true next state (ignored in this model).

  • coords_candidates (xr.Dataset) – Coordinates of candidate states (ignored in this model).

Returns:

Constant array of ones, representing uniform probabilities.

Return type:

NDArray

gumbel_max_sample(unnormalized_logp, rng, size)[source]

Sample from a categorical distribution using the Gumbel-max trick.

This method provides an efficient way to sample from a categorical distribution by using the properties of the Gumbel distribution.

Parameters:
  • unnormalized_logp (NDArray) – Unnormalized log probabilities for each category.

  • rng (np.random.Generator) – Random number generator.

  • size (int or tuple of int) – Number of samples to generate.

Returns:

Indices of sampled categories.

Return type:

NDArray

Notes

The Gumbel-max trick allows sampling from a categorical distribution without explicitly normalizing probabilities.
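
The trick can be sketched in a few lines: add independent standard Gumbel noise to each unnormalized log probability and take the argmax. The function below is an illustrative stand-in (its name is hypothetical) and assumes a one-dimensional array of log probabilities; it is not necessarily how the package implements gumbel_max_sample.

```python
import numpy as np


def gumbel_max_sketch(unnormalized_logp, rng, size):
    """Hypothetical sketch: categorical sampling via argmax(logp + Gumbel noise)."""
    logp = np.asarray(unnormalized_logp, dtype=float)
    shape = (size,) if isinstance(size, int) else tuple(size)
    # One Gumbel(0, 1) draw per sample and per category.
    noise = rng.gumbel(size=shape + logp.shape)
    return np.argmax(logp + noise, axis=-1)


rng = np.random.default_rng(0)
# Unnormalized weights 1 : 3 : 0; the -inf entry can never be selected.
logp = np.array([0.0, np.log(3.0), -np.inf])
samples = gumbel_max_sketch(logp, rng, size=10_000)
freq = np.bincount(samples, minlength=3) / samples.size
```

The empirical frequencies should be close to 0.25 and 0.75 for the first two categories, without the log probabilities ever being normalized.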

unseen_awg.similarity_measures module

Similarity measures for weather generator analog selection.

This module provides similarity measures used to compare reference points with candidate points in the weather generator’s analog selection process.

mse_similarity(ref_point_data, cand_points_data, reduction_axes=(-3, -2, -1))[source]

Calculate similarity based on negative mean squared error.

Computes the similarity between a reference point and candidate points using the negative mean squared error. Lower MSE values result in higher (less negative) similarity scores.

Parameters:
  • ref_point_data (NDArray[np.floating[Any]]) – Reference point data array.

  • cand_points_data (NDArray[np.floating[Any]]) – Candidate points data array with the same shape as ref_point_data or broadcastable to it.

  • reduction_axes (tuple[int, ...], optional) – Axes along which to compute the mean, by default (-3, -2, -1). These should correspond to (latitude, longitude, lag).

Returns:

Similarity scores as negative MSE values. Higher values indicate greater similarity.

Return type:

NDArray[np.floating[Any]]

Notes

The similarity is computed as: similarity = -mean((cand_points_data - ref_point_data)^2)
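
The formula translates directly into NumPy. The sketch below uses a hypothetical function name and tiny arrays of shape (latitude, longitude, lag) = (2, 3, 1), with three candidates stacked along a leading axis.

```python
import numpy as np


def mse_similarity_sketch(ref_point_data, cand_points_data, reduction_axes=(-3, -2, -1)):
    """Hypothetical sketch of the documented formula: negative mean squared error."""
    return -np.mean((cand_points_data - ref_point_data) ** 2, axis=reduction_axes)


ref = np.zeros((2, 3, 1))
cands = np.stack([
    np.zeros((2, 3, 1)),       # identical to the reference
    np.ones((2, 3, 1)),        # differs by 1 everywhere
    2.0 * np.ones((2, 3, 1)),  # differs by 2 everywhere
])

similarity = mse_similarity_sketch(ref, cands)
best = int(np.argmax(similarity))  # index of the most similar candidate
```

Broadcasting lets a single reference array be compared against all candidates at once, and the reduction leaves one score per candidate.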

unseen_awg.simulate_trajectory module

Trajectory simulation module for weather generator within Snakemake workflows.

This module provides functionality to simulate weather trajectories using analog sampling methods with configurable probability models and time stepping approaches.

main(snakemake)[source]

Parameters:

snakemake (Any)

Return type:

None

simulate_trajectory(path_wg, probability_model, sigma, seed, n_days, blocksize, path_trajectory, n_days_min=None)[source]

Simulate and save a time series with a weather generator.

Parameters:
  • path_wg (str) – Path of the directory of the weather generator.

  • probability_model (str) – Name of a probability model to be used when sampling analogs.

  • sigma (float) – Sigma parameter of the probability model used during sampling.

  • seed (int) – Seed for the random sampling of the time series.

  • n_days (int) – Number of days in the final time series. Gets rounded to conform with the selected blocksize.

  • blocksize (int) – Size of the contiguous blocks of states during the sampling of the time series.

  • path_trajectory (str) – Path to store the trajectory in.

  • n_days_min (int | None, optional) – To be used in combination with the “KeepMinimalNDays” probability model; avoids sampling states closer than n_days_min days to the true next state. By default None.

Raises:

ValueError – If an invalid name of a probability model was specified.

Return type:

None

unseen_awg.time_steppers module

Time stepping strategies for weather generation simulations.

This module provides abstract and concrete implementations of time steppers that control how time progresses during weather generation. Different steppers handle various calendar systems and time representations.

class FractionalYearStepper(blocksize, initial_time, reference_time, tropical_year=365.2422)[source]

Bases: TimeStepper

Time stepper that works with fractional year representations.

This stepper advances time using fractional year values.

Parameters:
  • blocksize (int) – Number of days to advance in each time step.

  • initial_time (cftime.DatetimeGregorian) – Starting time for the simulation.

  • reference_time (cftime.DatetimeGregorian) – Reference time used for fractional year calculations.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

blocksize

Number of days to advance in each time step.

Type:

int

initial_time

Starting time for the simulation.

Type:

cftime.DatetimeGregorian

reference_time

Reference time used for fractional year calculations.

Type:

cftime.DatetimeGregorian

tropical_year

Length of tropical year in days.

Type:

float

daily_increment

Fractional year increment per day.

Type:

float

current_year_fraction

Current position as fractional year.

Type:

float

__init__(blocksize, initial_time, reference_time, tropical_year=365.2422)[source]

Initialize the fractional year stepper.

Parameters:
  • blocksize (int) – Number of days to advance in each time step.

  • initial_time (cftime.DatetimeGregorian) – Starting time for the simulation.

  • reference_time (cftime.DatetimeGregorian) – Reference time used for fractional year calculations.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

Return type:

None

__next__()[source]

Advance to the next time step.

Returns:

Current fractional year before advancing.

Return type:

float

class NoLeapYearStepper(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]

Bases: TimeStepper

Time stepper using a no-leap-year calendar system.

This stepper advances time using a calendar without leap years, ensuring consistent 365-day years. The fractional year calculation wraps around at 365 days to maintain annual periodicity.

Parameters:
  • init_year (int) – Initial year for the simulation.

  • init_month (int) – Initial month for the simulation.

  • init_day (int) – Initial day for the simulation.

  • blocksize (int) – Number of days to advance in each time step.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

blocksize

Number of days to advance in each time step.

Type:

int

initial_time

Starting time for the simulation.

Type:

cftime.DatetimeNoLeap

ref_time

Reference time (2000-01-01) for fractional year calculations.

Type:

cftime.DatetimeNoLeap

tropical_year

Length of tropical year in days.

Type:

float

daily_increment

Fractional year increment per day.

Type:

float

initial_year_fraction

Initial position as fractional year.

Type:

float

current_year_fraction

Current position as fractional year.

Type:

float

current_time

Current datetime.

Type:

cftime.DatetimeNoLeap

Notes

The fractional year calculation uses modulo 365 to ensure proper wrapping for the no-leap-year calendar system.

__init__(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]

Initialize the no-leap-year stepper.

Parameters:
  • init_year (int) – Initial year for the simulation.

  • init_month (int) – Initial month for the simulation.

  • init_day (int) – Initial day for the simulation.

  • blocksize (int) – Number of days to advance in each time step.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

Return type:

None

__next__()[source]

Advance to the next time step.

Returns:

Tuple containing current datetime and current fractional year before advancing.

Return type:

tuple[cftime.DatetimeNoLeap, float]

Notes

The fractional year is calculated with modulo 365 to ensure proper wrapping for the no-leap-year calendar.
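
One way to read the modulo-365 note is sketched below. It assumes the fraction is the number of elapsed no-leap days, wrapped modulo 365 and divided by 365; the package may normalize differently (for example with tropical_year), so the helper is purely illustrative and its name is hypothetical.

```python
def noleap_year_fraction(days_since_reference):
    """Hypothetical sketch: position within a 365-day year, in [0, 1)."""
    return (days_since_reference % 365) / 365


start_of_year = noleap_year_fraction(0)
one_year_later = noleap_year_fraction(365)      # wraps back to the start
two_years_73_days = noleap_year_fraction(2 * 365 + 73)
```

With this wrapping, the same calendar day maps to the same fraction in every simulated year, which is what keeps the seasonal cycle periodic.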

class StandardStepper(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]

Bases: TimeStepper

Standard time stepper using Gregorian calendar with leap years.

This stepper advances time using standard datetime objects and provides both datetime and fractional year representations.

Parameters:
  • init_year (int) – Initial year for the simulation.

  • init_month (int) – Initial month for the simulation.

  • init_day (int) – Initial day for the simulation.

  • blocksize (int) – Number of days to advance in each time step.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

blocksize

Number of days to advance in each time step.

Type:

int

initial_time

Starting time for the simulation.

Type:

cftime.DatetimeGregorian

ref_time

Reference time (2000-01-01) for fractional year calculations.

Type:

cftime.DatetimeGregorian

tropical_year

Length of tropical year in days.

Type:

float

daily_increment

Fractional year increment per day.

Type:

float

initial_year_fraction

Initial position as fractional year.

Type:

float

current_year_fraction

Current position as fractional year.

Type:

float

current_time

Current datetime.

Type:

cftime.DatetimeGregorian

Notes

Uses cftime.DatetimeGregorian instead of standard datetime to avoid issues with time delta calculations in time conversion operations.

__init__(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]

Initialize the standard stepper.

Parameters:
  • init_year (int) – Initial year for the simulation.

  • init_month (int) – Initial month for the simulation.

  • init_day (int) – Initial day for the simulation.

  • blocksize (int) – Number of days to advance in each time step.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

Return type:

None

__next__()[source]

Advance to the next time step.

Returns:

Tuple containing current datetime and current fractional year before advancing.

Return type:

tuple[cftime.DatetimeGregorian, float]

class TimeStepper(blocksize)[source]

Bases: ABC

Abstract base class for time stepping strategies in weather generation.

This class defines a template for different time stepping methods, allowing flexible time progression for weather simulation.

Parameters:

blocksize (int) – Number of days to advance in each time step.

blocksize

Number of days to advance in each time step.

Type:

int

__init__(blocksize)[source]

Initialize the time stepper with a specified block size.

Parameters:

blocksize (int) – Number of days to advance in each time step.

Return type:

None

__iter__()[source]

Make the time stepper iterable.

Returns:

The time stepper itself.

Return type:

Iterator[Any]

abstractmethod __next__()[source]

Abstract method to advance to the next time step.

Returns:

The next time step value(s).

Return type:

Any

Raises:

StopIteration – When no more time steps are available.
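
The iterator pattern described above can be sketched with the standard library alone. Both classes below are hypothetical stand-ins, not the package classes. The concrete one assumes a daily increment of 1 / tropical_year and returns the current fractional year before advancing, as documented for FractionalYearStepper.

```python
from abc import ABC, abstractmethod
from itertools import islice


class StepperSketch(ABC):
    """Hypothetical stand-in for the abstract time stepper."""

    def __init__(self, blocksize):
        self.blocksize = blocksize

    def __iter__(self):
        return self

    @abstractmethod
    def __next__(self):
        ...


class FractionalYearSketch(StepperSketch):
    """Hypothetical stepper that advances a fractional year by blocksize days."""

    def __init__(self, blocksize, start_fraction=0.0, tropical_year=365.2422):
        super().__init__(blocksize)
        self.daily_increment = 1.0 / tropical_year  # assumed definition
        self.current_year_fraction = start_fraction

    def __next__(self):
        current = self.current_year_fraction
        self.current_year_fraction += self.blocksize * self.daily_increment
        return current  # value before advancing


# The sketch never raises StopIteration, so islice bounds the iteration.
steps = list(islice(FractionalYearSketch(blocksize=5), 3))
```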

unseen_awg.timestep_utils module

convert_to_cftime_gregorian(data)[source]

Convert data to use cftime with Gregorian calendar.

Parameters:

data (xr.Dataset | xr.DataArray) – Input data to convert.

Returns:

Data with cftime datetimes and Gregorian calendar.

Return type:

xr.Dataset | xr.DataArray

dayofyear_year_to_datetime64(dayofyear, year)[source]

Create numpy datetime64 from year and dayofyear.

Parameters:
  • dayofyear (int) – Day of the year (1-366).

  • year (int) – Year value.

Returns:

Corresponding datetime64 object.

Return type:

np.datetime64
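
A conversion of this kind can be sketched as January 1 of the given year plus dayofyear - 1 days. The helper name is hypothetical, and how the package treats out-of-range values is not shown here.

```python
import numpy as np


def doy_to_datetime64(dayofyear, year):
    """Hypothetical sketch: January 1 of `year` plus (dayofyear - 1) days."""
    return np.datetime64(f"{year:04d}-01-01") + np.timedelta64(dayofyear - 1, "D")


leap = doy_to_datetime64(60, 2000)      # 2000 is a leap year
non_leap = doy_to_datetime64(60, 2001)
rollover = doy_to_datetime64(366, 2001)  # rolls into the next year in this sketch
```

In this sketch, day 366 of a non-leap year silently becomes January 1 of the following year; the documented naive variant returns NaT in that situation instead.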

dayofyear_year_to_datetime64_naive(dayofyear, year)[source]

Create numpy datetime64 from year and dayofyear, setting NaTs when dayofyear is out of range.

Parameters:
  • dayofyear (int) – Day of the year (1-366).

  • year (int) – Year.

Returns:

Corresponding datetime64 object, or NaT if invalid.

Return type:

np.datetime64

is_in_window_from_time(base_dates, other_dates, window_size, ref_time, tropical_year=365.2422)[source]

Check if dates are within a temporal window from base dates.

Parameters:
  • base_dates (xr.DataArray) – Base dates for comparison.

  • other_dates (xr.DataArray) – Dates to check.

  • window_size (int) – Size of the window in days.

  • ref_time (np.datetime64) – Reference time for year fraction calculation.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

Returns:

Boolean array indicating which dates fall within the window.

Return type:

xr.DataArray

is_in_window_from_year_fraction(base_year_fractions, other_dates, window_size, ref_time, tropical_year=365.2422)[source]

Check if dates are within a temporal window from base “fraction of years”.

Parameters:
  • base_year_fractions (xr.DataArray | float) – Base year fractions for comparison.

  • other_dates (xr.DataArray) – Dates to check against the window.

  • window_size (int) – Size of the window in days.

  • ref_time (np.datetime64) – Reference time for year fraction calculation.

  • tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.

Returns:

Boolean DataArray indicating which dates fall within the window.

Return type:

xr.DataArray
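
A seasonal window check of this kind is usually a circular distance on the year fraction. The sketch below makes two assumptions that the text above does not confirm: dates have already been converted to year fractions, and window_size is treated as a half-width in days. The function name is hypothetical.

```python
import numpy as np


def in_seasonal_window(base_fraction, other_fractions, window_size, tropical_year=365.2422):
    """Hypothetical sketch: circular day-of-year distance within window_size days."""
    diff = np.mod(np.asarray(other_fractions, dtype=float) - base_fraction, 1.0)
    circular_days = np.minimum(diff, 1.0 - diff) * tropical_year
    return circular_days <= window_size  # assumed half-width semantics


# Base date in late December; candidates in early January, mid-year,
# and late December almost two years later.
in_window = in_seasonal_window(0.99, [0.01, 0.5, 2.98], window_size=10)
```

The modulo makes the comparison independent of the year, so early January counts as close to late December.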

time_to_year_fraction_cftime(time, ref_time, tropical_year=365.2422)[source]

Convert cftime datetimes to fraction of years relative to a reference time.

Parameters:
  • time (xr.DataArray | xr.Dataset | cftime.datetime) – The time to convert to fractional year. Data should be of cftime.datetime type.

  • ref_time (xr.DataArray | xr.Dataset | cftime.datetime) – The reference time for the conversion. Data should be of cftime.datetime type.

  • tropical_year (float, optional) – The length of a tropical year in days, by default 365.2422.

Returns:

The fractional year representation of the input time.

Return type:

float

time_to_year_fraction_np_datetime64(time, ref_time, tropical_year=365.2422)[source]

Convert datetime(s) to fraction of years relative to a reference time.

Parameters:
  • time (xr.DataArray | xr.Dataset | np.datetime64) – The time to convert to fractional year. Data should be of numpy.datetime64 type.

  • ref_time (xr.DataArray | xr.Dataset | np.datetime64) – The reference time for the conversion. Data should be of numpy.datetime64 type.

  • tropical_year (float, optional) – The length of a tropical year in days, by default 365.2422.

Returns:

The fractional year representation of the input time.

Return type:

float
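
As a rough sketch, the conversion can be written as elapsed days divided by the length of the tropical year. Whether the package additionally wraps the result into [0, 1) is not stated above, so the hypothetical helper below returns the unwrapped value.

```python
import numpy as np


def year_fraction_sketch(time, ref_time, tropical_year=365.2422):
    """Hypothetical sketch: elapsed time since ref_time in tropical years."""
    elapsed_days = (time - ref_time) / np.timedelta64(1, "D")
    return elapsed_days / tropical_year


ref_time = np.datetime64("2000-01-01")
fraction = year_fraction_sketch(np.datetime64("2001-01-01"), ref_time)
# 366 calendar days (2000 is a leap year) is slightly more than one tropical year.

times = np.array(["2000-01-01", "2000-07-01"], dtype="datetime64[D]")
fractions = year_fraction_sketch(times, ref_time)  # also works element-wise
```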

unseen_awg.tune_wg_by_forecasting module

Module for tuning weather generator parameters by optimizing forecast accuracy.

analog_ensemble_forecast(wg, probability_model, initial_datapoint, lead_time, n_members, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]

Generate an ensemble forecast using analog sampling.

Parameters:
  • wg (WeatherGenerator) – Weather generator instance.

  • probability_model (ProbabilityModel) – Probability model for the weather generator’s analog selection.

  • initial_datapoint (InitTimeLeadTimeMemberState) – Initial state for the forecast.

  • lead_time (np.timedelta64) – Forecast lead time.

  • n_members (int) – Number of ensemble members to be created in the forecast.

  • blocksize (int) – Block size for weather generator sampling.

  • rng (np.random.Generator) – Random number generator.

  • stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.

Returns:

Trajectories of ensemble forecast.

Return type:

xr.Dataset

crps_climatology(ground_truth, ds_t0, forecast_lead_time, ds_full, var)[source]

Calculate CRPS (Continuous Ranked Probability Score) for a climatology forecast.

Parameters:
  • ground_truth (xr.Dataset) – Ground truth observations.

  • ds_t0 (xr.Dataset) – Initial dataset at time t0.

  • forecast_lead_time (np.timedelta64) – Lead time for the forecast.

  • ds_full (xr.Dataset) – Full dataset for climatology calculation.

  • var (str) – Variable name to compute CRPS for.

Returns:

CRPS values for the climatology forecast.

Return type:

xr.Dataset

crps_persistence(ground_truth, ds_t0, forecast_lead_time, var)[source]

Calculate CRPS (Continuous Ranked Probability Score) for a persistence forecast.

Parameters:
  • ground_truth (xr.Dataset) – Ground truth observations.

  • ds_t0 (xr.Dataset) – Initial dataset at time t0.

  • forecast_lead_time (np.timedelta64) – Lead time for the forecast.

  • var (str) – Variable name to compute CRPS for.

Returns:

CRPS values for the persistence forecast.

Return type:

xr.Dataset
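
For reference, the CRPS of an ensemble against a single observation is commonly estimated as E|X - y| - 0.5 E|X - X'|. The sketch below implements that estimator for scalars with a hypothetical function name. The package operates on xr.Dataset fields and may rely on a different implementation.

```python
import numpy as np


def crps_ensemble(members, observation):
    """Hypothetical sketch of the standard ensemble CRPS estimator."""
    members = np.asarray(members, dtype=float)
    term_obs = np.mean(np.abs(members - observation))
    term_spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return term_obs - 0.5 * term_spread


# A one-member (deterministic) forecast reduces to the absolute error.
crps_deterministic = crps_ensemble([2.0], observation=3.0)

# A two-member ensemble bracketing the observation scores better than either member alone.
crps_two_members = crps_ensemble([1.0, 3.0], observation=2.0)
```

Lower values are better, which is why the baselines and the analog forecasts can be compared directly on this score.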

crps_wg_forecasts(ds, wg, n_members, probability_model, forecast_init_time, forecast_lead_time, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]

Compute CRPS (Continuous Ranked Probability Score) for analog forecasts.

Parameters:
  • ds (xr.Dataset) – Dataset the weather generator samples from.

  • wg (WeatherGenerator) – Weather generator instance.

  • n_members (int) – Number of members in the analog forecast ensemble.

  • probability_model (ProbabilityModel) – Probability model for analog selection.

  • forecast_init_time (np.datetime64) – Initial time for the analog forecast.

  • forecast_lead_time (np.timedelta64) – Lead time for the analog forecast.

  • blocksize (int) – Block size (days) for weather generator sampling.

  • rng (np.random.Generator) – Random number generator.

  • stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.

Returns:

CRPS for the weather generator’s forecasts.

Return type:

xr.Dataset

eval_analogue_forecast_skill(sigma, type_probability_model, wg, sampled_timesteps, ds, var, n_analogs, forecast_lead_time, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]

Evaluate the skill of analog forecasts.

Parameters:
  • sigma (float) – Standard deviation parameter for the probability model.

  • type_probability_model (str) – Type of probability model to use.

  • wg (WeatherGenerator) – Weather generator instance.

  • sampled_timesteps (xr.DataArray) – Timesteps to initialize forecasts from.

  • ds (xr.Dataset) – Dataset the weather generator samples from.

  • var (str) – Name of variable to evaluate.

  • n_analogs (int) – Number of analog forecasts to create.

  • forecast_lead_time (np.timedelta64) – Forecast lead time.

  • blocksize (int) – Block size for sampling.

  • rng (np.random.Generator) – Random number generator.

  • stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.

Returns:

Mean CRPS value for the analog forecasts.

Return type:

NDArray

eval_climatology_persistence_forecast(sampled_timesteps, ds, ds_rechunk, var, forecast_lead_time)[source]

Evaluate climatology and persistence forecasts.

Parameters:
  • sampled_timesteps (xr.DataArray) – Timesteps to initialize forecasts from.

  • ds (xr.Dataset) – Dataset the weather generator samples from.

  • ds_rechunk (xr.Dataset) – Rechunked dataset to allow faster computation of climatology forecasts.

  • var (str) – Name of variable to evaluate.

  • forecast_lead_time (np.timedelta64) – Forecast lead time.

Returns:

Tuple of (climatology_CRPS, persistence_CRPS) arrays.

Return type:

Tuple[NDArray, NDArray]

eval_wg(seed, forecast_lead_time_days, n_analogs, sigma_min, sigma_max, use_log_scale, n_sampled_inits, min_timedelta_from_dataset_start, probability_model_type, var, blocksize, n_optuna_trials, path_ds, path_ds_rechunk, path_wg, path_study, ds_type)[source]

Evaluate the weather generator’s accuracy as an ensemble forecast model.

This function also runs climatology and persistence baseline forecasts and conducts an Optuna study to optimize parameters of the weather generator and the trajectory sampling.

Parameters:
  • seed (int) – Seed used when sampling forecast initializations.

  • forecast_lead_time_days (int) – Lead time of the analog forecasts in days.

  • n_analogs (int) – Number of ensemble members used in the analog ensemble forecast.

  • sigma_min (float) – Lower bound of range of possible values of weather generator’s sigma parameter.

  • sigma_max (float) – Upper bound of range of possible values of weather generator’s sigma parameter.

  • use_log_scale (bool) – Use a logarithmic scaling during the optimization of the parameter sigma.

  • n_sampled_inits (int) – Number of initialization times from which analog forecasts are started. Performance is averaged across initialization times.

  • min_timedelta_from_dataset_start (int) – In some settings, the weather generator only uses analogs from timesteps before the forecast initialization. To avoid having few or no analogs available, initialization times are chosen at least min_timedelta_from_dataset_start days after the start of the dataset.

  • probability_model_type (str) – A probability model to be used when sampling analogs.

  • var (str) – Name of the variable to be evaluated.

  • blocksize (int) – Blocksize of the simulated time series.

  • n_optuna_trials (int) – Number of optimization steps in optuna.

  • path_ds (str) – Path under which the dataset used during the evaluation is stored.

  • path_ds_rechunk (str) – Path to a rechunked version of the dataset at path_ds that uses chunking along longitudes.

  • path_wg (str) – Path of the directory of the weather generator.

  • path_study (str) – Path to write the optuna study to.

  • ds_type (Literal["era5", "reforecasts"]) – Dataset used (ERA5 reanalysis or reforecasts).

Raises:

ValueError – If the weather generator contains negative values in its lag dimension.

Return type:

None
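
The effect of use_log_scale can be illustrated without optuna. In the sketch below, sigma is drawn uniformly in log space between sigma_min and sigma_max, so that each order of magnitude is explored equally often. The function name and the sampling scheme are illustrative assumptions, not the package's code.

```python
import math
import random


def sample_sigma(sigma_min: float, sigma_max: float, use_log_scale: bool,
                 rng: random.Random) -> float:
    """Draw one candidate sigma from [sigma_min, sigma_max]."""
    if use_log_scale:
        # Uniform in log(sigma); this requires sigma_min > 0.
        log_sigma = rng.uniform(math.log(sigma_min), math.log(sigma_max))
        return math.exp(log_sigma)
    # Uniform in sigma itself.
    return rng.uniform(sigma_min, sigma_max)
```

optuna exposes the same idea through the log argument of Trial.suggest_float.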

get_gt(sel_valid_time, ds)[source]

Get ground truth data for a given valid time.

Parameters:
  • sel_valid_time (np.datetime64) – The selected valid time.

  • ds (xr.Dataset) – The dataset containing the data.

Returns:

Dataset with the ground truth data for the selected time.

Return type:

xr.Dataset

get_gt_coords(sel_valid_time, ds)[source]

For a given valid_time, get (ensemble_member, lead_time, init_time) that allow extracting the ground truth sample from the dataset.

Parameters:
  • sel_valid_time (np.datetime64) – The selected valid time.

  • ds (xr.Dataset) – The dataset the weather generator is trained on.

Returns:

Dictionary with ensemble_member, init_time, and lead_time keys.

Return type:

dict[str, Any]

Raises:

NotImplementedError – If the dataset type is not supported.
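
In a reforecast dataset, a valid time is reached by every (init_time, lead_time) pair with init_time + lead_time == valid_time. The sketch below enumerates those pairs with the standard library. The function name is hypothetical, and how the package chooses a single ground-truth sample among such pairs is not documented here.

```python
from datetime import datetime, timedelta


def matching_init_lead_pairs(valid_time, init_times, lead_times):
    """Return all (init_time, lead_time) pairs with init_time + lead_time == valid_time."""
    return [
        (init, lead)
        for init in init_times
        for lead in lead_times
        if init + lead == valid_time
    ]


# Toy coordinates (made up for illustration).
init_times = [datetime(2000, 1, 1), datetime(2000, 1, 4)]
lead_times = [timedelta(days=d) for d in range(5)]
pairs = matching_init_lead_pairs(datetime(2000, 1, 4), init_times, lead_times)
```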

get_n_valid_forecast_start_points(n, wg, rng, forecast_lead_time=numpy.timedelta64(0, 'D'), min_timedelta_from_dataset_start=numpy.timedelta64(366, 'D'), with_replacement=True, balance_months=True)[source]

Select n valid ground truth data points to start forecasts from.

Parameters:
  • n (int) – Number of start points to select.

  • wg (WeatherGenerator) – Weather generator instance.

  • rng (np.random.Generator) – Random number generator used in the selection.

  • forecast_lead_time (np.timedelta64, optional) – Forecast lead time, by default np.timedelta64(0, “D”)

  • min_timedelta_from_dataset_start (np.timedelta64, optional) – Samples are only considered valid if they lie at least min_timedelta_from_dataset_start after the start of the dataset, by default np.timedelta64(366, “D”). This is because the analog weather generator relies only on analogs with valid_time smaller than the forecast start time. Near the start of the dataset, the set of analogs would therefore be very small.

  • with_replacement (bool, optional) – Whether to sample with replacement, by default True.

  • balance_months (bool, optional) – Whether to balance months in sampling, by default True.

Returns:

DataArray of valid forecast start points.

Return type:

xr.DataArray
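
One way to read balance_months is: first pick a calendar month uniformly, then pick a start date within that month. The sketch below follows that reading on plain dates. It is an assumption about the balancing scheme, it ignores forecast_lead_time, and the function name is hypothetical.

```python
import random
from collections import defaultdict
from datetime import date, timedelta


def sample_start_dates(candidates, n, rng, min_offset_days=366, balance_months=True):
    """Sample n start dates (with replacement) from chronologically sorted dates."""
    # Discard dates too close to the start of the dataset.
    earliest = candidates[0] + timedelta(days=min_offset_days)
    valid = [d for d in candidates if d >= earliest]
    if not balance_months:
        return [rng.choice(valid) for _ in range(n)]
    by_month = defaultdict(list)
    for d in valid:
        by_month[d.month].append(d)
    months = sorted(by_month)
    # Pick a month uniformly first, then a date within that month.
    return [rng.choice(by_month[rng.choice(months)]) for _ in range(n)]
```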

main(snakemake)[source]
Parameters:

snakemake (Any)

Return type:

None

persistence_forecast(ds_t0, lead_time)[source]

Create a persistence forecast.

Parameters:
  • ds_t0 (xr.Dataset) – Initial dataset at time t0.

  • lead_time (np.timedelta64) – Lead time to add to the valid time.

Returns:

Persistence forecast, i.e. initial data with modified valid_time coordinate.

Return type:

xr.Dataset
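
The idea of a persistence forecast fits in a few lines: the data values stay untouched and only valid_time moves forward by the lead time. The sketch below uses a plain dict in place of an xr.Dataset. The function and the key t2m are hypothetical.

```python
from datetime import datetime, timedelta


def toy_persistence_forecast(state_t0: dict, lead_time: timedelta) -> dict:
    """Copy the initial state and shift only its valid_time by lead_time."""
    forecast = dict(state_t0)  # shallow copy; data values are reused as-is
    forecast["valid_time"] = state_t0["valid_time"] + lead_time
    return forecast


state = {"valid_time": datetime(2001, 3, 1), "t2m": [280.0, 281.5]}
forecast = toy_persistence_forecast(state, timedelta(days=3))
```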

unseen_awg.utils module

Utility functions for unseen-awg.

apply_similarity_metric(ds_reference, ds_candidate, similarity_func, variable_name='geopotential_height', ref_core_dims=None, cand_core_dims=None, output_core_dims=None, reduction_axes_for_numpy=(-3, -2, -1), dask_options=None, **similarity_func_kwargs)[source]

Apply a similarity metric between reference point and candidates.

Parameters:
  • ds_reference (xr.Dataset) – Reference dataset, expanded to match dimensions of candidates.

  • ds_candidate (xr.Dataset) – Dataset containing the candidate states.

  • similarity_func – The similarity function to apply.

  • variable_name (str, optional) – Name of the variable to compare, by default “geopotential_height”.

  • ref_core_dims (list of str, optional) – Core dimensions (passed to xr.apply_ufunc) of the reference dataset, by default [“latitude”, “longitude”, “lag”].

  • cand_core_dims (list of str, optional) – Core dimensions (passed to xr.apply_ufunc) of the candidate dataset, by default [“c_year”, “c_sample”, “c_ensemble_member”, “latitude”, “longitude”, “lag”].

  • output_core_dims (list of str, optional) – Output core dimensions, by default [“c_year”, “c_sample”, “c_ensemble_member”].

  • reduction_axes_for_numpy (tuple of int, optional) – Axes to reduce when applying numpy function, by default (-3, -2, -1).

  • dask_options (dict, optional) – Dask options for apply_ufunc, by default None.

  • **similarity_func_kwargs – Additional keyword arguments for the similarity function.

Returns:

Array of computed similarities.

Return type:

xr.DataArray

get_k_random_indices(arr, mask, k, rng)[source]

Get k random indices in an array after applying masking.

Parameters:
  • arr (NDArray) – Array of values.

  • mask (NDArray) – Boolean mask indicating valid elements.

  • k (int) – Number of random indices to return.

  • rng (np.random.Generator) – Random number generator.

Returns:

Array of k random indices.

Return type:

NDArray
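
A minimal sketch of masked random selection on a flat list is shown below. It assumes that indices are drawn without replacement, which this reference does not state, and the function name is hypothetical.

```python
import random


def k_random_valid_indices(mask: list[bool], k: int, rng: random.Random) -> list[int]:
    """Draw k distinct indices from the positions where mask is True."""
    valid = [i for i, ok in enumerate(mask) if ok]
    # random.Random.sample raises ValueError if fewer than k valid entries exist.
    return rng.sample(valid, k)
```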

get_k_smallest_indices(arr, mask, k)[source]

Get indices of k smallest values in an array after applying masking.

Parameters:
  • arr (NDArray) – Array of values.

  • mask (NDArray) – Boolean mask indicating valid elements.

  • k (int) – Number of smallest indices to return.

Returns:

Indices of k smallest values.

Return type:

NDArray
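
The same masking idea applies to picking the k smallest values. The sketch below works on flat Python lists and uses heapq from the standard library. It illustrates the technique under these assumptions rather than reproducing the package's array-based implementation, and the function name is hypothetical.

```python
import heapq


def k_smallest_valid_indices(values: list[float], mask: list[bool], k: int) -> list[int]:
    """Return indices of the k smallest values among entries where mask is True."""
    valid = [i for i, ok in enumerate(mask) if ok]
    # nsmallest returns the indices ordered by increasing value.
    return heapq.nsmallest(k, valid, key=lambda i: values[i])


values = [0.3, 0.1, 0.9, 0.2, 0.05]
mask = [True, False, True, True, False]
closest = k_smallest_valid_indices(values, mask, 2)
```

Here the two globally smallest values (indices 4 and 1) are masked out, so the result is drawn from indices 0, 2, and 3 only.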

get_map_valid_n_day_transitions(da, n)[source]

Map valid n-day transitions in a dataset.

This function is used when sampling trajectories. In particular, it allows identifying the “next state” for each base state, and it allows identifying which states are valid samples to be included in the sampled time series (both the state and its next state are in the dataset).

Parameters:
  • da (xr.Dataset) – Input dataset containing valid_time and sample dimensions.

  • n (int) – Number of days to look ahead.

Returns:

Dataset with next sample, year, and dayofyear information.

Return type:

xr.Dataset
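
The notion of a valid n-day transition can be sketched on plain dates: a state is valid only if the state n days later is also available. The real function works on xarray objects with a sample dimension. The simplified version below, with a hypothetical name, only shows the lookup logic.

```python
from datetime import date, timedelta


def map_valid_n_day_transitions(available: list[date], n: int) -> dict[date, date]:
    """Map each available date to the date n days later, if that date is available too."""
    present = set(available)
    step = timedelta(days=n)
    return {d: d + step for d in available if d + step in present}


# 4 January is missing, so 3 January has no valid 1-day successor.
available = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 1, 3), date(2000, 1, 5)]
one_day = map_valid_n_day_transitions(available, 1)
two_day = map_valid_n_day_transitions(available, 2)
```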

grids_are_identical_subset(source_ds, target_ds, coord_names=['latitude', 'longitude'])[source]

Check whether the target grid is a simple subset of the source grid with identical spacing.

Parameters:
  • source_ds (Dataset | DataArray)

  • target_ds (Dataset | DataArray)

is_no_jump(traj)[source]

Check which steps in a trajectory are “jumps”.

For each pair of consecutive states in the trajectory, test whether the trajectory has a “jump”, i.e. the samples aren’t actually consecutive in the original dataset. For a reforecast dataset, not having a jump means having the same init_time and ensemble_member, while the second state in each pair has a lead_time 1 day larger than the first state.

Parameters:

traj (xr.Dataset) – Sampled trajectory data.

Returns:

Boolean array indicating where there are no jumps.

Return type:

NDArray
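
For the reforecast case, the rule described above can be written down directly. The sketch below uses a small named tuple that mirrors the fields of InitTimeLeadTimeMemberState. The class State and the function no_jump_flags are hypothetical stand-ins, not the package's types.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class State(NamedTuple):
    init_time: datetime
    lead_time: timedelta
    ensemble_member: int


def no_jump_flags(trajectory: list[State]) -> list[bool]:
    """For each consecutive pair, True if the second state directly follows the first."""
    one_day = timedelta(days=1)
    return [
        a.init_time == b.init_time
        and a.ensemble_member == b.ensemble_member
        and b.lead_time - a.lead_time == one_day
        for a, b in zip(trajectory, trajectory[1:])
    ]


t0, t1 = datetime(2000, 1, 1), datetime(2000, 1, 8)
traj = [
    State(t0, timedelta(days=0), 0),
    State(t0, timedelta(days=1), 0),  # same forecast, one day later: no jump
    State(t0, timedelta(days=2), 1),  # ensemble member changes: jump
    State(t1, timedelta(days=3), 1),  # init_time changes: jump
]
flags = no_jump_flags(traj)
```

The result has one entry fewer than the trajectory, since it describes pairs of consecutive states.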

unseen_awg.weather_generator module

Weather generator module for analog-based weather simulation.

This module provides the core WeatherGenerator class that implements analog-based weather generation using similarity measures and probability models to sample realistic weather trajectories from historical data.

class WeatherGenerator(params)[source]

Bases: object

Analog-based weather generator for creating synthetic weather trajectories.

This class implements an analog-based approach to weather generation, where weather states are sampled from historical data while ensuring that successive states either follow each other in the historical dataset or are analogs of the true successor. The generator uses configurable similarity measures and probability models to create realistic weather sequences. Additional parameters can be specified when sampling time series.

Parameters:

params (dict[str, Any]) –

Configuration parameters containing:

  • weather_generator.window_size (int)

    Half-window size within which states are considered as potential analogs.

  • weather_generator.var (str)

    Name of variable to use for similarity calculations.

  • weather_generator.similarity (str)

    Name of similarity function to use.

  • weather_generator.use_precomputed_similarities (bool)

    Whether to use precomputed similarities or not. If True, similarities are precomputed when a WeatherGenerator instance is initialized. Otherwise, a lazy array is set up and similarities are computed on-the-fly during the sampling process. This slows down the sampling process.

  • weather_generator.n_samples (int)

    Number of samples to use from dataset. By “sample” we denote datapoints that possess the same valid_time (but a different init_time). Providing a low n_samples allows restricting the number of included states.

  • dir_wg (str)

    Directory path for weather generator outputs.

  • zarr_year_dayofyear (str)

    Path to zarr store containing the preprocessed input dataset.

window_size

Half-window size within which states are considered as potential analogs.

Type:

int

var

Name of variable to use for similarity calculations.

Type:

str

similarity_function

Function used to compute similarities between states.

Type:

callable

path_wg

Path to weather generator working directory.

Type:

str

path_dataset

Path to input dataset.

Type:

str

use_precomputed_similarities

Flag indicating whether to use precomputed similarities.

Type:

bool

ds_similarities

Dataset containing the results of similarity computations. It is initialized as a lazy dataset and computed during initialization if use_precomputed_similarities is True.

Type:

xr.Dataset

get_analog_data(queries, use_candidate_coords=False)[source]

Retrieve analog weather data for specified query coordinates.

Extracts weather data from the dataset at the coordinates specified in the query array, either using the query coordinates directly or the candidate coordinates.

Parameters:
  • queries (xr.DataArray) – Query array containing coordinate information.

  • use_candidate_coords (bool, optional) – Whether to pick the sample according to the provided coordinates of a candidate state (True) or of a base state (False), by default False.

Returns:

Weather data at the specified coordinates.

Return type:

xr.Dataset

get_initial_state(initialization, map_n_step_transition, blocksize, rng)[source]

Get initial state to start sampling a trajectory from.

Determines the starting state for weather generation, either from a specified initialization or by random selection from the set of valid states.

Parameters:
  • initialization (InitTimeLeadTimeMemberState | None) – Specific initialization state, or None for random selection.

  • map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.

  • blocksize (int) – Number of days in each time block.

  • rng (np.random.Generator) – Random number generator for random initialization.

Returns:

Initial state for trajectory generation.

Return type:

xr.Dataset

Raises:
  • ValueError – If the specified initialization is invalid.

  • AssertionError – If the initialization state is not found in valid transitions.

get_similarities_k_closest_neighbors(states, k, minimum_timedelta_days=None, dim_states=None)[source]

Get the k closest neighbors based on similarity measures.

Finds the k most similar historical states to the given query states based on (precomputed) similarity measures.

Parameters:
  • states (xr.DataArray) – Query states to find neighbors for.

  • k (int) – Number of closest neighbors to return.

  • minimum_timedelta_days (int | None, optional) – Minimum time separation in days between query and candidate states. This allows excluding analogs that are temporally close to the base state if this is undesired. By default None, i.e. no restriction.

  • dim_states (str | None, optional) – Dimension name for states, by default None.

Returns:

Dataset containing the k closest neighbor states and their similarities.

Return type:

xr.Dataset

get_similarities_k_random_neighbors(states, k, rng, minimum_timedelta_days=None, dim_states=None)[source]

Get k randomly selected neighbors from valid candidates.

Randomly selects k historical states from valid candidates that meet the specified temporal constraints.

Parameters:
  • states (xr.DataArray) – Query states to find neighbors for.

  • k (int) – Number of random neighbors to return.

  • rng (np.random.Generator) – Random number generator for sampling.

  • minimum_timedelta_days (int | None, optional) – Minimum time separation in days between query and candidate states, by default None.

  • dim_states (str | None, optional) – Dimension name for states, by default None.

Returns:

Dataset containing k randomly selected neighbor states.

Return type:

xr.Dataset

classmethod load(wg_path)[source]

Load a WeatherGenerator instance from saved configuration.

Parameters:

wg_path (str) – Path to directory containing saved weather generator configuration.

Returns:

Loaded weather generator instance.

Return type:

WeatherGenerator

plot_k_nearest_and_random_neighbors(state, k, rng, minimum_timedelta_days=None, vmin=450, vmax=600, minor_spacing_contours=10, major_spacing_contours=30)[source]

Create comparison plot of nearest neighbors vs random neighbors.

Generates a visualization comparing the k nearest neighbors and k random neighbors for a given weather state. The base state, the nearest neighbors (analogs), and the random neighbors among the candidates are shown side by side.

Parameters:
  • state (xr.Dataset) – Reference weather state to find neighbors for.

  • k (int) – Number of neighbors to display.

  • rng (np.random.Generator) – Random number generator for random neighbor selection.

  • minimum_timedelta_days (int | None, optional) – Minimum time separation constraint, by default None.

  • vmin (float, optional) – Minimum value for color scale, by default 450.

  • vmax (float, optional) – Maximum value for color scale, by default 600.

  • minor_spacing_contours (float, optional) – Spacing for minor contour lines, by default 10.

  • major_spacing_contours (float, optional) – Spacing for major contour lines, by default 30.

Returns:

Figure containing the comparison plots.

Return type:

matplotlib.figure.Figure

sample_trajectory(blocksize, probability_model, stepper_class, n_steps, rng, initialization=None, start_by_taking_analog=False, show_progressbar=False)[source]

Sample a synthetic weather trajectory using the analog method.

Generates a weather trajectory by iteratively sampling analog states from historical data. The sampling alternates between following a historical trajectory and sampling analogs of the true successor states - so that in effect blocks of size blocksize are sampled while for the transition between blocks close analogs of the “true” state that would follow each block are chosen.

Parameters:
  • blocksize (int) – Number of days to sample contiguously from the same historical trajectory.

  • probability_model (ProbabilityModel) – Model defining sampling probabilities given similarities between base states and candidate states.

  • stepper_class (type[TimeStepper]) – Class for managing the output time assigned to each sample in the resulting trajectory. This is used as a means for supporting different calendars in sampled datasets.

  • n_steps (int) – Number of sampling steps to perform, not necessarily equal to the length of the sampled series in days.

  • rng (np.random.Generator) – Random number generator for sampling.

  • initialization (InitTimeLeadTimeMemberState | None, optional) – Initial state specification, by default None (random initialization).

  • start_by_taking_analog (bool, optional) – Whether to start by taking an analog of the initial state, by default False.

  • show_progressbar (bool, optional) – Whether to display progress bar, by default False.

Returns:

Generated weather trajectory with time series of sampled states.

Return type:

xr.Dataset
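
The alternation between following history and jumping to an analog can be illustrated on a one-dimensional toy series. The sketch below is not the package's algorithm: it measures similarity by absolute difference, picks uniformly among the k closest candidates instead of using a ProbabilityModel, and returns indices into the series. All names are hypothetical.

```python
import math
import random


def sample_toy_trajectory(series, blocksize, n_steps, k, rng):
    """Sample indices into `series` block by block with analog transitions."""
    # A block start is valid only if the block and its true successor exist.
    last_start = len(series) - blocksize - 1
    start = rng.randint(0, last_start)
    trajectory = []
    for _ in range(n_steps):
        # Time evolution: copy `blocksize` consecutive historical states.
        trajectory.extend(range(start, start + blocksize))
        # Sampling step: rank valid block starts by similarity to the
        # true successor of the block, then jump to one of the k closest.
        successor = start + blocksize
        candidates = sorted(
            range(last_start + 1),
            key=lambda i: abs(series[i] - series[successor]),
        )
        start = rng.choice(candidates[:k])
    return trajectory


series = [math.sin(0.3 * i) for i in range(200)]
toy_traj = sample_toy_trajectory(series, blocksize=5, n_steps=4, k=3, rng=random.Random(0))
```

In this toy version each sampling step contributes one block, so the result has n_steps * blocksize entries.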

sampling_step(next_state, next_year_fraction, map_n_step_transition, probability_model, rng)[source]

Perform analog sampling step to select next weather state.

Samples an analog state from historical data based on similarity to the true next state and according to distribution and constraints defined by the probability_model.

Parameters:
  • next_state (xr.Dataset) – True next state in underlying historic dataset.

  • next_year_fraction (float) – Year fraction of next sample. Used to define temporal similarity rather than an actual calendar date to simplify calendar handling.

  • map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.

  • probability_model (ProbabilityModel) – Model defining sampling probabilities given similarities.

  • rng (np.random.Generator) – Random number generator for sampling.

Returns:

Sampled analog state for the next time step.

Return type:

xr.Dataset
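
How a probability model turns similarities into sampling probabilities is not specified in this reference. As one plausible illustration, the sketch below assumes a Gaussian kernel with width sigma applied to distances between the true next state and each candidate. The kernel choice and the function name are assumptions.

```python
import math
import random


def sample_analog(distances: list[float], sigma: float, rng: random.Random) -> int:
    """Pick a candidate index with probability proportional to exp(-d^2 / (2 sigma^2))."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(range(len(distances)), weights=weights, k=1)[0]
```

Under this assumption, a small sigma concentrates the draws on the closest analogs, while a large sigma approaches uniform sampling over the candidates.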

time_evolution_step(trajectory, current_block_start_state, map_n_step_transition, stepper, blocksize)[source]

Perform one time evolution step in trajectory generation.

Advances the trajectory by one block of time steps, following the evolution in the underlying historical dataset starting from current_block_start_state.

Parameters:
  • trajectory (list[xr.Dataset]) – List of trajectory states to append new states to.

  • current_block_start_state (xr.Dataset) – State to start the current block from.

  • map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.

  • stepper (TimeStepper) – Time stepper instance for managing temporal progression.

  • blocksize (int) – Number of days in each time block.

Returns:

Next state and corresponding year fraction.

Return type:

tuple[xr.Dataset, float]

main(snakemake)[source]

Main function for weather generator execution in Snakemake workflow.

Initializes and runs the weather generator with parameters from Snakemake, handling logging and parameter management for the workflow execution.

Parameters:

snakemake (Any) – Snakemake object containing input/output paths, parameters, and logging configuration.

Return type:

None

setup_lazy_similarity_dataset(ds_year_dayofyear_format, window_size, ref_time=numpy.datetime64('2000-01-01T00:00:00.000000000'))[source]

Set up a lazy dataset in which the similarities computed by the weather generator are stored.

Creates a dataset structure for computing similarities between weather states within a specified time window. The dataset includes coordinates for reference states and candidate states with time shifts.

The dataset has dimensions that identify the base sample (dayofyear, year, sample, ensemble member) and additional dimensions that identify the candidate (d_shift, c_year, c_sample, c_ensemble_member). The valid_time of the candidate state can be computed from c_year, dayofyear, and d_shift.

Parameters:
  • ds_year_dayofyear_format (xr.Dataset) – Input dataset in year-dayofyear format containing weather data.

  • window_size (int) – Size of the time window (in days) for similarity computations.

  • ref_time (np.datetime64, optional) – Reference time for temporal calculations, by default np.datetime64(“2000-01-01”, “ns”).

Returns:

Lazy dataset with similarity computation structure including coordinates for reference and candidate states.

Return type:

xr.Dataset
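
The relation between c_year, dayofyear, and d_shift can be sketched with the standard library. The version below assumes that dayofyear is the 1-based calendar day within c_year and that d_shift ranges over [-window_size, window_size]. The package's handling of leap years and of ref_time is not described here and is ignored. The function name is hypothetical.

```python
from datetime import date, timedelta


def candidate_valid_dates(c_year: int, dayofyear: int, window_size: int) -> dict[int, date]:
    """Map each d_shift in [-window_size, window_size] to a candidate date."""
    base = date(c_year, 1, 1) + timedelta(days=dayofyear - 1)
    return {
        d_shift: base + timedelta(days=d_shift)
        for d_shift in range(-window_size, window_size + 1)
    }


window = candidate_valid_dates(c_year=2001, dayofyear=1, window_size=2)
```

Negative shifts near the start of a year reach back into the previous year, as the example for 1 January shows.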