unseen_awg package
Submodules
unseen_awg.data_classes module
unseen_awg.data_utils module
Utilities for data manipulation and Zarr store creation.
This module provides functions for stacking xarray data structures and creating Zarr stores for efficient data storage and retrieval.
- create_zarr_store_for_dataarray(zarr_path, shape, chunks, dims, coords, dtype, variable_name)[source]
Create a Zarr store for a single DataArray with metadata only.
- Parameters:
zarr_path (str) – Path where the Zarr store will be created.
shape (tuple[int, ...]) – Shape of the array.
chunks (tuple[int, ...]) – Chunk sizes for each dimension.
dims (tuple[str, ...]) – Dimension names.
coords (dict[str, Any]) – Coordinate arrays and metadata.
dtype (np.dtype) – Data type of the array.
variable_name (str) – Name of the variable.
- Raises:
ValueError – If shape, chunks, and dims have different lengths.
- Return type:
None
- create_zarr_store_for_dataset(zarr_path, coords, data_vars)[source]
Create a Zarr store for a Dataset with multiple variables.
- Parameters:
zarr_path (str) – Path where the Zarr store will be created.
coords (dict[str, Any]) – Coordinate arrays and metadata.
data_vars (dict[str, dict[str, Any]]) – Dictionary mapping variable names to their specifications. Each specification should contain ‘shape’, ‘chunks’, ‘dims’, and ‘dtype’.
- Raises:
ValueError – If any variable has mismatched shape, chunks, and dims lengths.
- Return type:
None
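The length check behind the ValueError entries above can be illustrated in isolation. The following is a minimal sketch, assuming the `data_vars` layout described for `create_zarr_store_for_dataset`; `check_variable_spec`, the error message, and the example variable are hypothetical and not part of unseen_awg.

```python
from typing import Any


def check_variable_spec(name: str, spec: dict[str, Any]) -> None:
    """Raise ValueError if shape, chunks and dims differ in length (hypothetical helper)."""
    lengths = {key: len(spec[key]) for key in ("shape", "chunks", "dims")}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Variable {name!r} has mismatched lengths: {lengths}")


# Example specification in the layout expected by `data_vars` (values are illustrative).
data_vars = {
    "geopotential_height": {
        "shape": (365, 32, 64),
        "chunks": (30, 32, 64),
        "dims": ("time", "latitude", "longitude"),
        "dtype": "float32",
    }
}

for var_name, var_spec in data_vars.items():
    check_variable_spec(var_name, var_spec)  # passes silently for a consistent spec
```

Each variable needs one chunk size and one dimension name per array axis, which is why the three tuples must have equal length.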
unseen_awg.grids module
unseen_awg.plotting_utils module
- add_contours(ax, da, major_levels, minor_levels, use_contour_labels=True, linewidth_major=1, linewidth_minor=0.5, **plot_kwargs)[source]
- Parameters:
da (DataArray)
- add_headers(fig, row_headers=None, col_headers=None, cbar_headers=None, row_pad=1, col_pad=5, rotate_row_headers=True, **text_kwargs)[source]
- add_label_to_axes(ax, label, ax_xpos=0.05, ax_ypos=0.95, ha='left', va='top', edgecolor='white', **font_kwargs)[source]
unseen_awg.probability_models module
- class NormalProbabilityAvoidDirectRepeats(sigma)[source]
Bases: ProbabilityModel
Probability model that avoids sampling the base state.
This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates that are exact repeats of the base state.
- Parameters:
sigma (float)
- __init__(sigma)[source]
Initialize the Probability Model that Avoids Direct Repeats.
- Parameters:
sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.
- Raises:
AssertionError – If sigma is not a positive value.
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities, excluding direct repeats.
- Parameters:
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).
coords_candidates (xr.Dataset) – Coordinates of candidate states.
- Returns:
Unnormalized log probabilities for each candidate, with direct repeats set to negative infinity.
- Return type:
NDArray
- class NormalProbabilityKeepMinimalNDays(sigma, n_days_min)[source]
Bases: ProbabilityModel
Probability model that enforces a minimum time separation to the true next state.
This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates that are within a specified number of days from the true next state.
- Parameters:
sigma (float)
n_days_min (int)
- __init__(sigma, n_days_min)[source]
Initialize the Probability Model with Minimal Time Separation.
- Parameters:
sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.
n_days_min (int) – Minimum number of days required between candidate and next state.
- Raises:
AssertionError – If sigma is not a positive value.
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities, excluding candidates too close in time.
- Parameters:
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state.
coords_candidates (xr.Dataset) – Coordinates of candidate states.
- Returns:
Unnormalized log probabilities for each candidate, with candidates too close in time set to negative infinity.
- Return type:
NDArray
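The masking step described for this class can be sketched with plain NumPy. This is an illustration under assumptions, not the package code: `mask_close_in_time` is a hypothetical helper, times are passed as `datetime64` arrays rather than `xr.Dataset` coordinates, and the strict `<` comparison is a guess at what "within" means.

```python
import numpy as np


def mask_close_in_time(logp, candidate_times, next_time, n_days_min):
    """Set log probabilities to -inf for candidates closer than n_days_min to next_time."""
    separation = np.abs(candidate_times - next_time)
    too_close = separation < np.timedelta64(n_days_min, "D")  # assumed: strict comparison
    return np.where(too_close, -np.inf, logp)


candidate_times = np.array(
    ["2000-01-01", "2000-01-05", "2000-02-01"], dtype="datetime64[D]"
)
next_time = np.datetime64("2000-01-03")
logp = np.zeros(3)

# The first two candidates are 2 days away and get masked; the third is kept.
masked = mask_close_in_time(logp, candidate_times, next_time, n_days_min=5)
```

A log probability of negative infinity corresponds to a probability of exactly zero after exponentiation, so masked candidates can never be drawn.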
- class NormalProbabilityModel(sigma)[source]
Bases: ProbabilityModel
Probability model assuming probabilities follow a normal distribution.
If combined with MSE similarities, this amounts to assuming a normal distribution centered at s_next with a given standard deviation sigma.
- Parameters:
sigma (float)
- __init__(sigma)[source]
Initialize the Normal Probability Model.
- Parameters:
sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.
- Raises:
AssertionError – If sigma is not a positive value.
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities based on similarities.
- Parameters:
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).
coords_candidates (xr.Dataset) – Coordinates of candidate states (unused in this model).
- Returns:
Unnormalized log probabilities for each candidate.
- Return type:
NDArray
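One plausible reading of the Gaussian weighting is sketched below. The scaling `similarity / (2 * sigma**2)` is an assumption chosen so that, with negative-MSE similarities, the result is the log of a Gaussian kernel; the actual formula used by `NormalProbabilityModel` is not stated here, and `normal_unnormalized_logp` is a hypothetical name.

```python
import numpy as np


def normal_unnormalized_logp(similarities: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian-like log weights from similarities (assumed form, not the package code)."""
    assert sigma > 0, "sigma must be positive"
    return similarities / (2.0 * sigma**2)


similarities = np.array([-0.1, -0.5, -2.0])  # e.g. negative MSE values
logp = normal_unnormalized_logp(similarities, sigma=0.5)

# Normalize only for inspection; sampling can work on the unnormalized values.
weights = np.exp(logp - logp.max())
probabilities = weights / weights.sum()
```

Under this assumed form, a smaller sigma concentrates the probability mass on the most similar candidates, while a larger sigma flattens the distribution.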
- class NormalProbabilityModelSeasonality(sigma_amplitude, normalized_sigma_climatology)[source]
Bases: ProbabilityModel
Probability model that incorporates seasonal variability in standard deviation.
This model adjusts the probability computation based on a sigma that varies with the day of the year, reflecting changes in the atmosphere over the year.
- Parameters:
sigma_amplitude (float)
normalized_sigma_climatology (DataArray)
- __init__(sigma_amplitude, normalized_sigma_climatology)[source]
Initialize the Seasonally Variable Probability Model.
- Parameters:
sigma_amplitude (float) – Amplitude factor for the climatological sigma.
normalized_sigma_climatology (xr.DataArray) – Normalized sigma values for each day of the year.
Notes
Sigma is split into amplitude and a normalized climatology to allow rescaling.
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities with seasonal sigma adjustment.
- Parameters:
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state, ignored for this model.
coords_candidates (xr.Dataset) – Coordinates of candidate states, ignored for this model.
- Returns:
Unnormalized log probabilities for each candidate.
- Return type:
NDArray
- class NormalProbabilityNotLargerThanFixedDate(sigma, date_max)[source]
Bases: ProbabilityModel
Probability model that restricts candidates to a maximum date.
This model computes probabilities using a Gaussian-like weighting, but sets the probability to zero (an unnormalized log probability of negative infinity) for candidates whose date is later than a specified maximum date.
- Parameters:
sigma (float)
date_max (datetime64)
- __init__(sigma, date_max)[source]
Initialize the Probability Model with Maximum Date Restriction.
- Parameters:
sigma (float) – Standard deviation for the similarity weighting. Must be a positive value.
date_max (np.datetime64) – Maximum allowed date for candidate states.
- Raises:
AssertionError – If sigma is not a positive value, or if no candidates exist before the maximum date.
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities, excluding candidates after max date.
- Parameters:
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state (unused in this model).
coords_candidates (xr.Dataset) – Coordinates of candidate states.
- Returns:
Unnormalized log probabilities for each candidate, with candidates after the maximum date set to negative infinity.
- Return type:
NDArray
- Raises:
AssertionError – If no candidates exist before the maximum date.
- class ProbabilityModel(*args, **kwargs)[source]
Bases: ABC
Abstract base class for probability models used in analog sampling.
This class defines the interface for probability models that determine the probability of selecting an analog based on its similarity to the true next state. Some derived classes impose restrictions on the coordinates of the candidate samples and set the probability to zero for candidates that violate them.
- Parameters:
args (Any)
kwargs (Any)
- abstractmethod __init__(*args, **kwargs)[source]
Initialize the probability model.
- Parameters:
*args (Any) – Positional arguments for model initialization.
**kwargs (Any) – Keyword arguments for model initialization.
- Return type:
None
- sample(rng, size, similarities, coords_s_next, coords_candidates)[source]
Sample analog states using the Gumbel-max trick.
- Parameters:
rng (np.random.Generator) – Random number generator.
size (int or tuple of int) – Number of samples to generate.
similarities (NDArray) – Similarity values for candidate states.
coords_s_next (xr.Dataset) – Coordinates of the true next state.
coords_candidates (xr.Dataset) – Coordinates of candidate states.
- Returns:
Indices of sampled analog states.
- Return type:
NDArray
- abstractmethod unnormalized_log_probability(coords_s_next, coords_candidates, similarities)[source]
Compute unnormalized log probabilities for candidate analogs.
- Parameters:
coords_s_next (xr.Dataset) – Coordinates of the true next state.
coords_candidates (xr.Dataset) – Coordinates of candidate states.
similarities (NDArray) – Similarity values for candidate states.
- Returns:
Unnormalized log probabilities for each candidate.
- Return type:
NDArray
- class UniformProbabilityModel[source]
Bases: ProbabilityModel
Probability model that assigns equal probability to all candidates.
This model treats all candidate states as equally likely, regardless of their similarities or coordinates.
- __init__()[source]
Initialize the Uniform Probability Model.
No parameters are required, as all candidates are treated equally.
- Return type:
None
- unnormalized_log_probability(similarities, coords_s_next, coords_candidates)[source]
Compute unnormalized log probabilities.
Returns a constant value of 1 for all candidates, effectively making them uniformly probable.
- Parameters:
similarities (NDArray) – Similarity values for candidate states (ignored in this model).
coords_s_next (xr.Dataset) – Coordinates of the true next state (ignored in this model).
coords_candidates (xr.Dataset) – Coordinates of candidate states (ignored in this model).
- Returns:
Constant array of ones, representing uniform probabilities.
- Return type:
NDArray
- gumbel_max_sample(unnormalized_logp, rng, size)[source]
Sample from a categorical distribution using the Gumbel-max trick.
This method provides an efficient way to sample from a categorical distribution by using the properties of the Gumbel distribution.
- Parameters:
unnormalized_logp (NDArray) – Unnormalized log probabilities for each category.
rng (np.random.Generator) – Random number generator.
size (int or tuple of int) – Number of samples to generate.
- Returns:
Indices of sampled categories.
- Return type:
NDArray
Notes
The Gumbel-max trick allows sampling from a categorical distribution without explicitly normalizing probabilities.
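The trick itself is short: add independent standard Gumbel noise to each log probability and take the argmax. The sketch below assumes a one-dimensional `unnormalized_logp`; `gumbel_max_sketch` is a hypothetical stand-in and may differ from the package function in how it handles shapes.

```python
import numpy as np


def gumbel_max_sketch(unnormalized_logp, rng, size):
    """Draw category indices via argmax(logp + Gumbel noise)."""
    logp = np.asarray(unnormalized_logp, dtype=float)
    shape = (size,) if isinstance(size, int) else tuple(size)
    # One independent Gumbel(0, 1) draw per sample and per category.
    noise = rng.gumbel(size=shape + logp.shape)
    return np.argmax(logp + noise, axis=-1)


rng = np.random.default_rng(0)
# Log weights for relative weights 1 : 3 : 0 : 6; they need not be normalized.
logp = np.array([0.0, np.log(3.0), -np.inf, np.log(6.0)])

samples = gumbel_max_sketch(logp, rng, size=20000)
frequencies = np.bincount(samples, minlength=4) / samples.size
```

The empirical frequencies approach the normalized weights (0.1, 0.3, 0, 0.6), and the category with log probability negative infinity is never selected.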
unseen_awg.similarity_measures module
Similarity measures for weather generator analog selection.
This module provides similarity measures used to compare reference points with candidate points in the weather generator’s analog selection process.
- mse_similarity(ref_point_data, cand_points_data, reduction_axes=(-3, -2, -1))[source]
Calculate similarity based on negative mean squared error.
Computes the similarity between a reference point and candidate points using the negative mean squared error. Lower MSE values result in higher (less negative) similarity scores.
- Parameters:
ref_point_data (NDArray[np.floating[Any]]) – Reference point data array.
cand_points_data (NDArray[np.floating[Any]]) – Candidate points data array with the same shape as ref_point_data or broadcastable to it.
reduction_axes (tuple[int, ...], optional) – Axes along which to compute the mean, by default (-3, -2, -1). These should correspond to (latitude, longitude, lag).
- Returns:
Similarity scores as negative MSE values. Higher values indicate greater similarity.
- Return type:
NDArray[np.floating[Any]]
Notes
The similarity is computed as: similarity = -mean((cand_points_data - ref_point_data)^2)
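The formula in the note translates directly into NumPy. The sketch below follows that formula and the documented default `reduction_axes`; `mse_similarity_sketch` and the array shapes are illustrative assumptions.

```python
import numpy as np


def mse_similarity_sketch(ref, cand, reduction_axes=(-3, -2, -1)):
    """Negative mean squared error over the reduction axes."""
    return -np.mean((cand - ref) ** 2, axis=reduction_axes)


rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 5, 2))             # (latitude, longitude, lag)
cand = ref + rng.normal(size=(10, 4, 5, 2))  # ten candidates, broadcast against ref
cand[3] = ref                                # make candidate 3 an exact match

scores = mse_similarity_sketch(ref, cand)    # one score per candidate
best = int(np.argmax(scores))
```

The exact match receives the maximum possible score of zero; every other candidate scores below zero.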
unseen_awg.simulate_trajectory module
Trajectory simulation module for weather generator within Snakemake workflows.
This module provides functionality to simulate weather trajectories using analog sampling methods with configurable probability models and time stepping approaches.
- simulate_trajectory(path_wg, probability_model, sigma, seed, n_days, blocksize, path_trajectory, n_days_min=None)[source]
Simulate and save a time series with a weather generator.
- Parameters:
path_wg (str) – Path of the directory of the weather generator.
probability_model (str) – Name of a probability model to be used when sampling analogs.
sigma (float) – Sigma parameter of the probability model used during sampling.
seed (int) – Seed for the random sampling of the time series.
n_days (int) – Number of days in the final time series. Gets rounded to conform with the selected blocksize.
blocksize (int) – Size of the contiguous blocks of states during the sampling of the time series.
path_trajectory (str) – Path to store the trajectory in.
n_days_min (int | None, optional) – To be used in combination with the “KeepMinimalNDays” probability model, avoids sampling states closer than n_days_min from the true sample. By default None.
- Raises:
ValueError – If an invalid name of a probability model was specified.
- Return type:
None
unseen_awg.time_steppers module
Time stepping strategies for weather generation simulations.
This module provides abstract and concrete implementations of time steppers that control how time progresses during weather generation. Different steppers handle various calendar systems and time representations.
- class FractionalYearStepper(blocksize, initial_time, reference_time, tropical_year=365.2422)[source]
Bases: TimeStepper
Time stepper that works with fractional year representations.
This stepper advances time using fractional year values.
- Parameters:
blocksize (int) – Number of days to advance in each time step.
initial_time (cftime.DatetimeGregorian) – Starting time for the simulation.
reference_time (cftime.DatetimeGregorian) – Reference time used for fractional year calculations.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- blocksize
Number of days to advance in each time step.
- Type:
int
- initial_time
Starting time for the simulation.
- Type:
cftime.DatetimeGregorian
- reference_time
Reference time used for fractional year calculations.
- Type:
cftime.DatetimeGregorian
- tropical_year
Length of tropical year in days.
- Type:
float
- daily_increment
Fractional year increment per day.
- Type:
float
- current_year_fraction
Current position as fractional year.
- Type:
float
- __init__(blocksize, initial_time, reference_time, tropical_year=365.2422)[source]
Initialize the fractional year stepper.
- Parameters:
blocksize (int) – Number of days to advance in each time step.
initial_time (cftime.DatetimeGregorian) – Starting time for the simulation.
reference_time (cftime.DatetimeGregorian) – Reference time used for fractional year calculations.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- Return type:
None
- class NoLeapYearStepper(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]
Bases: TimeStepper
Time stepper using a no-leap-year calendar system.
This stepper advances time using a calendar without leap years, ensuring consistent 365-day years. The fractional year calculation wraps around at 365 days to maintain annual periodicity.
- Parameters:
init_year (int) – Initial year for the simulation.
init_month (int) – Initial month for the simulation.
init_day (int) – Initial day for the simulation.
blocksize (int) – Number of days to advance in each time step.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- blocksize
Number of days to advance in each time step.
- Type:
int
- initial_time
Starting time for the simulation.
- Type:
cftime.DatetimeNoLeap
- ref_time
Reference time (2000-01-01) for fractional year calculations.
- Type:
cftime.DatetimeNoLeap
- tropical_year
Length of tropical year in days.
- Type:
float
- daily_increment
Fractional year increment per day.
- Type:
float
- initial_year_fraction
Initial position as fractional year.
- Type:
float
- current_year_fraction
Current position as fractional year.
- Type:
float
- current_time
Current datetime.
- Type:
cftime.DatetimeNoLeap
Notes
The fractional year calculation uses modulo 365 to ensure proper wrapping for the no-leap-year calendar system.
- __init__(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]
Initialize the no-leap-year stepper.
- Parameters:
init_year (int) – Initial year for the simulation.
init_month (int) – Initial month for the simulation.
init_day (int) – Initial day for the simulation.
blocksize (int) – Number of days to advance in each time step.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- Return type:
None
- __next__()[source]
Advance to the next time step.
- Returns:
Tuple containing current datetime and current fractional year before advancing.
- Return type:
tuple[cftime.DatetimeNoLeap, float]
Notes
The fractional year is calculated with modulo 365 to ensure proper wrapping for the no-leap-year calendar.
- class StandardStepper(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]
Bases: TimeStepper
Standard time stepper using Gregorian calendar with leap years.
This stepper advances time using standard datetime objects and provides both datetime and fractional year representations.
- Parameters:
init_year (int) – Initial year for the simulation.
init_month (int) – Initial month for the simulation.
init_day (int) – Initial day for the simulation.
blocksize (int) – Number of days to advance in each time step.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- blocksize
Number of days to advance in each time step.
- Type:
int
- initial_time
Starting time for the simulation.
- Type:
cftime.DatetimeGregorian
- ref_time
Reference time (2000-01-01) for fractional year calculations.
- Type:
cftime.DatetimeGregorian
- tropical_year
Length of tropical year in days.
- Type:
float
- daily_increment
Fractional year increment per day.
- Type:
float
- initial_year_fraction
Initial position as fractional year.
- Type:
float
- current_year_fraction
Current position as fractional year.
- Type:
float
- current_time
Current datetime.
- Type:
cftime.DatetimeGregorian
Notes
Uses cftime.DatetimeGregorian instead of standard datetime to avoid issues with time delta calculations in time conversion operations.
- __init__(init_year, init_month, init_day, blocksize, tropical_year=365.2422)[source]
Initialize the standard stepper.
- Parameters:
init_year (int) – Initial year for the simulation.
init_month (int) – Initial month for the simulation.
init_day (int) – Initial day for the simulation.
blocksize (int) – Number of days to advance in each time step.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- Return type:
None
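The iteration pattern shared by the steppers (report the current state, then advance by `blocksize` days) can be sketched with the standard library. This is an illustration only: `SketchStepper` is hypothetical, it uses `datetime.datetime` where the package uses cftime objects, and `daily_increment = 1 / tropical_year` is an assumed definition.

```python
from datetime import datetime, timedelta


class SketchStepper:
    """Iterator yielding (time, year_fraction), then advancing by blocksize days."""

    def __init__(self, init_year, init_month, init_day, blocksize, tropical_year=365.2422):
        self.blocksize = blocksize
        self.tropical_year = tropical_year
        self.ref_time = datetime(2000, 1, 1)
        self.daily_increment = 1.0 / tropical_year  # assumed definition
        self.current_time = datetime(init_year, init_month, init_day)
        elapsed_days = (self.current_time - self.ref_time).days
        self.current_year_fraction = elapsed_days * self.daily_increment

    def __iter__(self):
        return self

    def __next__(self):
        # Report the state before advancing, then move forward by one block.
        state = (self.current_time, self.current_year_fraction)
        self.current_time += timedelta(days=self.blocksize)
        self.current_year_fraction += self.blocksize * self.daily_increment
        return state


stepper = SketchStepper(2000, 1, 1, blocksize=5)
first_time, first_fraction = next(stepper)
second_time, second_fraction = next(stepper)
```

Returning the state before advancing means the first call to `next` yields the initial time itself, so a trajectory starts exactly at the requested date.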
- class TimeStepper(blocksize)[source]
Bases: TimeStepper
Abstract base class for time stepping strategies in weather generation.
This class defines a template for different time stepping methods, allowing flexible time progression for weather simulation.
- Parameters:
blocksize (int) – Number of days to advance in each time step.
- blocksize
Number of days to advance in each time step.
- Type:
int
- __init__(blocksize)[source]
Initialize the time stepper with a specified block size.
- Parameters:
blocksize (int) – Number of days to advance in each time step.
- Return type:
None
unseen_awg.timestep_utils module
- convert_to_cftime_gregorian(data)[source]
Convert data to use cftime with Gregorian calendar.
- Parameters:
data (xr.Dataset | xr.DataArray) – Input data to convert.
- Returns:
Data with cftime datetimes and Gregorian calendar.
- Return type:
xr.Dataset | xr.DataArray
- dayofyear_year_to_datetime64(dayofyear, year)[source]
Create numpy datetime64 from year and dayofyear.
- Parameters:
dayofyear (int) – Day of the year (1-366).
year (int) – Year value.
- Returns:
Corresponding datetime64 object.
- Return type:
np.datetime64
- dayofyear_year_to_datetime64_naive(dayofyear, year)[source]
Create numpy datetime64 from year and dayofyear, setting NaTs when dayofyear is out of range.
- Parameters:
dayofyear (int) – Day of the year (1-366).
year (int) – Year.
- Returns:
Corresponding datetime64 object, or NaT if invalid.
- Return type:
np.datetime64
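Both conversions can be sketched for scalar inputs as follows. The function names carry a `_sketch` suffix to mark them as hypothetical; in particular, how the package versions treat out-of-range days or array inputs is not documented here, so the NaT rule below is an assumption based on the description.

```python
import numpy as np


def dayofyear_year_to_datetime64_sketch(dayofyear: int, year: int) -> np.datetime64:
    """January 1st of `year` plus (dayofyear - 1) days."""
    return np.datetime64(f"{year:04d}-01-01") + np.timedelta64(dayofyear - 1, "D")


def dayofyear_year_to_datetime64_naive_sketch(dayofyear: int, year: int) -> np.datetime64:
    """As above, but NaT when the day does not exist in that year."""
    date = dayofyear_year_to_datetime64_sketch(dayofyear, year)
    # A valid day of year must still fall inside the requested calendar year.
    in_same_year = date.astype("datetime64[Y]") == np.datetime64(f"{year:04d}")
    return date if dayofyear >= 1 and in_same_year else np.datetime64("NaT")


leap_day = dayofyear_year_to_datetime64_sketch(60, 2024)        # 29 February 2024
invalid = dayofyear_year_to_datetime64_naive_sketch(366, 2023)  # 2023 has 365 days
last_day = dayofyear_year_to_datetime64_naive_sketch(366, 2024)
```

Day 366 exists only in leap years, which is the case the NaT variant is meant to flag.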
- is_in_window_from_time(base_dates, other_dates, window_size, ref_time, tropical_year=365.2422)[source]
Check if dates are within a temporal window from base dates.
- Parameters:
base_dates (xr.DataArray) – Base dates for comparison.
other_dates (xr.DataArray) – Dates to check.
window_size (int) – Size of the window in days.
ref_time (np.datetime64) – Reference time for year fraction calculation.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- Returns:
Boolean array indicating which dates fall within the window.
- Return type:
xr.DataArray
- is_in_window_from_year_fraction(base_year_fractions, other_dates, window_size, ref_time, tropical_year=365.2422)[source]
Check if dates are within a temporal window from base “fraction of years”.
- Parameters:
base_year_fractions (xr.DataArray | float) – Base year fractions for comparison.
other_dates (xr.DataArray) – Dates to check against the window.
window_size (int) – Size of the window in days.
ref_time (np.datetime64) – Reference time for year fraction calculation.
tropical_year (float, optional) – Length of tropical year in days, by default 365.2422.
- Returns:
Boolean DataArray indicating which dates fall within the window.
- Return type:
xr.DataArray
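A window test on year fractions has to wrap around the turn of the year, so that late December counts as close to early January. The sketch below shows one way to do this with a circular distance. It assumes year fractions as inputs on both sides and a window centred on the base date with total width `window_size`; neither convention is confirmed by the documentation above, and `in_window_sketch` is a hypothetical name.

```python
import numpy as np


def in_window_sketch(base_fraction, other_fractions, window_size, tropical_year=365.2422):
    """True where the circular distance in days is at most window_size / 2 (assumed)."""
    # Map the difference into [-0.5, 0.5) so that distances wrap around the year.
    difference = (np.asarray(other_fractions) - base_fraction + 0.5) % 1.0 - 0.5
    return np.abs(difference) * tropical_year <= window_size / 2


# Base date at the start of the year; candidates just before, shortly after, and mid-year.
result = in_window_sketch(0.0, [0.99, 0.02, 0.5], window_size=10)
```

Here the candidate at 0.99 is about 3.7 days before the base date and falls inside the 10-day window, while 0.02 (about 7.3 days later) and 0.5 fall outside.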
- time_to_year_fraction_cftime(time, ref_time, tropical_year=365.2422)[source]
Convert cftime datetimes to fraction of years relative to a reference time.
- Parameters:
time (xr.DataArray | xr.Dataset | cftime.datetime) – The time to convert to fractional year. Data should be of cftime.datetime type.
ref_time (xr.DataArray | xr.Dataset | cftime.datetime) – The reference time for the conversion. Data should be of cftime.datetime type.
tropical_year (float, optional) – The length of a tropical year in days, by default 365.2422.
- Returns:
The fractional year representation of the input time.
- Return type:
float
- time_to_year_fraction_np_datetime64(time, ref_time, tropical_year=365.2422)[source]
Convert datetime(s) to fraction of years relative to a reference time.
- Parameters:
time (xr.DataArray | xr.Dataset | np.datetime64) – The time to convert to fractional year. Data should be of numpy.datetime64 type.
ref_time (xr.DataArray | xr.Dataset | np.datetime64) – The reference time for the conversion. Data should be of numpy.datetime64 type.
tropical_year (float, optional) – The length of a tropical year in days, by default 365.2422.
- Returns:
The fractional year representation of the input time.
- Return type:
float
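For `np.datetime64` inputs, the conversion amounts to dividing the elapsed days by the length of the tropical year. The sketch below assumes the result is not wrapped into the interval [0, 1); whether the package function wraps is not stated here, and `year_fraction_sketch` is a hypothetical name.

```python
import numpy as np


def year_fraction_sketch(time, ref_time, tropical_year=365.2422):
    """Elapsed time since ref_time, in units of tropical years."""
    elapsed_days = (time - ref_time) / np.timedelta64(1, "D")
    return elapsed_days / tropical_year


ref_time = np.datetime64("2000-01-01")
times = np.array(["2000-01-01", "2000-07-01", "2001-01-01"], dtype="datetime64[D]")
fractions = year_fraction_sketch(times, ref_time)
```

Because 2000 is a leap year, 1 January 2001 lies 366 days after the reference and maps to a value slightly above one.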
unseen_awg.tune_wg_by_forecasting module
Module for tuning weather generator parameters by optimizing forecast accuracy.
- analog_ensemble_forecast(wg, probability_model, initial_datapoint, lead_time, n_members, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]
Generate an ensemble forecast using analog sampling.
- Parameters:
wg (WeatherGenerator) – Weather generator instance.
probability_model (ProbabilityModel) – Probability model for the weather generator’s analog selection.
initial_datapoint (InitTimeLeadTimeMemberState) – Initial state for the forecast.
lead_time (np.timedelta64) – Forecast lead time.
n_members (int) – Number of ensemble members to be created in the forecast.
blocksize (int) – Block size for weather generator sampling.
rng (np.random.Generator) – Random number generator.
stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.
- Returns:
Trajectories of ensemble forecast.
- Return type:
xr.Dataset
- crps_climatology(ground_truth, ds_t0, forecast_lead_time, ds_full, var)[source]
Calculate CRPS (Continuous Ranked Probability Score) for a climatology forecast.
- Parameters:
ground_truth (xr.Dataset) – Ground truth observations.
ds_t0 (xr.Dataset) – Initial dataset at time t0.
forecast_lead_time (np.timedelta64) – Lead time for the forecast.
ds_full (xr.Dataset) – Full dataset for climatology calculation.
var (str) – Variable name to compute CRPS for.
- Returns:
CRPS values for the climatology forecast.
- Return type:
xr.Dataset
- crps_persistence(ground_truth, ds_t0, forecast_lead_time, var)[source]
Calculate CRPS (Continuous Ranked Probability Score) for a persistence forecast.
- Parameters:
ground_truth (xr.Dataset) – Ground truth observations.
ds_t0 (xr.Dataset) – Initial dataset at time t0.
forecast_lead_time (np.timedelta64) – Lead time for the forecast.
var (str) – Variable name to compute CRPS for.
- Returns:
CRPS values for the persistence forecast.
- Return type:
xr.Dataset
- crps_wg_forecasts(ds, wg, n_members, probability_model, forecast_init_time, forecast_lead_time, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]
Compute CRPS (Continuous Ranked Probability Score) for analog forecasts.
- Parameters:
ds (xr.Dataset) – Dataset the weather generator samples from.
wg (WeatherGenerator) – Weather generator instance.
n_members (int) – Number of members in the analog forecast ensemble.
probability_model (ProbabilityModel) – Probability model for analog selection.
forecast_init_time (np.datetime64) – Initial time for the analog forecast.
forecast_lead_time (np.timedelta64) – Lead time for the analog forecast.
blocksize (int) – Block size (days) for weather generator sampling.
rng (np.random.Generator) – Random number generator.
stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.
- Returns:
CRPS for the weather generator’s forecasts.
- Return type:
xr.Dataset
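For a finite ensemble, the CRPS is commonly estimated as E|X - y| - 0.5 E|X - X'|, where X and X' are ensemble members and y is the observation. The sketch below implements this standard estimator for a single scalar observation; it is not taken from unseen_awg, which may use a different estimator or library, and `ensemble_crps` is a hypothetical name.

```python
import numpy as np


def ensemble_crps(members, observation):
    """CRPS estimate E|X - y| - 0.5 * E|X - X'| for a 1-D ensemble."""
    members = np.asarray(members, dtype=float)
    accuracy = np.mean(np.abs(members - observation))
    # Mean absolute difference over all ordered pairs of members.
    spread = np.mean(np.abs(members[:, None] - members[None, :]))
    return accuracy - 0.5 * spread


deterministic = ensemble_crps([2.0], 5.0)    # single member: absolute error
two_member = ensemble_crps([1.0, 3.0], 2.0)  # 1.0 - 0.5 * 1.0
```

With a single member the spread term vanishes and the score reduces to the absolute error, which is why CRPS allows deterministic forecasts to be compared directly with ensemble forecasts.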
- eval_analogue_forecast_skill(sigma, type_probability_model, wg, sampled_timesteps, ds, var, n_analogs, forecast_lead_time, blocksize, rng, stepper_class=<class 'unseen_awg.time_steppers.StandardStepper'>)[source]
Evaluate the skill of analog forecasts.
- Parameters:
sigma (float) – Standard deviation parameter for the probability model.
type_probability_model (str) – Type of probability model to use.
wg (WeatherGenerator) – Weather generator instance.
sampled_timesteps (xr.DataArray) – Timesteps to initialize forecasts from.
ds (xr.Dataset) – Dataset the weather generator samples from.
var (str) – Name of variable to evaluate.
n_analogs (int) – Number of analog forecasts to create.
forecast_lead_time (np.timedelta64) – Forecast lead time.
blocksize (int) – Block size for sampling.
rng (np.random.Generator) – Random number generator.
stepper_class (TimeStepper, optional) – Time stepper class, by default StandardStepper.
- Returns:
Mean CRPS value for the analog forecasts.
- Return type:
NDArray
- eval_climatology_persistence_forecast(sampled_timesteps, ds, ds_rechunk, var, forecast_lead_time)[source]
Evaluate climatology and persistence forecasts.
- Parameters:
sampled_timesteps (xr.DataArray) – Timesteps to initialize forecasts from.
ds (xr.Dataset) – Dataset the weather generator samples from.
ds_rechunk (xr.Dataset) – Rechunked dataset to allow faster computation of climatology forecasts.
var (str) – Name of variable to evaluate.
forecast_lead_time (np.timedelta64) – Forecast lead time.
- Returns:
Tuple of (climatology_CRPS, persistence_CRPS) arrays.
- Return type:
Tuple[NDArray, NDArray]
- eval_wg(seed, forecast_lead_time_days, n_analogs, sigma_min, sigma_max, use_log_scale, n_sampled_inits, min_timedelta_from_dataset_start, probability_model_type, var, blocksize, n_optuna_trials, path_ds, path_ds_rechunk, path_wg, path_study, ds_type)[source]
Evaluate weather generator in terms of its accuracy as ensemble forecast model.
This function also runs climatology and persistence baseline forecasts and conducts an optuna study to optimize parameters of the weather generator and the trajectory sampling.
- Parameters:
seed (int) – Seed used in sampling forecast initializations.
forecast_lead_time_days (int) – Lead time of the analog forecasts in days.
n_analogs (int) – Number of ensemble members used in the analog ensemble forecast.
sigma_min (float) – Lower bound of range of possible values of weather generator’s sigma parameter.
sigma_max (float) – Upper bound of range of possible values of weather generator’s sigma parameter.
use_log_scale (bool) – Use a logarithmic scaling during the optimization of the parameter sigma.
n_sampled_inits (int) – Number of forecast initializations analog forecasts should be started from. Performance is averaged across initialization times.
min_timedelta_from_dataset_start (int) – In some settings, the weather generator only uses analogs from timesteps before the forecast initialization. To avoid having few/zero analogs available, choose initialization times only min_timedelta_from_dataset_start days after the start of the dataset.
probability_model_type (str) – A probability model to be used when sampling analogs.
var (str) – Name of the variable to be evaluated.
blocksize (int) – Blocksize of the simulated time series.
n_optuna_trials (int) – Number of optimization steps in optuna.
path_ds (str) – Path under which the dataset used during the evaluation is stored.
path_ds_rechunk (str) – Path of a dataset like the one under path_ds, but chunked along longitudes.
path_wg (str) – Path of the directory of the weather generator.
path_study (str) – Path to write the optuna study to.
ds_type (Literal["era5", "reforecasts"]) – Dataset used (ERA5 reanalysis or reforecasts).
- Raises:
ValueError – If the weather generator contains negative values in its lag dimension.
- Return type:
None
- get_gt(sel_valid_time, ds)[source]
Get ground truth data for a given valid time.
- Parameters:
sel_valid_time (np.datetime64) – The selected valid time.
ds (xr.Dataset) – The dataset containing the data.
- Returns:
Dataset with the ground truth data for the selected time.
- Return type:
xr.Dataset
- get_gt_coords(sel_valid_time, ds)[source]
For a given valid_time, get (ensemble_member, lead_time, init_time) that allow extracting the ground truth sample from the dataset.
- Parameters:
sel_valid_time (np.datetime64) – The selected valid time.
ds (xr.Dataset) – The dataset the weather generator is trained on.
- Returns:
Dictionary with ensemble_member, init_time, and lead_time keys.
- Return type:
dict[str, Any]
- Raises:
NotImplementedError – If the dataset type is not supported.
- get_n_valid_forecast_start_points(n, wg, rng, forecast_lead_time=numpy.timedelta64(0, 'D'), min_timedelta_from_dataset_start=numpy.timedelta64(366, 'D'), with_replacement=True, balance_months=True)[source]
Select n valid ground truth data points to start forecasts from.
- Parameters:
n (int) – Number of start points to select.
wg (WeatherGenerator) – Weather generator instance.
rng (np.random.Generator) – Random number generator used in the selection.
forecast_lead_time (np.timedelta64, optional) – Forecast lead time, by default np.timedelta64(0, “D”)
min_timedelta_from_dataset_start (np.timedelta64, optional) – Samples are only considered valid if they lie at least min_timedelta_from_dataset_start after the start of the dataset, by default np.timedelta64(366, “D”). The analog weather generator relies only on analogs with a valid_time smaller than the forecast start time, so near the start of the dataset the set of analogs would be very small.
with_replacement (bool, optional) – Whether to sample with replacement, by default True.
balance_months (bool, optional) – Whether to balance months in sampling, by default True.
- Returns:
DataArray of valid forecast start points.
- Return type:
xr.DataArray
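The selection logic described above (minimum offset from the dataset start, optional replacement, optional month balancing) can be sketched with plain NumPy. This is an illustration under assumptions: `sample_start_points` is a hypothetical helper operating on a 1D array of valid times, and inverse-frequency weighting is only one possible way to balance months.

```python
import numpy as np


def sample_start_points(valid_times, n, rng, min_offset=np.timedelta64(366, "D"),
                        with_replacement=True, balance_months=True):
    """Hypothetical sketch: draw n start times, optionally balancing months."""
    # Keep only times at least min_offset after the first time in the dataset.
    eligible = valid_times[valid_times >= valid_times.min() + min_offset]
    p = None
    if balance_months:
        # Month index per time, 0 = January.
        months = eligible.astype("datetime64[M]").astype(int) % 12
        counts = np.bincount(months, minlength=12)
        # Weight each time inversely to how often its month occurs.
        weights = 1.0 / counts[months]
        p = weights / weights.sum()
    idx = rng.choice(eligible.size, size=n, replace=with_replacement, p=p)
    return eligible[idx]


times = np.arange("2000-01-01", "2003-01-01", dtype="datetime64[D]")
starts = sample_start_points(times, n=24, rng=np.random.default_rng(0))
```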
- persistence_forecast(ds_t0, lead_time)[source]
Create a persistence forecast.
- Parameters:
ds_t0 (xr.Dataset) – Initial dataset at time t0.
lead_time (np.timedelta64) – Lead time to add to the valid time.
- Returns:
Persistence forecast, i.e. initial data with modified valid_time coordinate.
- Return type:
xr.Dataset
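As an illustration of the idea, a persistence forecast keeps the data unchanged and only shifts the valid_time coordinate. The sketch below assumes valid_time is a scalar coordinate of the initial dataset; `persistence_forecast_sketch` and the variable name `t2m` are hypothetical.

```python
import numpy as np
import xarray as xr


def persistence_forecast_sketch(ds_t0, lead_time):
    """Hypothetical sketch: reuse the initial data, shift only valid_time."""
    new_valid_time = ds_t0["valid_time"].values + lead_time
    return ds_t0.assign_coords(valid_time=new_valid_time)


# Toy initial state (assumed layout): one variable on two latitudes.
ds_t0 = xr.Dataset(
    {"t2m": ("latitude", np.array([280.0, 285.0]))},
    coords={
        "latitude": [40.0, 50.0],
        "valid_time": np.datetime64("2000-01-01", "ns"),
    },
)
forecast = persistence_forecast_sketch(ds_t0, np.timedelta64(3, "D"))
```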
unseen_awg.utils module
Utility functions for unseen-awg.
- apply_similarity_metric(ds_reference, ds_candidate, similarity_func, variable_name='geopotential_height', ref_core_dims=None, cand_core_dims=None, output_core_dims=None, reduction_axes_for_numpy=(-3, -2, -1), dask_options=None, **similarity_func_kwargs)[source]
Apply a similarity metric between reference point and candidates.
- Parameters:
ds_reference (xr.Dataset) – Reference dataset, expanded to match dimensions of candidates.
ds_candidate (xr.Dataset) – Dataset containing the candidate states.
similarity_func – The similarity function to apply.
variable_name (str, optional) – Name of the variable to compare, by default “geopotential_height”.
ref_core_dims (list of str, optional) – Core dimensions (passed to xr.apply_ufunc) of the reference dataset, by default [“latitude”, “longitude”, “lag”].
cand_core_dims (list of str, optional) – Core dimensions (passed to xr.apply_ufunc) of the candidate dataset, by default [“c_year”, “c_sample”, “c_ensemble_member”, “latitude”, “longitude”, “lag”].
output_core_dims (list of str, optional) – Output core dimensions, by default [“c_year”, “c_sample”, “c_ensemble_member”].
reduction_axes_for_numpy (tuple of int, optional) – Axes to reduce when applying numpy function, by default (-3, -2, -1).
dask_options (dict, optional) – Dask options for apply_ufunc, by default None.
**similarity_func_kwargs – Additional keyword arguments for the similarity function.
- Returns:
Array of computed similarities.
- Return type:
xr.DataArray
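The interplay of core dimensions and reduction axes can be illustrated with `xr.apply_ufunc`. This is a minimal sketch under assumptions, not the package's code: `mse_distance` is a hypothetical metric, and only a single candidate dimension (`c_sample`) is used instead of the three candidate dimensions in the defaults above.

```python
import numpy as np
import xarray as xr


def mse_distance(ref, cand, axis):
    """Hypothetical metric: mean squared difference over the core axes."""
    # Core dims are moved to the end, so ref is (..., lat, lon, lag) and
    # cand is (..., c_sample, lat, lon, lag). Add a candidate axis to ref.
    ref = np.expand_dims(ref, axis=-4)
    return np.mean((cand - ref) ** 2, axis=axis)


rng = np.random.default_rng(0)
ref = xr.DataArray(rng.normal(size=(4, 5, 2)), dims=["latitude", "longitude", "lag"])
cand = xr.DataArray(
    rng.normal(size=(3, 4, 5, 2)),
    dims=["c_sample", "latitude", "longitude", "lag"],
)
cand.values[1] = ref.values  # make candidate 1 identical to the reference

dist = xr.apply_ufunc(
    mse_distance,
    ref,
    cand,
    input_core_dims=[
        ["latitude", "longitude", "lag"],
        ["c_sample", "latitude", "longitude", "lag"],
    ],
    output_core_dims=[["c_sample"]],
    kwargs={"axis": (-3, -2, -1)},  # reduce latitude, longitude, lag
)
```

The reduction axes (-3, -2, -1) address the trailing latitude, longitude, and lag axes, which is why they must come last in both lists of core dimensions.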
- get_k_random_indices(arr, mask, k, rng)[source]
Get k random indices in an array after applying masking.
- Parameters:
arr (NDArray) – Array of values.
mask (NDArray) – Boolean mask indicating valid elements.
k (int) – Number of random indices to return.
rng (np.random.Generator) – Random number generator.
- Returns:
Array of k random indices.
- Return type:
NDArray
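A minimal sketch of masked random selection, simplified to a 1D mask and sampling without replacement (both are assumptions; `k_random_valid_indices` is a hypothetical name):

```python
import numpy as np


def k_random_valid_indices(mask, k, rng):
    """Hypothetical sketch: pick k distinct indices where mask is True."""
    valid = np.flatnonzero(mask)
    return rng.choice(valid, size=k, replace=False)


mask = np.array([True, False, True, True, False, True])
picked = k_random_valid_indices(mask, 3, np.random.default_rng(0))
```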
- get_k_smallest_indices(arr, mask, k)[source]
Get indices of k smallest values in an array after applying masking.
- Parameters:
arr (NDArray) – Array of values.
mask (NDArray) – Boolean mask indicating valid elements.
k (int) – Number of smallest indices to return.
- Returns:
Indices of k smallest values.
- Return type:
NDArray
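One common way to implement this kind of masked selection is to replace masked-out values with infinity and use `np.argpartition`. The sketch below is a 1D illustration with a hypothetical function name, not necessarily how the package does it.

```python
import numpy as np


def k_smallest_valid_indices(arr, mask, k):
    """Hypothetical sketch: indices of the k smallest values where mask is True."""
    # Masked-out entries become +inf so they are never among the smallest.
    masked = np.where(mask, arr, np.inf)
    # argpartition finds the k smallest in O(n); sort only those k afterwards.
    idx = np.argpartition(masked, k - 1)[:k]
    return idx[np.argsort(masked[idx])]


arr = np.array([5.0, 1.0, 3.0, 0.5, 2.0])
mask = np.array([True, True, True, False, True])
smallest = k_smallest_valid_indices(arr, mask, 2)
```

Here the globally smallest value (0.5 at index 3) is masked out, so the result contains indices 1 and 4.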
- get_map_valid_n_day_transitions(da, n)[source]
Map valid n-day transitions in a dataset.
This function is used when sampling trajectories. It identifies the “next state” for each base state, and it identifies which states are valid samples to include in the sampled time series (that is, both the state and its next state are in the dataset).
- Parameters:
da (xr.Dataset) – Input dataset containing valid_time and sample dimensions.
n (int) – Number of days to look ahead.
- Returns:
Dataset with next sample, year, and dayofyear information.
- Return type:
xr.Dataset
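The core idea, finding for each state the state n days later and flagging states without such a successor, can be sketched on a sorted 1D array of valid times. `map_n_day_transitions` is a hypothetical simplification that ignores the sample, year, and dayofyear structure of the real dataset.

```python
import numpy as np


def map_n_day_transitions(valid_times, n):
    """Hypothetical sketch: index of the state n days later, or -1 if missing.

    Assumes valid_times is sorted and contains no duplicates.
    """
    targets = valid_times + np.timedelta64(n, "D")
    pos = np.searchsorted(valid_times, targets)
    pos = np.minimum(pos, valid_times.size - 1)  # keep indices in bounds
    is_valid = valid_times[pos] == targets
    next_idx = np.where(is_valid, pos, -1)
    return next_idx, is_valid


valid_times = np.array(
    ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-05", "2000-01-06"],
    dtype="datetime64[D]",
)
next_idx, is_valid = map_n_day_transitions(valid_times, n=1)
```

In this toy example, 3 January has no successor because 4 January is missing, and 6 January is the last entry.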
- grids_are_identical_subset(source_ds, target_ds, coord_names=['latitude', 'longitude'])[source]
Check whether the target grid is a simple subset of the source grid with identical spacing.
- Parameters:
source_ds (Dataset | DataArray)
target_ds (Dataset | DataArray)
- is_no_jump(traj)[source]
Check which steps in a trajectory are “jumps”.
For each pair of consecutive states in the trajectory, test whether the trajectory has a “jump”, i.e. the samples aren’t actually consecutive in the original dataset. For a reforecast dataset, not having a jump means having the same init_time and ensemble_member, while the second state in each pair has a lead_time 1 day larger than the first state.
- Parameters:
traj (xr.Dataset) – Sampled trajectory data.
- Returns:
Boolean DataArray indicating where there are no jumps.
- Return type:
NDArray
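For a reforecast dataset, the no-jump condition described above can be sketched with three aligned 1D arrays. This is an illustration only; `is_no_jump_sketch` is hypothetical and takes plain arrays rather than a trajectory dataset.

```python
import numpy as np


def is_no_jump_sketch(init_time, lead_time, ensemble_member):
    """Hypothetical sketch: True where consecutive states are truly consecutive."""
    same_init = init_time[1:] == init_time[:-1]
    same_member = ensemble_member[1:] == ensemble_member[:-1]
    one_day_later = (lead_time[1:] - lead_time[:-1]) == np.timedelta64(1, "D")
    return same_init & same_member & one_day_later


init_time = np.array(
    ["2000-01-01", "2000-01-01", "2000-01-01", "2000-02-01", "2000-02-01"],
    dtype="datetime64[D]",
)
lead_time = np.array([0, 1, 2, 4, 5]).astype("timedelta64[D]")
ensemble_member = np.array([0, 0, 0, 3, 3])
no_jump = is_no_jump_sketch(init_time, lead_time, ensemble_member)
```

The result has one entry per pair of consecutive states, so it is one element shorter than the trajectory.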
unseen_awg.weather_generator module
Weather generator module for analog-based weather simulation.
This module provides the core WeatherGenerator class that implements analog-based weather generation using similarity measures and probability models to sample realistic weather trajectories from historical data.
- class WeatherGenerator(params)[source]
Bases: object
Analog-based weather generator for creating synthetic weather trajectories.
This class implements an analog-based approach to weather generation, where weather states are sampled from historical data while ensuring that successive states either follow each other in the historical dataset or are analogs of the true successor. The generator uses configurable similarity measures and probability models to create realistic weather sequences. Additional parameters can be specified when sampling time series.
- Parameters:
params (dict[str, Any]) –
Configuration parameters containing:
- weather_generator.window_size (int) – Half-window size within which states are considered as potential analogs.
- weather_generator.var (str) – Name of variable to use for similarity calculations.
- weather_generator.similarity (str) – Name of similarity function to use.
- weather_generator.use_precomputed_similarities (bool) – Whether to use precomputed similarities or not. If True, similarities are precomputed when a WeatherGenerator instance is initialized. Otherwise, a lazy array is set up and similarities are computed on-the-fly during the sampling process. This slows down the sampling process.
- weather_generator.n_samples (int) – Number of samples to use from dataset. By “sample” we denote datapoints that possess the same valid_time (but a different init_time). Providing a low n_samples allows restricting the number of included states.
- dir_wg (str) – Directory path for weather generator outputs.
- zarr_year_dayofyear (str) – Path to zarr store containing the preprocessed input dataset.
- window_size
Half-window size within which states are considered as potential analogs.
- Type:
int
- var
Name of variable to use for similarity calculations.
- Type:
str
- similarity_function
Function used to compute similarities between states.
- Type:
callable
- path_wg
Path to weather generator working directory.
- Type:
str
- path_dataset
Path to input dataset.
- Type:
str
- use_precomputed_similarities
Flag indicating whether to use precomputed similarities.
- Type:
bool
- ds_similarities
Dataset containing results of similarity computations. Is initialized as lazy dataset and computed during initialization if use_precomputed_similarities is True.
- Type:
xr.Dataset
- get_analog_data(queries, use_candidate_coords=False)[source]
Retrieve analog weather data for specified query coordinates.
Extracts weather data from the dataset at the coordinates specified in the query array, either using the query coordinates directly or the candidate coordinates.
- Parameters:
queries (xr.DataArray) – Query array containing coordinate information.
use_candidate_coords (bool, optional) – Whether to pick the sample according to the provided coordinates of a candidate state (True) or of a base state (False), by default False.
- Returns:
Weather data at the specified coordinates.
- Return type:
xr.Dataset
- get_initial_state(initialization, map_n_step_transition, blocksize, rng)[source]
Get initial state to start sampling a trajectory from.
Determines the starting state for weather generation, either from a specified initialization or by random selection from the set of valid states.
- Parameters:
initialization (InitTimeLeadTimeMemberState | None) – Specific initialization state, or None for random selection.
map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.
blocksize (int) – Number of days in each time block.
rng (np.random.Generator) – Random number generator for random initialization.
- Returns:
Initial state for trajectory generation.
- Return type:
xr.Dataset
- Raises:
ValueError – If the specified initialization is invalid.
AssertionError – If the initialization state is not found in valid transitions.
- get_similarities_k_closest_neighbors(states, k, minimum_timedelta_days=None, dim_states=None)[source]
Get the k closest neighbors based on similarity measures.
Finds the k most similar historical states to the given query states based on (precomputed) similarity measures.
- Parameters:
states (xr.DataArray) – Query states to find neighbors for.
k (int) – Number of closest neighbors to return.
minimum_timedelta_days (int | None, optional) – Minimum time separation in days between query and candidate states. Setting it excludes analogs that are temporally close to the base state, if such analogs are undesired. By default None, i.e. no restriction.
dim_states (str | None, optional) – Dimension name for states, by default None.
- Returns:
Dataset containing the k closest neighbor states and their similarities.
- Return type:
xr.Dataset
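The effect of minimum_timedelta_days can be pictured as a boolean mask over candidates that is applied before the nearest neighbors are selected. A minimal sketch, assuming the separation is measured between valid times and that the bound is inclusive (both are assumptions; `exclude_temporally_close` is hypothetical):

```python
import numpy as np


def exclude_temporally_close(query_time, candidate_times, minimum_timedelta_days):
    """Hypothetical sketch: True for candidates far enough from the query in time."""
    if minimum_timedelta_days is None:
        # No restriction: every candidate is allowed.
        return np.ones(candidate_times.shape, dtype=bool)
    separation = np.abs(candidate_times - query_time)
    return separation >= np.timedelta64(minimum_timedelta_days, "D")


query_time = np.datetime64("2000-06-05")
candidate_times = np.array(
    ["2000-06-01", "2000-06-10", "2001-06-03"], dtype="datetime64[D]"
)
allowed = exclude_temporally_close(query_time, candidate_times, 30)
unrestricted = exclude_temporally_close(query_time, candidate_times, None)
```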
- get_similarities_k_random_neighbors(states, k, rng, minimum_timedelta_days=None, dim_states=None)[source]
Get k randomly selected neighbors from valid candidates.
Randomly selects k historical states from valid candidates that meet the specified temporal constraints.
- Parameters:
states (xr.DataArray) – Query states to find neighbors for.
k (int) – Number of random neighbors to return.
rng (np.random.Generator) – Random number generator for sampling.
minimum_timedelta_days (int | None, optional) – Minimum time separation in days between query and candidate states, by default None.
dim_states (str | None, optional) – Dimension name for states, by default None.
- Returns:
Dataset containing k randomly selected neighbor states.
- Return type:
xr.Dataset
- classmethod load(wg_path)[source]
Load a WeatherGenerator instance from saved configuration.
- Parameters:
wg_path (str) – Path to directory containing saved weather generator configuration.
- Returns:
Loaded weather generator instance.
- Return type:
WeatherGenerator
- plot_k_nearest_and_random_neighbors(state, k, rng, minimum_timedelta_days=None, vmin=450, vmax=600, minor_spacing_contours=10, major_spacing_contours=30)[source]
Create comparison plot of nearest neighbors vs random neighbors.
Generates a visualization for a given weather state that shows the base state alongside its k nearest neighbors (analogs) and k random neighbors among the candidates, side by side.
- Parameters:
state (xr.Dataset) – Reference weather state to find neighbors for.
k (int) – Number of neighbors to display.
rng (np.random.Generator) – Random number generator for random neighbor selection.
minimum_timedelta_days (int | None, optional) – Minimum time separation constraint, by default None.
vmin (float, optional) – Minimum value for color scale, by default 450.
vmax (float, optional) – Maximum value for color scale, by default 600.
minor_spacing_contours (float, optional) – Spacing for minor contour lines, by default 10.
major_spacing_contours (float, optional) – Spacing for major contour lines, by default 30.
- Returns:
Figure containing the comparison plots.
- Return type:
matplotlib.figure.Figure
- sample_trajectory(blocksize, probability_model, stepper_class, n_steps, rng, initialization=None, start_by_taking_analog=False, show_progressbar=False)[source]
Sample a synthetic weather trajectory using the analog method.
Generates a weather trajectory by iteratively sampling analog states from historical data. The sampling alternates between following a historical trajectory and sampling analogs of the true successor states. In effect, blocks of size blocksize are copied from the historical record, and each transition between blocks jumps to a close analog of the “true” state that would follow the block.
- Parameters:
blocksize (int) – Number of days to sample contiguously from the same historical trajectory.
probability_model (ProbabilityModel) – Model defining sampling probabilities given similarities between base states and candidate states.
stepper_class (type[TimeStepper]) – Class for managing the output time assigned to each sample in the resulting trajectory. This is used as a means for supporting different calendars in sampled datasets.
n_steps (int) – Number of sampling steps to perform, not necessarily equal to the length of the sampled series in days.
rng (np.random.Generator) – Random number generator for sampling.
initialization (InitTimeLeadTimeMemberState | None, optional) – Initial state specification, by default None (random initialization).
start_by_taking_analog (bool, optional) – Whether to start by taking an analog of the initial state, by default False.
show_progressbar (bool, optional) – Whether to display progress bar, by default False.
- Returns:
Generated weather trajectory with time series of sampled states.
- Return type:
xr.Dataset
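The alternation between block-wise time evolution and analog sampling can be illustrated on a toy 1D series. The sketch below is not the package's algorithm: it replaces the probability model with a uniform choice among the k nearest states, uses absolute difference as the distance, and `sample_block_trajectory` is a hypothetical name.

```python
import numpy as np


def sample_block_trajectory(series, blocksize, n_steps, rng, k=3):
    """Toy analog resampling of a 1D series; returns indices into series."""
    # A block start must leave room for the block and its true successor.
    last_valid_start = series.size - blocksize - 1
    candidates = np.arange(last_valid_start + 1)
    start = int(rng.integers(0, last_valid_start + 1))
    trajectory = []
    for _ in range(n_steps):
        # Time evolution: follow the historical record for one block.
        trajectory.extend(range(start, start + blocksize))
        true_next = start + blocksize
        # Sampling step: jump to one of the k states closest to the true successor.
        dist = np.abs(series[candidates] - series[true_next])
        nearest = candidates[np.argsort(dist)[:k]]
        start = int(rng.choice(nearest))
    return np.array(trajectory)


series = np.sin(np.arange(200) / 10.0)
traj = sample_block_trajectory(
    series, blocksize=5, n_steps=4, rng=np.random.default_rng(1)
)
```

In this toy version each step contributes one block, so the trajectory has n_steps × blocksize entries, and indices inside a block are consecutive.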
- sampling_step(next_state, next_year_fraction, map_n_step_transition, probability_model, rng)[source]
Perform analog sampling step to select next weather state.
Samples an analog state from historical data based on similarity to the true next state and according to distribution and constraints defined by the probability_model.
- Parameters:
next_state (xr.Dataset) – True next state in underlying historic dataset.
next_year_fraction (float) – Year fraction of next sample. Used to define temporal similarity rather than an actual calendar date to simplify calendar handling.
map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.
probability_model (ProbabilityModel) – Model defining sampling probabilities given similarities.
rng (np.random.Generator) – Random number generator for sampling.
- Returns:
Sampled analog state for the next time step.
- Return type:
xr.Dataset
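How a probability model might turn similarities into sampling probabilities can be sketched with a Gaussian kernel. This is purely an assumption for illustration: the sketch treats the similarity values as distances (smaller means more similar), and `sample_analog` and its `sigma` parameter are hypothetical rather than the ProbabilityModel interface.

```python
import numpy as np


def sample_analog(distances, sigma, rng):
    """Hypothetical sketch: sample a candidate index with Gaussian weights."""
    # Closer candidates (smaller distance) receive larger weights.
    weights = np.exp(-0.5 * (distances / sigma) ** 2)
    probs = weights / weights.sum()
    return int(rng.choice(distances.size, p=probs)), probs


distances = np.array([0.0, 1.0, 2.0, 5.0])
idx, probs = sample_analog(distances, sigma=1.0, rng=np.random.default_rng(0))
```

A smaller sigma concentrates the probability mass on the closest candidates, while a larger sigma approaches uniform sampling.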
- time_evolution_step(trajectory, current_block_start_state, map_n_step_transition, stepper, blocksize)[source]
Perform one time evolution step in trajectory generation.
Advances the trajectory by one block of time steps, following the evolution in the underlying historical dataset starting from current_block_start_state.
- Parameters:
trajectory (list[xr.Dataset]) – List of trajectory states to append new states to.
current_block_start_state (xr.Dataset) – State to start the current block from.
map_n_step_transition (xr.Dataset) – Mapping of allowed n-day transitions between states.
stepper (TimeStepper) – Time stepper instance for managing temporal progression.
blocksize (int) – Number of days in each time block.
- Returns:
Next state and corresponding year fraction.
- Return type:
tuple[xr.Dataset, float]
- main(snakemake)[source]
Main function for weather generator execution in Snakemake workflow.
Initializes and runs the weather generator with parameters from Snakemake, handling logging and parameter management for the workflow execution.
- Parameters:
snakemake (Any) – Snakemake object containing input/output paths, parameters, and logging configuration.
- Return type:
None
- setup_lazy_similarity_dataset(ds_year_dayofyear_format, window_size, ref_time=numpy.datetime64('2000-01-01T00:00:00.000000000'))[source]
Set up a lazy dataset in which the similarities computed by the weather generator are stored.
Creates a dataset structure for computing similarities between weather states within a specified time window. The dataset includes coordinates for reference states and candidate states with time shifts.
The dataset has dimensions that identify the base sample (dayofyear, year, sample, ensemble member) and additional dimensions that identify the candidate (d_shift, c_year, c_sample, c_ensemble_member). The valid_time of the candidate state can be computed from c_year, dayofyear, and d_shift.
- Parameters:
ds_year_dayofyear_format (xr.Dataset) – Input dataset in year-dayofyear format containing weather data.
window_size (int) – Size of the time window (in days) for similarity computations.
ref_time (np.datetime64, optional) – Reference time for temporal calculations, by default np.datetime64(“2000-01-01”, “ns”).
- Returns:
Lazy dataset with similarity computation structure including coordinates for reference and candidate states.
- Return type:
xr.Dataset
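The statement that a candidate's valid_time follows from c_year, dayofyear, and d_shift can be made concrete with a small sketch. It assumes a 1-based calendar day of year and that d_shift spans the window symmetrically; the package's actual convention (for example its use of ref_time) may differ, and `candidate_valid_time` is hypothetical.

```python
import numpy as np


def candidate_valid_time(c_year, dayofyear, d_shift):
    """Hypothetical sketch: valid time from year, 1-based day of year, and shift."""
    jan1 = np.datetime64(f"{c_year:04d}-01-01", "D")
    return jan1 + np.timedelta64(dayofyear - 1 + d_shift, "D")


window_size = 2
# Assumed layout of the shift dimension: -window_size, ..., +window_size.
d_shifts = np.arange(-window_size, window_size + 1)

t_leap_day = candidate_valid_time(2000, 60, 0)
t_prev_year = candidate_valid_time(2001, 1, -2)
```

A negative shift near the start of the year lands in the previous calendar year, as the second call shows.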